Documentation

Data is often called the new oil. Extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. Several approaches have been proposed to extract knowledge from textual documents: they extract named entities, classify them according to pre-defined taxonomies, and disambiguate them through URIs that identify real-world entities. As a step towards interconnecting the Web of documents via those entities, different extractors have been proposed. Although they share the same main purpose (extracting named entities), they differ in numerous aspects, such as their underlying dictionary or their ability to disambiguate entities. NERD is a web framework that unifies numerous named entity extractors through the NERD ontology, which provides a rich set of axioms aligning the taxonomies of these tools.

Table of contents

  1. Extractors supported
  2. NERD API documentation
  3. NERD API libraries

NERD API documentation

POST http://nerd.eurecom.fr/api/document

Request parameters

key (required)

A UTF-8 string: the NERD API key.

text (optional*)

A UTF-8 string: the text which will be processed to extract entities. Although the field is optional, it is required if neither timedtext nor uri is declared.

timedtext (optional*)

A UTF-8 string: the SRT file which will be processed to extract entities. Although the field is optional, it is required if neither text nor uri is declared.

uri (optional*)

A UTF-8 string: the URI of the article. Although the field is optional, it is required if neither timedtext nor text is declared.

Response parameters

idDocument

A UTF-8 string: the document identifier.

Example

POST

curl -i -X POST http://nerd.eurecom.fr/api/document -d "uri=http://www.bbc.co.uk/news/world-us-canada-19644448&key=YOUR_API_KEY"


                 { 
                    "idDocument":164
                 }
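
The same call can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the function names build_document_request and submit_document are hypothetical, and the check that at least one of text, timedtext or uri is present is an assumption based on the parameter descriptions above.

```python
import json
import urllib.parse
import urllib.request

NERD_DOCUMENT_URL = "http://nerd.eurecom.fr/api/document"


def build_document_request(key, text=None, timedtext=None, uri=None):
    """Build (but do not send) the POST request that registers a document."""
    sources = {"text": text, "timedtext": timedtext, "uri": uri}
    provided = {name: value for name, value in sources.items() if value is not None}
    # Assumption: the API needs at least one of the three source fields.
    if not provided:
        raise ValueError("one of text, timedtext or uri must be supplied")
    params = {"key": key, **provided}
    # Form-encode the body, as curl does with -d.
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(NERD_DOCUMENT_URL, data=body, method="POST")


def submit_document(key, **source):
    """Send the request and return the idDocument field of the JSON reply."""
    request = build_document_request(key, **source)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["idDocument"]
```

Calling submit_document("YOUR_API_KEY", uri="http://www.bbc.co.uk/news/world-us-canada-19644448") would issue the same request as the curl command above. Building the request separately from sending it keeps the encoding logic easy to inspect without network access.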
                 

POST http://nerd.eurecom.fr/api/annotation

Request parameters

key (required)

A UTF-8 string: the NERD API key.

idDocument (required)

A UTF-8 string: the document identifier.

extractor (required)

A UTF-8 string: the name of an extractor. The accepted values are: {combined, alchemyapi, datatxt, dbspotlight, lupedia, opencalais, saplo, semitags, textrazor, thd, wikimeta, yahoo, zemanta}.

ontology (optional*)

A UTF-8 string. The accepted values are: core, extended. The default value is core.

timeout (optional*)

A UTF-8 string: the maximum time, in seconds, allowed to perform the annotation.

Response parameters

idAnnotation

A UTF-8 string: the annotation identifier.

Example

POST

curl -i -X POST http://nerd.eurecom.fr/api/annotation -d "key=YOUR_API_KEY&idDocument=164&extractor=alchemyapi&ontology=core&timeout=10"


                 {
                    "idAnnotation":427
                 }
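
As a rough Python counterpart to the curl command above, the sketch below builds the annotation request and rejects extractor or ontology values that are not in the accepted lists. The function name build_annotation_request and the client-side validation are illustrative assumptions. The extractor names are copied from the request parameters above.

```python
import urllib.parse
import urllib.request

NERD_ANNOTATION_URL = "http://nerd.eurecom.fr/api/annotation"

# Accepted values as listed in the request parameters of this endpoint.
EXTRACTORS = {
    "combined", "alchemyapi", "datatxt", "dbspotlight", "lupedia",
    "opencalais", "saplo", "semitags", "textrazor", "thd", "wikimeta",
    "yahoo", "zemanta",
}
ONTOLOGIES = {"core", "extended"}


def build_annotation_request(key, id_document, extractor, ontology="core", timeout=None):
    """Build (but do not send) the POST request that starts an annotation."""
    if extractor not in EXTRACTORS:
        raise ValueError("unknown extractor: %r" % extractor)
    if ontology not in ONTOLOGIES:
        raise ValueError("unknown ontology: %r" % ontology)
    params = {
        "key": key,
        "idDocument": id_document,
        "extractor": extractor,
        "ontology": ontology,
    }
    # timeout is optional, so it is only sent when the caller sets it.
    if timeout is not None:
        params["timeout"] = timeout
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(NERD_ANNOTATION_URL, data=body, method="POST")
```

Passing the resulting object to urllib.request.urlopen would send it. The JSON reply is expected to carry the idAnnotation field shown above.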
                 

GET http://nerd.eurecom.fr/api/entity

Request parameters

key (required)

A UTF-8 string: the NERD API key.

idAnnotation (required)

A UTF-8 string: the annotation identifier.

granularity (optional)

A UTF-8 string. Accepted values: oen | oed. The oen (One Entity per Name) setting returns all the entities found in the document. The oed (One Entity per Document) setting removes duplicates (two or more entities are duplicates when they have the same NE, type and URI) and returns only one occurrence of each.
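
To make the oed rule concrete, here is a small client-side sketch of that kind of de-duplication. It is not the server's implementation. It assumes that "NE" corresponds to the label field and "type" to the nerdType field of the entity objects in the example response, and the function name one_entity_per_document is hypothetical.

```python
def one_entity_per_document(entities):
    """Keep the first occurrence of each (label, nerdType, uri) combination."""
    seen = set()
    unique = []
    for entity in entities:
        # Assumption: label plays the role of the NE, nerdType that of the type.
        identity = (entity.get("label"), entity.get("nerdType"), entity.get("uri"))
        if identity not in seen:
            seen.add(identity)
            unique.append(entity)
    return unique
```

Applied to an oen-style list in which "BBC" appears twice with the same type and URI, the function returns a list holding a single "BBC" entry, namely the first one encountered.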

Response parameters

Array(entity)

An array of entity objects. The extractor field takes one of the following values: alchemyapi, datatxt, dbspotlight, opencalais, lupedia, saplo, semitags, wikimeta, yahoo, zemanta (the names of the supported services) or combined. For further details, see the example below.

Example

GET

curl -i -X GET -H "Accept: application/json" "http://nerd.eurecom.fr/api/entity?key=YOUR_API_KEY&idAnnotation=427"


                 [
                    {
                       "idEntity": 120,
                       "label": "BBC",
                       "startChar": 138,
                       "endChar": 141,
                       "extractorType": "Company",
                       "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
                       "uri": "http://dbpedia.org/resource/BBC",
                       "confidence": 0.0582796,
                       "relevance": 0.5,
                       "extractor": "dbspotlight"
                    },
                    ...
                 ]
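
A possible way to retrieve and post-process this response in Python is sketched below, using only the standard library. The function names are hypothetical, and the sketch assumes that the service returns valid JSON with the field names shown in the example (label, nerdType and so on).

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

NERD_ENTITY_URL = "http://nerd.eurecom.fr/api/entity"


def build_entity_request(key, id_annotation, granularity=None):
    """Build (but do not send) the GET request that lists extracted entities."""
    params = {"key": key, "idAnnotation": id_annotation}
    if granularity is not None:
        if granularity not in ("oen", "oed"):
            raise ValueError("granularity must be 'oen' or 'oed'")
        params["granularity"] = granularity
    url = NERD_ENTITY_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Accept": "application/json"}, method="GET")


def fetch_entities(key, id_annotation, granularity=None):
    """Send the request and decode the JSON array of entity objects."""
    request = build_entity_request(key, id_annotation, granularity)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def labels_by_nerd_type(entities):
    """Group entity labels under their NERD ontology type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["nerdType"]].append(entity["label"])
    return dict(grouped)
```

With the entity from the example, labels_by_nerd_type would place "BBC" under http://nerd.eurecom.fr/ontology#Organization. Grouping by nerdType rather than extractorType makes results comparable across extractors, since nerdType comes from the shared NERD ontology.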
                 

NERD API libraries

  • nerd4java - Java client
  • nerd4python - Python client
  • nerd4node - Nodejs client
  • nerdier - Ruby client