A directory of information services and datasets in agriculture

agINFRA REST API

Responsible body: 
AgINFRA Consortium
Institute of Physics Belgrade
Belongs to network: 
AgINFRA

The agINFRA RESTful interface is developed within the agINFRA project as a main effort of WP3, and is used for cataloging, off-line processing, and management of data. As a catalog facility, it stores users' configurations of Grid-ported applications, dataset metadata and locations, and arbitrary user-defined metadata, and allows this information to be queried. Based on the information stored in the catalog, the gateway submits off-line Grid jobs in order to collect new datasets, transform existing datasets, and produce new dataset metadata relevant to end users.

Because the project infrastructure and the ported applications are heterogeneous, the registered locations of datasets initially point to different storage architectures or systems, and retrieval of datasets is limited to storage-specific protocols and authentication mechanisms. The agINFRA RESTful interface unifies these protocols, so end users can retrieve the datasets produced within the infrastructure using a single protocol (HTTP). The interface automatically replicates datasets, ensures that the same dataset exists on different storage systems, and exposes it over HTTP.

This API supports a number of data processing services. If a service requires the registration of a target or a job, the same method (/) is used with a POST request, specifying the type of target and/or the type of job (see method and parameters below). For all services, the registered job is monitored using the same method (/) with a GET request, specifying the job ID. The other methods listed below are specific to some of the services.
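
The register-then-monitor pattern described above can be sketched as follows. This is a minimal illustration, not the project's client: the base URL is a placeholder, the JSON body is an assumption, and appending the job ID to the root path is also an assumption, since this entry does not say how the ID is passed. The requests are only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; the real endpoint is not given in this entry.
BASE_URL = "http://example.org/aginfra/"


def build_registration_request(document):
    """Build a POST request to the root method (/) that registers a resource.

    Assumption: the resource description is sent as a JSON body.
    """
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_monitoring_request(job_id):
    """Build a GET request to the root method (/) for a registered job.

    Assumption: the job ID is appended to the root path.
    """
    quoted_id = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(BASE_URL + quoted_id, method="GET")
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint above is a placeholder.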

Services

agHarvester
agHarvester harvests any dataset exposed via an OAI-PMH target. The module is agnostic about the type of metadata to be harvested (DC, LOM) and can harvest any metadata format declared in the “metadataPrefix” field returned by the “ListMetadataFormats” verb of an OAI-PMH target (e.g. http://aglr.agroknow.gr/organic-edunet/oai?verb=ListMetadataFormats).
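
To illustrate how the supported formats of a target can be discovered, the sketch below builds a ListMetadataFormats URL and reads the metadataPrefix values from a response. The sample XML is hand-written in the shape defined by OAI-PMH 2.0 and is not output captured from a real target; the function names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_lom</metadataPrefix>
      <schema>http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd</schema>
      <metadataNamespace>http://ltsc.ieee.org/xsd/LOM</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats_url(target_base_url):
    """Return the ListMetadataFormats request URL for an OAI-PMH target."""
    query = urllib.parse.urlencode({"verb": "ListMetadataFormats"})
    return target_base_url + "?" + query


def list_metadata_prefixes(response_xml):
    """Return the metadataPrefix values declared in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text.strip() for el in root.findall(path, OAI_NS) if el.text]


prefixes = list_metadata_prefixes(SAMPLE_RESPONSE)
```

Any prefix returned this way could then be supplied as the metadataPrefix argument of a harvesting request against the same target.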

agCrawler
agCrawler is a customized version of Apache Nutch, a highly extensible and scalable open source Web crawler. Its main goal is to discover resources on the Web (i.e. URLs), starting from Web sites defined by the user.

agDCtoLOM
The agDCtoLOM process converts metadata from the Dublin Core (DC) schema into the LOM schema. This process is part of the data transformation layer. The transformation can be executed by taking as input a technical binding (e.g. XSLT) of the corresponding mappings.
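
As a rough picture of what such a mapping does, the sketch below converts a few Dublin Core elements into the "general" category of a LOM record. It uses a Python dictionary in place of an XSLT binding, covers only three elements, and is not the mapping used by agDCtoLOM; the sample record and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
LOM_NS = "http://ltsc.ieee.org/xsd/LOM"

# Illustrative subset of a mapping: DC element name -> child of LOM "general".
DC_TO_LOM_GENERAL = {
    "title": "title",
    "description": "description",
    "subject": "keyword",
}

# Hand-written sample record, for illustration only.
SAMPLE_DC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Organic farming basics</dc:title>
  <dc:subject>organic agriculture</dc:subject>
  <dc:creator>Not mapped in this sketch</dc:creator>
</record>"""


def dc_to_lom(dc_xml):
    """Convert the mapped DC elements of one record into a LOM element tree."""
    dc_root = ET.fromstring(dc_xml)
    lom = ET.Element(f"{{{LOM_NS}}}lom")
    general = ET.SubElement(lom, f"{{{LOM_NS}}}general")
    dc_prefix = f"{{{DC_NS}}}"
    for dc_el in dc_root:
        # ElementTree tags have the form "{namespace}localname".
        if not dc_el.tag.startswith(dc_prefix):
            continue
        target = DC_TO_LOM_GENERAL.get(dc_el.tag[len(dc_prefix):])
        if target is None or not dc_el.text:
            continue  # element is outside the illustrative mapping
        wrapper = ET.SubElement(general, f"{{{LOM_NS}}}{target}")
        # LOM text values are wrapped in a <string> child.
        ET.SubElement(wrapper, f"{{{LOM_NS}}}string").text = dc_el.text.strip()
    return lom


lom_record = dc_to_lom(SAMPLE_DC)
```

Calling `ET.tostring(lom_record, encoding="unicode")` serializes the result for inspection.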

agLOMtoAKIF
agLOMtoAKIF converts a set of XML-bound metadata records that follow the IEEE LOM metadata format into the AKIF format.

agLOMtoRDF
agLOMtoRDF converts a set of XML-bound metadata records that follow the IEEE LOM metadata format into an RDF/XML binding.

agTextMining
For a given dataset, agTextMining returns titles, authors, references, and keywords. Currently, version 1.0 works with IEEE LOM records serialized as XML files. The keyword extractor uses the KEA algorithm and a statistical model to calculate keywords from the text. Titles and authors are identified by detecting sudden changes in font size. Finally, references are obtained by parsing numbers between brackets.
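
The reference step can be pictured with a short regular-expression sketch. It only illustrates the idea of picking out numbers between square brackets and is not the agTextMining implementation; the function name is hypothetical.

```python
import re

# Matches a citation marker such as "[12]": digits only, in square brackets.
BRACKETED_NUMBER = re.compile(r"\[(\d+)\]")


def find_reference_numbers(text):
    """Return the distinct bracketed numbers in order of first appearance."""
    seen = []
    for match in BRACKETED_NUMBER.finditer(text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


sample = "As shown in [1] and [12], and again in [1]; see [a] and 3."
numbers = find_reference_numbers(sample)
```

Markers that are not purely numeric, such as `[a]`, and bare numbers outside brackets are ignored by this pattern.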

agTagger
agTagger is a keyword extractor that uses the AGROVOC thesaurus to extract keywords from the content of given URLs. Since AGROVOC is published as Linked Open Data, agTagger can do more than extract keywords: it can extract AGROVOC URIs. agTagger is based on MAUI, a piece of software that automatically identifies the main topics in text documents using two components: the key-phrase extraction algorithm KEA and the machine learning toolkit WEKA. To be used in the AgroTagger, MAUI was trained to work with AGROVOC (in English).

Type of web service: 
REST web service
Methods
Method path: 
/
HTTP method: 
Description: 
With a POST request, the root method registers new resources in the database. If a service requires the registration of a target or a job, this method is used with a POST request, specifying the type of target and/or the type of job (see parameters below).
Types of resources that can be registered:
- harvesting targets (harvesting_target);
- jobs (job);
- crawler targets (crawler_target).
The method supports different types of jobs:
- agHarvester job;
- agCrawler job;
- agDCtoLOM job;
- agLOMtoAKIF job;
- agLOMtoRDF job;
- agTextMining job;
- agTagger job.
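
The resource and job types listed above can be captured in a small helper that assembles a registration document before it is posted. This is a sketch under assumptions: the field names `type` and `job_type` are hypothetical, and the exact job-type strings the service expects are not stated in this entry; only the resource identifiers (harvesting_target, job, crawler_target) come from the description.

```python
# Resource identifiers as given in the method description.
RESOURCE_TYPES = {"harvesting_target", "job", "crawler_target"}

# Job names as listed in the description; the exact strings the API
# expects are an assumption.
JOB_TYPES = {
    "agHarvester", "agCrawler", "agDCtoLOM", "agLOMtoAKIF",
    "agLOMtoRDF", "agTextMining", "agTagger",
}


def build_resource_document(resource_type, job_type=None, **fields):
    """Assemble a registration document, rejecting unknown types.

    The keys "type" and "job_type" are hypothetical field names.
    """
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    document = dict(fields)
    document["type"] = resource_type
    if resource_type == "job":
        if job_type not in JOB_TYPES:
            raise ValueError(f"unknown job type: {job_type!r}")
        document["job_type"] = job_type
    elif job_type is not None:
        raise ValueError("job_type applies only to resources of type 'job'")
    return document


tagger_job = build_resource_document("job", job_type="agTagger")
```

Validating the type on the client side keeps a misspelled resource or job name from being sent to the service at all.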
Method path: 
_design/aginfra/_view/harvesting_target
HTTP method: 
Description: 
This method is used with the agHarvester service. The harvesting service implements the OAI-PMH protocol to harvest metadata records from content providers. The method returns the list of registered harvesting targets.
Method path: 
_design/aginfra/_view/crawler_target
HTTP method: 
Description: 
This method is used with the agCrawler service. It returns the list of all registered agCrawler targets.
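
The two listing methods can be sketched as below. The base URL is a placeholder, and the response shape is an assumption: the `_design/.../_view/...` paths resemble CouchDB view URLs, so the sketch parses a CouchDB-style result with a `rows` array. The sample response is hand-written, not captured from the service.

```python
import json

# Hypothetical base URL; the real endpoint is not given in this entry.
BASE_URL = "http://example.org/aginfra/"

# Method paths as listed in this entry.
VIEW_PATHS = {
    "harvesting_target": "_design/aginfra/_view/harvesting_target",
    "crawler_target": "_design/aginfra/_view/crawler_target",
}

# Hand-written sample in an assumed CouchDB-style view format.
SAMPLE_VIEW_RESPONSE = json.dumps({
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {"id": "t1", "key": "t1", "value": {"url": "http://example.org/oai"}},
        {"id": "t2", "key": "t2", "value": {"url": "http://example.net/oai"}},
    ],
})


def view_url(kind):
    """Return the URL of the listing method for the given target kind."""
    return BASE_URL + VIEW_PATHS[kind]


def extract_targets(response_body):
    """Return (id, value) pairs from an assumed CouchDB-style view result."""
    payload = json.loads(response_body)
    return [(row.get("id"), row.get("value")) for row in payload.get("rows", [])]


targets = extract_targets(SAMPLE_VIEW_RESPONSE)
```

A GET request to `view_url("crawler_target")` would be handled the same way, provided the service returns results in this shape.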
Location country: 
The RING is part of the agINFRA project, EC 7th Framework Programme, INFRA-2011-1.2.2, grant agreement no. 283770