The agINFRA RESTful interface is developed within the agINFRA project as a main effort of WP3, and is used for cataloging, off-line processing, and management of data. As a catalog facility, it stores users' configurations of Grid-ported applications, dataset metadata and locations, and arbitrary user-defined metadata, and it allows this information to be queried. Based on the information stored in the catalog, the gateway performs off-line Grid job submissions in order to collect new datasets, transform existing datasets, and produce new dataset metadata relevant for end-users.
Due to the heterogeneity of the project infrastructure and ported applications, the registered locations of datasets initially point to different storage architectures or systems, and retrieval of datasets is limited to storage-specific protocols and authentication mechanisms. The agINFRA RESTful interface unifies these protocols, so end-users can retrieve the datasets produced within the infrastructure using a single protocol (HTTP). The interface automatically replicates datasets, ensures that the same dataset exists on different storage systems, and exposes it through HTTP.
This API supports a number of data processing services. If a service requires the registration of a target or a job, the same method (/) is used with a POST request, specifying the type of target and/or the type of job (see method and parameters below). For all services, a registered job is monitored using the same method (/) with a GET request, specifying the job ID. The other methods listed below are specific to some of the services.
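The register-then-monitor pattern described above can be sketched with Python's standard library. The base URL, the parameter names (`target_type`, `job_type`, `job_id`) and their values are placeholders invented for illustration, and passing the job ID as a query parameter is also an assumption; the real names are those given in the method and parameter description. The sketch only builds the two requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; replace with the real endpoint of the interface.
BASE_URL = "http://example.org/aginfra/api/"


def build_register_request(params):
    """Build a POST request to the root method (/) that registers a target or job."""
    body = urlencode(params).encode("utf-8")
    return Request(BASE_URL, data=body, method="POST")


def build_monitor_request(job_id):
    """Build a GET request to the same method (/) that asks about a registered job."""
    # Sending the job ID as a query parameter named "job_id" is an assumption.
    return Request(BASE_URL + "?" + urlencode({"job_id": job_id}), method="GET")


# Parameter names and values below are placeholders, not the documented ones.
register = build_register_request({"target_type": "oai-pmh", "job_type": "harvest"})
monitor = build_monitor_request("42")
print(register.get_method(), register.full_url)
print(monitor.get_method(), monitor.full_url)
```

Either request could then be sent with `urllib.request.urlopen`, once the placeholders are replaced with the documented endpoint and parameters.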
agDataHarvester harvests any dataset exposed via an OAI-PMH target. The module is agnostic about the type of metadata to be harvested (DC, LOM) and can harvest any metadata format that is declared in the “metadataPrefix” field of the response to the “ListMetadataFormats” verb of an OAI-PMH target (e.g. http://aglr.agroknow.gr/organic-edunet/oai?verb=ListMetadataFormats).
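As an illustration of how the available formats can be discovered, the following sketch builds a ListMetadataFormats request URL and reads the metadataPrefix values from a response. It is not the agDataHarvester code. The sample response is hand-written and abbreviated, and the prefixes in it (`oai_dc`, `oai_lom`) are examples rather than the actual output of the target above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_formats_url(base_url):
    """Return the ListMetadataFormats request URL for an OAI-PMH target."""
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def metadata_prefixes(response_xml):
    """Extract every metadataPrefix value from a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text for el in root.iterfind(path, OAI_NS)]


# Abbreviated, hand-written sample response (not captured from a real target).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_lom</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(list_formats_url("http://aglr.agroknow.gr/organic-edunet/oai"))
print(metadata_prefixes(SAMPLE))  # ['oai_dc', 'oai_lom']
```

Any prefix returned this way can be passed as the `metadataPrefix` argument of a subsequent `ListRecords` request.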
agCrawler is a customized version of Apache Nutch, a highly extensible and scalable open source Web crawler. Its main goal is to discover resources on the Web (i.e. URLs), starting from a set of Web sites defined by the user.
The agDCtoLOM process converts metadata from the Dublin Core (DC) schema into the LOM schema. This process is part of the data transformation layer. The transformation can be executed by taking as input a technical binding (e.g. XSLT) of the corresponding mappings.
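To make the idea of a DC-to-LOM mapping concrete, here is a minimal sketch. Python's standard library has no XSLT processor, so the mapping is expressed as a plain dictionary instead of an XSLT binding. The two mapped elements, the `<record>` wrapper and the sample record are illustrative assumptions and not the mappings used by agDCtoLOM.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
LOM_NS = "http://ltsc.ieee.org/xsd/LOM"

# Illustrative mapping only: DC element name -> LOM element under <general>.
DC_TO_LOM_GENERAL = {"title": "title", "description": "description"}


def dc_to_lom(dc_xml):
    """Convert a flat DC record into a minimal LOM record (general section only)."""
    dc_root = ET.fromstring(dc_xml)
    lom = ET.Element(f"{{{LOM_NS}}}lom")
    general = ET.SubElement(lom, f"{{{LOM_NS}}}general")
    for dc_name, lom_name in DC_TO_LOM_GENERAL.items():
        for dc_el in dc_root.iter(f"{{{DC_NS}}}{dc_name}"):
            target = ET.SubElement(general, f"{{{LOM_NS}}}{lom_name}")
            # LOM wraps free text in a <string> child element.
            string_el = ET.SubElement(target, f"{{{LOM_NS}}}string")
            string_el.text = dc_el.text
    return lom


# Hand-written sample record; the <record> wrapper element is an assumption.
SAMPLE_DC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Organic farming basics</dc:title>
  <dc:description>An introductory course.</dc:description>
</record>"""

lom_record = dc_to_lom(SAMPLE_DC)
ET.register_namespace("", LOM_NS)  # serialize LOM as the default namespace
print(ET.tostring(lom_record, encoding="unicode"))
```

An XSLT binding has the advantage that the mappings live in a separate stylesheet and can be changed without touching the code that runs the transformation.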
agLOMtoAKIF converts a set of metadata records in XML binding that follow the IEEE LOM metadata format into the AKIF format.
agLOMtoRDF converts a set of metadata records in XML binding that follow the IEEE LOM metadata format into an RDF/XML binding.
agTextMining returns the titles, authors, references and keywords for a given dataset. Currently, version 1.0 works with IEEE LOM records serialized as XML files. The keyword extractor uses the KEA algorithm and a statistical model to calculate keywords from the text. The title and authors are identified by detecting sudden changes in font size. Finally, the references are obtained by parsing numbers between brackets.
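The reference step, parsing numbers between brackets, can be sketched with a regular expression. This is an assumption about how such parsing might look, not the agTextMining implementation. The function name and the sample sentence are made up for illustration.

```python
import re

# Matches a bracketed citation such as [3] or [12] and captures the number.
BRACKETED_NUMBER = re.compile(r"\[(\d+)\]")


def reference_numbers(text):
    """Return the distinct bracketed reference numbers in order of first appearance."""
    seen = []
    for match in BRACKETED_NUMBER.finditer(text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


sample = "Results agree with [1] and [2]; see also [1] and [10]."
print(reference_numbers(sample))  # [1, 2, 10]
```

A pattern this simple does not handle ranges or lists such as `[3-5]` or `[2, 7]`, which would need additional handling.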
agTagger is a keyword extractor that uses the AGROVOC thesaurus to extract keywords from the content of a set of URLs. Since AGROVOC is published as Linked Open Data, agTagger can do more than extract keywords: it can also extract AGROVOC URIs. agTagger is based on MAUI, a piece of software that automatically identifies the main topics in text documents by building on two components: the key-phrase extraction algorithm KEA and the machine learning toolkit WEKA. To be used in the AgroTagger, MAUI was trained to work with AGROVOC (in English).