Text mining web services

BioTec TU Dresden and Humboldt-Universität zu Berlin participate in the BioCreative MetaServer (BCMS) project. BCMS is a joint effort of currently 13 groups to provide web services that annotate biomedical texts. The platform unites multiple systems that annotate gene/protein identifiers from EntrezGene, UniProt, and other sequence databases (gene normalization, GN); gene and protein mentions in text, without IDs (GM); species occurring in texts, mapped to NCBI Taxonomy IDs (TX); and predictions of whether an article discusses protein-protein interactions (PI).

Please refer to reference [1] for a more detailed introduction to the BCMS project.

Access to our services

The services in the BCMS project use the XML-RPC scheme for requests and responses. Please find below the access URLs for requesting GN, GM, TX, and PI annotations. We provide Java clients (source and binary) that query our servers and can easily be modified to suit your specific needs.
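To illustrate what such an XML-RPC request looks like on the wire, here is a minimal sketch that uses only the Java standard library (Java 11 or newer) instead of the Apache XML-RPC libraries used by the provided clients. It builds the call for the method Annotator.getAnnotation named in the list below. The class name BcmsRequestSketch, the placeholder PubMed ID, and the choice to send the ID as an XML-RPC string are assumptions made for illustration; the servers may expect a different parameter type.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class BcmsRequestSketch {
    // Method name as given on this page.
    static final String METHOD = "Annotator.getAnnotation";

    /**
     * Builds an XML-RPC methodCall body carrying a single PubMed ID.
     * Assumption: the ID is sent as an XML-RPC string value.
     */
    static String buildRequest(String pubmedId) {
        if (!pubmedId.matches("\\d+")) {
            throw new IllegalArgumentException("PubMed ID must be numeric: " + pubmedId);
        }
        return "<?xml version=\"1.0\"?>"
             + "<methodCall>"
             + "<methodName>" + METHOD + "</methodName>"
             + "<params><param><value><string>" + pubmedId
             + "</string></value></param></params>"
             + "</methodCall>";
    }

    /** Posts the request body to a servlet URL and returns the raw XML response. */
    static String send(String url, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequest("12345678"); // hypothetical placeholder PubMed ID
        System.out.println(body);
        // A server is contacted only when a servlet URL is passed as an argument.
        if (args.length > 0) {
            System.out.println(send(args[0], body));
        }
    }
}
```

Run without arguments, the sketch only prints the request body; pass one of the access URLs listed below as the first argument to send it. The response comes back as raw XML-RPC, which the provided Java clients decode for you.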

GN annotations:
Hosted by BioTec TU Dresden at http://gopubmed2.biotec.tu-dresden.de/XmlRpcServlet
GM, TX, PI annotations:
Hosted by Humboldt-Universität zu Berlin at http://141.20.27.241:81/XmlRpcServlet
Method name:
The method to invoke is called Annotator.getAnnotation and expects a single PubMed ID as parameter.
Java client:
Package with sources and binaries: client.tar.gz.
You will also need the libraries xmlrpc-common-3.0.jar, xmlrpc-client-3.0.jar, and ws-commons-util-1.0.1.jar (or newer versions), which you can download from the Apache mirrors at http://www.apache.org/dyn/closer.cgi/ws/xmlrpc/. Get the file called xmlrpc-current-bin.tar.gz and unpack it; the libraries are in its lib/ folder.
Please also read this short summary.
Output:
The returned values are tuples that describe each annotation. For gene mention normalization and protein mentions, the tuple consists of four elements: the referenced database (dbname, either EntrezGene or UniProt), the gene's or protein's ID in that database (dbid), the species for this gene/protein (taxid, from NCBI Taxonomy), and a confidence value telling how reliable this annotation is (confidence, between 0 and 1). For species, the tuple contains the NCBI Taxonomy ID (taxid) and a confidence value. For protein-protein interactions, the tuple states whether an interaction was predicted (true) and with which confidence. Note that for articles not predicted to contain an interaction, no tuple is returned (rather than an interaction tuple with the value 'false').
The output of the client on the command line will be a list of annotations, for example:
taxons  confidence      1.0
taxons  taxid   2759
normalizations  confidence      1.0
normalizations  dbname  UniProt
normalizations  dbid    O12705
normalizations  taxid   51677
normalizations  confidence      0.25
normalizations  dbname  EntrezGene
normalizations  dbid    9054
normalizations  taxid   9606
interaction     true
interaction     0.8
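
If you want to process this command-line output further, the lines have to be regrouped into tuples. The following is a minimal sketch of one way to do that, not part of the provided client. It assumes that a new tuple starts whenever the annotation type in the first column changes or a key repeats, and that two-column interaction lines carry the prediction and the confidence; the class name AnnotationOutputSketch and the key names "type" and "predicted" are introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AnnotationOutputSketch {
    /** Groups client output lines into one map per annotation tuple. */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> tuples = new ArrayList<>();
        Map<String, String> current = null;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            String type = fields[0];
            String key;
            String value;
            if (fields.length >= 3) {
                key = fields[1];
                value = fields[2];
            } else {
                // Two-column line (interaction). Assumption: a boolean is the
                // prediction, anything else is the confidence.
                value = fields[1];
                boolean isFlag = value.equals("true") || value.equals("false");
                key = isFlag ? "predicted" : "confidence";
            }
            // Heuristic tuple boundary: type changes or the key is already set.
            if (current == null
                    || !type.equals(current.get("type"))
                    || current.containsKey(key)) {
                current = new LinkedHashMap<>();
                current.put("type", type);
                tuples.add(current);
            }
            current.put(key, value);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "normalizations  confidence      0.25",
                "normalizations  dbname  EntrezGene",
                "normalizations  dbid    9054",
                "normalizations  taxid   9606",
                "interaction     true",
                "interaction     0.8");
        for (Map<String, String> tuple : parse(sample)) {
            System.out.println(tuple);
        }
    }
}
```

Because the output does not mark tuple boundaries explicitly, the grouping rule is a heuristic; it works for the example above, where every tuple of a type lists its keys exactly once.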

References


Last changes: JH, 10/03/2007.