
    Ali Baba - PubMed as a graph

If you are looking for Ali Baba, please look here.




APIs



Services

Servlet

To check the pipeline servlet (output: AliBaba annotated XML) or the general availability of all services, use the following links:

Service ports

All services listen on various ports on the host Siegfried (bold ones need to be up and running; check them using the Availability servlet).

8080           Apache
8080/textmining/ - this page

50800..50819 - NLP components: sentence splitting, tokenization, stemming, part-of-speech tagging
50803          Tokenizer; a monq DictFilter using tokens.mwt file
50804          Tokenizer, part II; a DictFilter using escapetokens.mwt file
50805          Tagger
50806          Search term highlighting

50820..50839 - Named Entity Recognition components
50821          proteins/genes: SwissProt/UniProt [<z:uniprot ids="">]
50822          species: NCBI taxonomy [<z:species ids="">]
50823          drugs: MeSH, tree D03..6 [<z:drug ids="">]
               or MedlinePlus (MedMaster, USPDI) (15334 terms) [<z:drug medmaster="" uspdi="">]
50824          Gene Ontology terms
50825          genes: HGNC
50826          tissues: MeSH tree A02..A10 (1360 terms) [<z:tissue ids="">]      Expasy tisslist has 1582 terms!
50827          anatomy: MeSH tree "A" - body regions, tissues, cells, etc. (1431 terms)
50828          diseases: MeSH tree "C" (12400 terms) [<z:disease ids="">]
50829          cells: MeSH tree number A11  [<z:cell ids="">]
50830          proteins/genes: UniProt/TrEmbl [<z:uniprot ids="">] - do not use together with 50821!
50831          nutrients:  [<z:nutrient ids="">]
50807          Abbreviation Resolver
50839          Word Sense Disambiguation
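The NER services above are dictionary filters: each port loads a term list and wraps matching text in an element such as `<z:tissue ids="...">`. The sketch below illustrates that idea with a greedy longest-match lookup. It is not monq's DictFilter and does not read MWT files; the class `DictTagger`, its in-memory dictionary, and the single-space tokenization (no punctuation handling) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical longest-match dictionary tagger; an illustration, not monq's DictFilter. */
final class DictTagger {
    private final Map<String, String> termToIds = new HashMap<>();
    private final String tag;
    private int maxTermTokens = 1;

    DictTagger(String tag) {
        this.tag = tag;
    }

    /** Registers a case-sensitive, space-separated term and the ids to emit for it. */
    void addTerm(String term, String ids) {
        termToIds.put(term, ids);
        maxTermTokens = Math.max(maxTermTokens, term.split(" ").length);
    }

    /** Wraps the longest dictionary match at each token position in a tag element. */
    String annotate(String text) {
        String[] tokens = text.split(" ");
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            String ids = null;
            // Try the longest candidate span first and shrink until a term matches.
            for (int len = Math.min(maxTermTokens, tokens.length - i); len > 0 && ids == null; len--) {
                ids = termToIds.get(String.join(" ", Arrays.copyOfRange(tokens, i, i + len)));
                if (ids != null) {
                    matched = len;
                }
            }
            if (i > 0) {
                out.append(' ');
            }
            if (matched > 0) {
                String surface = String.join(" ", Arrays.copyOfRange(tokens, i, i + matched));
                out.append('<').append(tag).append(" ids=\"").append(ids).append("\">")
                   .append(surface).append("</").append(tag).append('>');
                i += matched;
            } else {
                out.append(tokens[i]);
                i++;
            }
        }
        return out.toString();
    }
}
```

Preferring the longest match means "bone marrow" is tagged as one term rather than as "bone" followed by an untagged "marrow".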

50840..50859 - Relation Mining components
50840          protein-protein co-occurrences [PPI] subtype=co-occurrence
50841          protein-protein interactions (patterns) [PPI] subtype=according to pattern
50842          cellular location of proteins (co-occurrence) [CLOP] subtype=co-occurrence
50843          species location of proteins (co-occurrence) [SLOP] subtype=co-occurrence
50844          protein-disease associations (co-occurrence) [PIA] subtype=co-occurrence
50845          protein-tissue associations (co-occurrence) [PTA] subtype=co-occurrence
50846          drug-disease associations (co-occurrence) [DIA] subtype=co-occurrence
50847          cellular location of proteins (patterns) [CLOP] subtype=according to pattern
50848          nutrient-disease associations (co-occurrence) [NIA] subtype=co-occurrence
50849          nutrient-tissue associations (co-occurrence) [NTA] subtype=co-occurrence
50850          nutrient-protein associations (co-occurrence) [NPA] subtype=co-occurrence
50851          protein-drug associations (co-occurrence) [PDA] subtype=co-occurrence
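The co-occurrence components pair two entity types that appear in the same text unit. As a sketch, assuming the unit is a sentence element `<SENT>` and entity markup of the form `<z:uniprot ids="...">`, the pairs can be collected with regular expressions. The class `CoOccurrences` and its `id1|id2` output format are hypothetical; this is not the CoOccurrenceFilter implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sentence-level co-occurrence extractor (illustration only). */
final class CoOccurrences {
    private static final Pattern SENTENCE = Pattern.compile("<SENT>(.*?)</SENT>", Pattern.DOTALL);

    /** Returns "id1|id2" for every pair of tag1/tag2 entities that share a sentence. */
    static List<String> pairs(String xml, String tag1, String tag2) {
        List<String> result = new ArrayList<>();
        Matcher sentences = SENTENCE.matcher(xml);
        while (sentences.find()) {
            String sentence = sentences.group(1);
            for (String a : ids(sentence, tag1)) {
                for (String b : ids(sentence, tag2)) {
                    result.add(a + "|" + b);
                }
            }
        }
        return result;
    }

    /** Collects the ids attribute of each <tag ids="..."> element in the text. */
    private static List<String> ids(String text, String tag) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("<" + Pattern.quote(tag) + " ids=\"([^\"]*)\"").matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

A protein and a tissue in different sentences produce no pair, which is what restricting co-occurrence to one sentence is meant to achieve.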

50860..50879 - Access to various data sources
50860          Medline in original XML format
50861          Medline reduced to text and basic data
50862          Small sample of MedLine, reduced format
50863          reduces Medline abstracts to title, text, pmid
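The ports follow a fixed scheme of blocks of twenty per component group. A small lookup can classify a port by these ranges; the enum below is a hypothetical sketch, not part of the tm or monq libraries. Going purely by the ranges, the Abbreviation Resolver on 50807 falls into the NLP block even though the list names it among the NER components.

```java
import java.util.Optional;

/** Hypothetical lookup of the component group a service port belongs to. */
enum ServiceGroup {
    NLP(50800, 50819),
    NAMED_ENTITY_RECOGNITION(50820, 50839),
    RELATION_MINING(50840, 50859),
    DATA_SOURCES(50860, 50879);

    private final int firstPort;
    private final int lastPort;

    ServiceGroup(int firstPort, int lastPort) {
        this.firstPort = firstPort;
        this.lastPort = lastPort;
    }

    /** Returns the group whose inclusive port range contains the given port, if any. */
    static Optional<ServiceGroup> ofPort(int port) {
        for (ServiceGroup group : values()) {
            if (port >= group.firstPort && port <= group.lastPort) {
                return Optional.of(group);
            }
        }
        return Optional.empty();
    }
}
```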

Start filter services (DictFilter with MWT files)

There are scripts to start and stop services, all in the WEB-INF directory:

  • startAllServices - starts all services: NER, POS, relations
  • startAllCoOccurrenceFilters - starts all co-occurrence filters
  • killAllCoOccurrenceFilters - stops all running co-occurrence filters
    For killing processes, you need the proper system rights!
Alternatively, to start single services, use:
nohup java -classpath lib/monq.jar -Xmx1500m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=10 monq.programs.DictFilter -t elem -e plain -p 50821 data/uniprot73.mwt &

nohup java -classpath lib/monq.jar:lib/tm.jar de.hu.wbi.textmining.ie.relations.CoOccurrenceFilter -p 50845 -1 z:uniprot -2 z:tissue -c SENT -type PTA &
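To start a single dictionary filter from Java instead of a shell, the DictFilter command line above can be assembled for `java.lang.ProcessBuilder`. A minimal sketch, assuming the process runs from the `WEB-INF` directory (so that `lib/monq.jar` and `data/...` resolve) and that output should go to `nohup.out` as with `nohup`; the class `DictFilterLauncher` is hypothetical and only the flags shown in the original command are used. It builds the process but leaves starting it to the caller.

```java
import java.io.File;
import java.util.List;

/** Hypothetical launcher mirroring the DictFilter start command shown above. */
final class DictFilterLauncher {
    /** Builds (but does not start) the process for one dictionary filter service. */
    static ProcessBuilder build(File webInfDir, int port, String mwtFile) {
        List<String> command = List.of(
            "java", "-classpath", "lib/monq.jar",
            "-Xmx1500m", "-XX:MinHeapFreeRatio=10", "-XX:MaxHeapFreeRatio=10",
            "monq.programs.DictFilter",
            "-t", "elem", "-e", "plain",
            "-p", Integer.toString(port),
            mwtFile);
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(webInfDir);        // resolve lib/ and data/ relative to WEB-INF
        pb.redirectErrorStream(true);   // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(new File(webInfDir, "nohup.out")));
        return pb;
    }
}
```

Calling `build(webInf, 50821, "data/uniprot73.mwt").start()` would then correspond to the first command above.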

Modes

AliBaba supports different annotation orders and types. The order determines whether pattern matching, co-occurrence search, or both are run, and in which sequence. The type determines which sets of entities and associations get annotated. The pipeline servlet expects a parameter 'mode', which is simply the sum of order and type.

order=0:       only co-occurrences, no pattern matching
order=1:       only pattern matching, no co-occurrences
order=2:       first co-occurrences, then patterns (patterns overwrite co-occurrences)
type=0:        C,D,I,P,S,T; PPI,CLOP,SLOP,PTA,PDA,DIA
type=40:       I,N,P,T; PPI,NIA,NTA,NPA
--------
mode=2         uses ordering '2' and type '0': 2+0=2
mode=42        uses ordering '2' and type '40': 40+2=42
--------
mode=101:      only named entity recognition, no POS-tagging, no relations
mode=201:      only protein NER, POS, pattern-based PPI
mode=202:      only protein NER, POS, co-occurrences then pattern-based PPI

Note: If you use the Pipeline servlet programmatically, please always use mode=01 or mode=02 (when necessary); this excludes the request from the log statistics.
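Since a mode is just order plus type, it can be composed and split mechanically. A minimal sketch for the combinations listed above, assuming types are multiples of ten (true for 0 and 40); the class `PipelineMode` is hypothetical. The special modes 101, 201 and 202 do not follow this scheme and are rejected, and because the helper works with ints it cannot express the "01"/"02" spelling from the note, which would have to be sent as a literal string.

```java
/** Hypothetical helper for the pipeline servlet's 'mode' parameter (order + type). */
final class PipelineMode {
    static final int ORDER_COOCC_ONLY = 0;
    static final int ORDER_PATTERNS_ONLY = 1;
    static final int ORDER_COOCC_THEN_PATTERNS = 2;
    static final int TYPE_DEFAULT = 0;     // C,D,I,P,S,T; PPI,CLOP,SLOP,PTA,PDA,DIA
    static final int TYPE_NUTRIENTS = 40;  // I,N,P,T; PPI,NIA,NTA,NPA

    /** Combines an order (0..2) and a type (0 or 40) into a mode value. */
    static int encode(int order, int type) {
        if (order < ORDER_COOCC_ONLY || order > ORDER_COOCC_THEN_PATTERNS) {
            throw new IllegalArgumentException("order must be 0, 1 or 2: " + order);
        }
        if (type != TYPE_DEFAULT && type != TYPE_NUTRIENTS) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        return type + order;
    }

    /** Splits a mode into {order, type}; rejects special modes such as 101, 201, 202. */
    static int[] decode(int mode) {
        int order = mode % 10;
        int type = mode - order;
        encode(order, type); // reuse the validation above
        return new int[] { order, type };
    }
}
```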


Directories

HTML:      /local/tomcat/webapps/ROOT/
Servlets:  /local/tomcat/webapps/ROOT/WEB-INF/classes/de/hu/wbi/textmining/visualization/servlets/
Libraries (tm,monq): /local/tomcat/webapps/ROOT/WEB-INF/lib/
Data (patterns, mwt-files): /local/tomcat/webapps/ROOT/WEB-INF/data/
Software (tagger): /local/software/
Last changes: Mar 27 2006 JH