Ali Baba - PubMed as a graph
If you are looking for Ali Baba, please look here.
APIs
Services
Servlet
To check the pipeline servlet (output: AliBaba annotated XML) or the general availability of all services, use the following links:
- http://alibaba.informatik.hu-berlin.de/servlet/Pipeline?mode=002&pubmedquery=16891413
- http://alibaba.informatik.hu-berlin.de/servlet/Pipeline?mode=002&pubmedquery=11259443
- http://alibaba.informatik.hu-berlin.de/servlet/Pipeline?mode=002&pubmedquery=17291719 17387141 11259443 17387690 17387717 17387621 17157541
- http://alibaba.informatik.hu-berlin.de/servlet/Availability
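For programmatic access, the links above can be generated and the returned annotated XML scanned for entity tags. The following is a minimal sketch, not the project's own client. It assumes that the servlet accepts standard URL encoding (spaces between PubMed IDs sent as `+`), and that annotations appear as `<z:TYPE attr="...">mention</z:TYPE>` elements, as the tag names in the port list suggest. The function names `pipeline_url` and `extract_entities` are hypothetical, and nested annotations are not handled. The default `mode="02"` follows the note in the Modes section about programmatic use.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the links above.
PIPELINE_URL = "http://alibaba.informatik.hu-berlin.de/servlet/Pipeline"

# Assumed shape of an annotation: <z:TYPE attr="value" ...>mention</z:TYPE>
TAG = re.compile(r'<z:(\w+)((?:\s+[\w:]+="[^"]*")*)\s*>(.*?)</z:\1>', re.DOTALL)
ATTR = re.compile(r'([\w:]+)="([^"]*)"')


def pipeline_url(pmids, mode="02"):
    """Build a Pipeline request URL for one or more PubMed IDs."""
    query = " ".join(str(p) for p in pmids)
    return PIPELINE_URL + "?" + urlencode({"mode": mode, "pubmedquery": query})


def extract_entities(xml_text):
    """Return (type, attributes, mention) for every non-nested z: element."""
    return [
        (m.group(1), dict(ATTR.findall(m.group(2))), m.group(3))
        for m in TAG.finditer(xml_text)
    ]


# Illustrative input only; not actual servlet output.
sample = ('<z:uniprot ids="P04637">p53</z:uniprot> is expressed in '
          '<z:tissue ids="D008099">liver</z:tissue>.')
print(pipeline_url([16891413, 11259443]))
print(extract_entities(sample))
```

To fetch a real response, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `extract_entities`.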
Service ports
All services listen on various ports on Siegfried (bold ones need to be up & running -- check using the Availability-servlet)
8080 Apache
8080/textmining/ - this page
50800..50819 - NLP components: sentence splitting, tokenization, stemming, part-of-speech tagging
50803 Tokenizer; a monq DictFilter using tokens.mwt file
50804 Tokenizer, part II; a DictFilter using escapetokens.mwt file
50805 Tagger
50806 Search term highlighting
50820..50839 - Named Entity Recognition components
50821 proteins/genes: SwissProt/UniProt [<z:uniprot ids="">]
50822 species: NCBI taxonomy [<z:species ids="">]
50823 drugs: MeSH, tree D03..6 [<z:drug ids="">]
or MedlinePlus (MedMaster, USPDI) (15334 terms) [<z:drug medmaster="" uspdi="">]
50824 Gene Ontology terms
50825 genes: HGNC
50826 tissues: MeSH tree A02..A10 (1360 terms) [<z:tissue ids="">] Expasy tisslist has 1582 terms!
50827 anatomy: MeSH tree "A" - body regions, tissues, cells, .. (1431 terms)
50828 diseases: MeSH tree "C" (12400 terms) [<z:disease ids="">]
50829 cells: MeSH tree number A11 [<z:cell ids="">]
50830 proteins/genes: UniProt/TrEmbl [<z:uniprot ids="">] - do not use together with 50821!
50831 nutrients: [<z:nutrient ids="">]
50807 Abbreviation Resolver
50839 Word Sense Disambiguation
50840..50859 - Relation Mining components
50840 protein-protein co-occurrences [PPI] subtype=co-occurrence
50841 protein-protein interactions (patterns) [PPI] subtype=according to pattern
50842 cellular location of proteins (co-occurrence) [CLOP] subtype=co-occurrence
50843 species location of proteins (co-occurrence) [SLOP] subtype=co-occurrence
50844 protein-disease associations (co-occurrence) [PIA] subtype=co-occurrence
50845 protein-tissue associations (co-occurrence) [PTA] subtype=co-occurrence
50846 drug-disease associations (co-occurrence) [DIA] subtype=co-occurrence
50847 cellular location of proteins (patterns) [CLOP] subtype=according to pattern
50848 nutrient-disease associations (co-occurrence) [NIA] subtype=co-occurrence
50849 nutrient-tissue associations (co-occurrence) [NTA] subtype=co-occurrence
50850 nutrient-protein associations (co-occurrence) [NPA] subtype=co-occurrence
50851 protein-drug associations (co-occurrence) [PDA] subtype=co-occurrence
50860..50879 - Access to various data sources
50860 Medline in original XML format
50861 Medline reduced to text and basic data
50862 Small sample of MedLine, reduced format
50863 reduces Medline abstracts to title, text, pmid
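The port ranges above can be turned into a quick local check that complements the Availability servlet. This is a rough sketch under assumptions: the helper names `component_group` and `is_listening` are hypothetical, and a successful TCP connect only shows that something is listening on the port, not that the service behind it works. Note that 50807 (Abbreviation Resolver) is listed among the NER components but falls numerically into the NLP range, so the lookup reports it as an NLP component.

```python
import socket

# Port ranges as listed above (upper bounds are inclusive in the list,
# hence the +1 in range()).
PORT_GROUPS = [
    (range(50800, 50820), "NLP components"),
    (range(50820, 50840), "Named Entity Recognition components"),
    (range(50840, 50860), "Relation Mining components"),
    (range(50860, 50880), "Access to various data sources"),
]


def component_group(port):
    """Map a port number to the component group of its range, or None."""
    for ports, name in PORT_GROUPS:
        if port in ports:
            return name
    return None


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(component_group(50821))
```

On the server itself, a call such as `is_listening("localhost", 50821)` would probe the UniProt NER service; the host name to use from other machines is not given here and has to be supplied by the caller.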
Start filter services (DictFilter with MWT files)
There are scripts to start and stop services, all in the WEB-INF directory:
- startAllServices - starts all services: NER, POS, relations
- startAllCoOccurrenceFilters - starts all co-occurrence filters
- killAllCoOccurrenceFilters - stops all running co-occurrence filters
For killing processes, you need the proper system rights!
nohup java -classpath lib/monq.jar -Xmx1500m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=10 monq.programs.DictFilter -t elem -e plain -p 50821 data/uniprot73.mwt &
nohup java -classpath lib/monq.jar:lib/tm.jar de.hu.wbi.textmining.ie.relations.CoOccurrenceFilter -p 50845 -1 z:uniprot -2 z:tissue -c SENT -type PTA &
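When several co-occurrence filters have to be started by hand, the command line can be assembled from its parts. The sketch below only reproduces the flags seen in the CoOccurrenceFilter call above (`-p`, `-1`, `-2`, `-c`, `-type`); what each flag means is not documented here, so the parameter names are guesses, and the helper `cooccurrence_command` is hypothetical. The element names for other relation types would have to be taken from the start scripts.

```python
import shlex


def cooccurrence_command(port, first, second, rel_type, context="SENT",
                         classpath="lib/monq.jar:lib/tm.jar"):
    """Assemble the argument list for one CoOccurrenceFilter process."""
    return [
        "java", "-classpath", classpath,
        "de.hu.wbi.textmining.ie.relations.CoOccurrenceFilter",
        "-p", str(port),
        "-1", first,
        "-2", second,
        "-c", context,
        "-type", rel_type,
    ]


# The protein-tissue filter on port 50845, as in the example above.
cmd = cooccurrence_command(50845, "z:uniprot", "z:tissue", "PTA")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.Popen` from the WEB-INF directory, which avoids shell quoting issues.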
Modes
AliBaba supports different annotation orders and types. The order determines whether, and in which sequence, co-occurrence search and pattern matching are applied. The type selects the set of entities and associations that get annotated. The pipeline servlet expects a parameter 'mode': simply add order and type together.
order=0: only co-occurrences, no pattern matching
order=1: only pattern matching, no co-occurrences
order=2: first co-occurrences, then patterns (patterns overwrite co-occurrences)
type=0: C,D,I,P,S,T; PPI,CLOP,SLOP,PTA,PDA,DIA
type=40: I,N,P,T; PPI,NIA,NTA,NPA
--------
mode=2 uses ordering '2' and type '0': 2+0=2
mode=42 uses ordering '2' and type '40': 40+2=42
--------
mode=101: only named entity recognition, no POS-tagging, no relations
mode=201: only protein NER, POS, pattern-based PPI
mode=202: only protein NER, POS, coocc then pattern-based PPI
Note: If you use the Pipeline servlet programmatically, please always use mode=01 or mode=02 (when necessary); this excludes the request from the log statistics.
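The rule "add order and type together" can be written down as a small helper. This is only an illustration of the arithmetic described above; the function name `mode_value` is hypothetical, it covers just the orders 0..2 and the types 0 and 40, and the special modes 101, 201 and 202 are deliberately left out because they do not follow the additive scheme in an obvious way.

```python
# Orders and types as described above.
ORDERS = {
    0: "only co-occurrences",
    1: "only pattern matching",
    2: "co-occurrences first, then patterns",
}
TYPES = {
    0: "C,D,I,P,S,T; PPI,CLOP,SLOP,PTA,PDA,DIA",
    40: "I,N,P,T; PPI,NIA,NTA,NPA",
}


def mode_value(order, entity_type):
    """Return the 'mode' parameter value as the sum of order and type."""
    if order not in ORDERS:
        raise ValueError(f"unknown order: {order}")
    if entity_type not in TYPES:
        raise ValueError(f"unknown type: {entity_type}")
    return order + entity_type


print(mode_value(2, 0))
print(mode_value(2, 40))
```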
Directories
HTML: /local/tomcat/webapps/ROOT/
Servlets: /local/tomcat/webapps/ROOT/WEB-INF/classes/de/hu/wbi/textmining/visualization/servlets/
Libraries (tm,monq): /local/tomcat/webapps/ROOT/WEB-INF/lib/
Data (patterns, mwt-files): /local/tomcat/webapps/ROOT/WEB-INF/data/
Software (tagger): /local/software/