Brian Gilman
Whitehead Institute

Users access data through a common look and feel (the Ensembl look and feel)
Never write a FASTA parser again
Biological data is transferred over the web via a standard set of protocols (DAS/SOAP)
Data is aggregated by common middleware, and relationships among data are discovered automatically (Semantic Web)

Provide an infrastructure that is portable, extensible and robust for this domain
Use portable solutions from other domains to solve problems in bioinformatics
HTTP
XML
SOAP
UDDI
DAS

Data stores include:
  Genomics
    Ensembl
    NCBI
  Proteomics
    Swiss-Prot
  Publications
    PubMed

Provide middleware to perform web service integration and translation
Data is accessed through a common protocol stack ("Soapy DAS")
Data is referenced by translation to a common naming scheme (ontology)
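The translation step can be pictured as a lookup from each data source's local field names to one shared term. The following is a minimal sketch, not OmniGene's implementation: the class name OntologyTranslator, the source name "SourceB" and the term "variant_allele" are hypothetical, chosen only to show two differently named fields resolving to the same common name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical middleware translator: maps (source, local name) pairs onto one
// common naming scheme. All source names and terms here are illustrative.
class OntologyTranslator {
    private final Map<String, String> toCommon = new HashMap<>();

    private static String key(String source, String localName) {
        return source + ":" + localName;
    }

    // Record that a source's local field name means the given common term.
    void register(String source, String localName, String commonTerm) {
        toCommon.put(key(source, localName), commonTerm);
    }

    // Translate a source-specific name; empty if no mapping is registered.
    Optional<String> translate(String source, String localName) {
        return Optional.ofNullable(toCommon.get(key(source, localName)));
    }

    public static void main(String[] args) {
        OntologyTranslator t = new OntologyTranslator();
        t.register("WIBR", "allele", "variant_allele");
        t.register("SourceB", "alt_base", "variant_allele");
        // Two differently named fields resolve to the same common term.
        System.out.println(t.translate("WIBR", "allele").orElse("?"));
        System.out.println(t.translate("SourceB", "alt_base").orElse("?"));
    }
}
```

Returning Optional keeps "no mapping yet" distinct from a real term, so a client can decide whether to fall back to the source's own name.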

A database schema can be represented as an XML Schema
XML is a W3C standard
XML is supported industry-wide
EJB is scalable and robust

<?xml version="1.0"?>
<schema source="WIBR">
  <table name="SNP">
    <field name="allele" type="VARCHAR"/>
    <field name="left_flank" type="TEXT"/>
  </table>
</schema>
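Because the descriptor above is ordinary XML, a standard parser can read it back without custom code. The following is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers). The class name SchemaReader and the "table.field:TYPE" output format are illustrative choices, not part of OmniGene.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads a <schema>/<table>/<field> descriptor with the JDK's DOM parser.
class SchemaReader {
    // Returns one "table.field:TYPE" entry per <field> element, in document order.
    static List<String> readFields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList tables = doc.getElementsByTagName("table");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                NodeList fields = table.getElementsByTagName("field");
                for (int j = 0; j < fields.getLength(); j++) {
                    Element field = (Element) fields.item(j);
                    out.add(table.getAttribute("name") + "." + field.getAttribute("name")
                            + ":" + field.getAttribute("type"));
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed schema descriptor", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<schema source=\"WIBR\"><table name=\"SNP\">"
                + "<field name=\"allele\" type=\"VARCHAR\"/>"
                + "<field name=\"left_flank\" type=\"TEXT\"/>"
                + "</table></schema>";
        System.out.println(readFields(xml)); // [SNP.allele:VARCHAR, SNP.left_flank:TEXT]
    }
}
```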
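
The XML-to-object paradigm is well understood
XML is easily translated into other object models

// Illustrative bean for the WIBR "SNP" table. The packages java_classes and
// ejb_classes, the base class EJB_Classes and the interface Table are the
// slide's placeholder names; their definitions are not shown.
import java_classes.*;
import ejb_classes.*;

public class SNP_WIBR extends EJB_Classes implements Table {

    private String left_flank, allele;

    public SNP_WIBR() {}

    public String getLeft_flank() { return left_flank; }

    public void setLeft_flank(String left_flank) { this.left_flank = left_flank; }

    public String getAllele() { return allele; }

    public void setAllele(String allele) { this.allele = allele; }
}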
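The XML-to-object step can be automated: each <field> becomes a private member plus a getter/setter pair, and the class name combines table and source (SNP_WIBR). The following is a minimal sketch of such a generator. BeanGenerator is a hypothetical name, and mapping every column type to String is a simplifying assumption. A real generator would map VARCHAR, TEXT and other column types individually.

```java
import java.util.List;

// Hypothetical generator: turns one <table> definition into Java bean source.
// Assumption: every column type (VARCHAR, TEXT, ...) is mapped to String.
class BeanGenerator {
    static String generate(String source, String table, List<String> fields) {
        String cls = table + "_" + source; // e.g. SNP_WIBR
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(cls).append(" {\n");
        for (String f : fields) {
            sb.append("    private String ").append(f).append(";\n");
        }
        sb.append("    public ").append(cls).append("() {}\n");
        for (String f : fields) {
            // "left_flank" -> "Left_flank", giving getLeft_flank()/setLeft_flank()
            String cap = Character.toUpperCase(f.charAt(0)) + f.substring(1);
            sb.append("    public String get").append(cap)
              .append("() { return ").append(f).append("; }\n");
            sb.append("    public void set").append(cap)
              .append("(String value) { this.").append(f).append(" = value; }\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("WIBR", "SNP", List.of("allele", "left_flank")));
    }
}
```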

DAS (Distributed Annotation System): a protocol that uses HTTP and XML to query genomic data
  Genomic features
  Sequence data
  Proteomic data (WIBR initiative)
  Publication data (WIBR initiative)
  Workflow data (WIBR initiative)
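A DAS query is a plain HTTP GET, and the server replies with XML. The following sketch assumes the DAS 1.x URL layout <prefix>/das/<dsn>/features?segment=<ref>:<start>,<stop>. The host das.example.org and the data source name hypothetical_dsn are placeholders, not real servers. The fetch method uses java.net.http.HttpClient (Java 11+) and is not called in main, so the example runs offline.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DasRequest {
    // Assumed DAS 1.x form: <prefix>/das/<dsn>/features?segment=<ref>:<start>,<stop>
    static String featuresUrl(String prefix, String dsn, String ref, long start, long stop) {
        if (start > stop) throw new IllegalArgumentException("start > stop");
        return prefix + "/das/" + dsn + "/features?segment=" + ref + ":" + start + "," + stop;
    }

    // Plain HTTP GET; the response body is the server's XML document.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Host and data source name are placeholders, not a real DAS server.
        System.out.println(featuresUrl("http://das.example.org", "hypothetical_dsn", "chr1", 1000, 2000));
        // prints http://das.example.org/das/hypothetical_dsn/features?segment=chr1:1000,2000
    }
}
```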

No dependency on particular database schemas or technologies
No dependency on particular client-side technologies
Uncoupled reference and annotation servers
Must handle instability in genome assemblies
Must be dirt simple to implement

Anything that has genomic coordinates
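"Anything that has genomic coordinates" suggests a very small data structure. The following is a minimal sketch of an annotation as a reference sequence plus a range, tagged with the type and category vocabulary used later in this deck. The class name Feature and the choice of 1-based, inclusive coordinates are assumptions made for illustration.

```java
// Hypothetical minimal annotation: an identifier, a reference sequence, a
// coordinate range, and a type/category tag.
// Assumption: 1-based, inclusive coordinates.
final class Feature {
    final String id, reference, type, category;
    final long start, end;

    Feature(String id, String reference, long start, long end, String type, String category) {
        if (start > end) throw new IllegalArgumentException("start must be <= end");
        this.id = id; this.reference = reference; this.start = start; this.end = end;
        this.type = type; this.category = category;
    }

    long length() { return end - start + 1; }

    // True if this feature shares at least one base with ref:s..e.
    boolean overlaps(String ref, long s, long e) {
        return reference.equals(ref) && start <= e && s <= end;
    }

    public static void main(String[] args) {
        Feature exon = new Feature("e1", "chr1", 100, 200, "exon", "transcription");
        System.out.println(exon.length());                   // 101
        System.out.println(exon.overlaps("chr1", 150, 300)); // true
        System.out.println(exon.overlaps("chr2", 150, 300)); // false
    }
}
```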

Client/server model
Communication via XML
Servers run on top of conventional web servers
Clients use open source XML parsers
Servers: >100 lines of code
Clients: >1000 lines of code
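On the client side, "open source XML parsers" means a features response can be read with a stock parser. The following sketch uses the JDK's DOM parser on a response whose element layout (DASGFF, GFF, SEGMENT, FEATURE, TYPE, START, END) is modelled on the DAS 1.x DASGFF format as we understand it. Treat that layout as an assumption and check it against the specification. The DOCTYPE is omitted so the parser does not try to fetch a DTD. DasResponseReader is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Client-side sketch: summarise each <FEATURE> as "id type start-end".
// Assumed layout: <DASGFF><GFF><SEGMENT><FEATURE id=..><TYPE id=.. category=..>
// ..</TYPE><START>..</START><END>..</END></FEATURE></SEGMENT></GFF></DASGFF>
class DasResponseReader {
    static List<String> features(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            NodeList feats = doc.getElementsByTagName("FEATURE");
            for (int i = 0; i < feats.getLength(); i++) {
                Element f = (Element) feats.item(i);
                Element type = (Element) f.getElementsByTagName("TYPE").item(0);
                out.add(f.getAttribute("id") + " " + type.getAttribute("id") + " "
                        + text(f, "START") + "-" + text(f, "END"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable features response", e);
        }
    }

    // Trimmed text content of the first descendant element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><DASGFF><GFF version=\"1.0\">"
                + "<SEGMENT id=\"chr1\" start=\"1000\" stop=\"2000\">"
                + "<FEATURE id=\"f1\" label=\"exon1\">"
                + "<TYPE id=\"exon\" category=\"transcription\">exon</TYPE>"
                + "<START>1100</START><END>1200</END>"
                + "</FEATURE></SEGMENT></GFF></DASGFF>";
        System.out.println(features(xml)); // [f1 exon 1100-1200]
    }
}
```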

Semi-controlled feature vocabulary
  Category
    Transcription, translation, structural, experimental
  Type
    intron, exon, CDS, 5’UTR, SNP, similarity, oligo, insertion, RNAi
User can filter by category and/or type
Data sources can add new types at will
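Filtering "by category and/or type" corresponds to two optional predicates. In the following sketch, a null argument means "no constraint". Types are kept as plain strings rather than a closed enum, so a data source can introduce a new type without any change to client code. FeatureFilter and Feat are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

class FeatureFilter {
    static final class Feat {
        final String id, type, category;
        Feat(String id, String type, String category) {
            this.id = id; this.type = type; this.category = category;
        }
    }

    // A null argument means "no constraint", so callers can filter by
    // category, by type, by both, or by neither.
    static List<String> select(List<Feat> feats, String category, String type) {
        return feats.stream()
                .filter(f -> category == null || category.equals(f.category))
                .filter(f -> type == null || type.equals(f.type))
                .map(f -> f.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Feat> feats = List.of(
                new Feat("e1", "exon", "transcription"),
                new Feat("s1", "SNP", "experimental"),
                new Feat("i1", "intron", "transcription"),
                new Feat("r1", "RNAi", "experimental"));
        System.out.println(select(feats, "transcription", null)); // [e1, i1]
        System.out.println(select(feats, "experimental", "SNP")); // [s1]
        System.out.println(select(feats, null, null));            // [e1, s1, i1, r1]
    }
}
```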

Annotate to the smallest stable sequence element
  finished clone
  phase II fragment
Version everything
  annotations, contigs, assemblies
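Anchoring to a stable, versioned element is what lets annotations survive assembly changes. An annotation stored against clone X version N can be mapped onto any assembly built from that same clone version. When the version differs, the mapping is refused rather than silently shifted. The following is a minimal sketch of that rule. The clone name CLONE_A, the classes, forward-only orientation and 1-based coordinates are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: an annotation is stored against a versioned clone and
// projected onto an assembly only if that assembly used the same clone version.
// Assumptions: forward orientation only, 1-based coordinates.
class StableAnchor {
    static final class CloneAnnotation {
        final String clone; final int version; final long posInClone;
        CloneAnnotation(String clone, int version, long posInClone) {
            this.clone = clone; this.version = version; this.posInClone = posInClone;
        }
    }

    // Where one clone version starts on a given assembly.
    static final class Placement {
        final int version; final long assemblyStart;
        Placement(int version, long assemblyStart) {
            this.version = version; this.assemblyStart = assemblyStart;
        }
    }

    // Empty when the clone is absent or its version differs, signalling that
    // the annotation must be re-checked against the new clone sequence.
    static OptionalLong toAssembly(CloneAnnotation a, Map<String, Placement> assembly) {
        Placement p = assembly.get(a.clone);
        if (p == null || p.version != a.version) return OptionalLong.empty();
        return OptionalLong.of(p.assemblyStart + a.posInClone - 1);
    }

    public static void main(String[] args) {
        CloneAnnotation snp = new CloneAnnotation("CLONE_A", 2, 100); // hypothetical clone
        System.out.println(toAssembly(snp, Map.of("CLONE_A", new Placement(2, 50001)))); // OptionalLong[50100]
        System.out.println(toAssembly(snp, Map.of("CLONE_A", new Placement(3, 48001)))); // OptionalLong.empty
    }
}
```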

Libraries
  Bio::DAS (Perl)
  Dazzle (Java)
  DASQuery (WICGR API)
Servers & databases
  Acedb, Dazzle-on-Ensembl, Gadfly, Bio::DB::GFF
  OmniGene
Clients
  Java-Client, Geodesic (Java), DasView (Perl), Ensembl Contigview (Perl), OmniView (Java)

Reference servers
  WormBase (C. elegans)
  FlyBase (Drosophila)
  Ensembl (Human)
  HGxxx (UCSC)
Annotation servers
  WormBase (C. elegans)
  WashU (C. elegans)
  Ensembl (Human)
  FlyBase (Drosophila)
  TIGR (Human, C. elegans)
  MRC (Human, C. elegans)
  LBL (Human)
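Because reference and annotation servers are uncoupled, integration happens in the client. Features fetched from several annotation servers can be merged into one view because they all use the coordinates of a shared reference server. The following is a minimal sketch of that merge step. Aggregator, Hit and the server labels "A" and "B" are hypothetical, and the per-server lists are assumed to have been fetched already.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side merge of features from independent annotation
// servers that all share one reference server's coordinate system.
class Aggregator {
    static final class Hit {
        final String source, id; final long start;
        Hit(String source, String id, long start) {
            this.source = source; this.id = id; this.start = start;
        }
        @Override public String toString() { return source + ":" + id + "@" + start; }
    }

    // Flatten the per-server lists and order by start, then by source name.
    static List<Hit> merge(List<List<Hit>> perServer) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perServer) all.addAll(hits);
        all.sort(Comparator.comparingLong((Hit h) -> h.start).thenComparing((Hit h) -> h.source));
        return all;
    }

    public static void main(String[] args) {
        List<Hit> serverA = List.of(new Hit("A", "exon1", 500), new Hit("A", "exon2", 900));
        List<Hit> serverB = List.of(new Hit("B", "snp7", 700));
        System.out.println(merge(List.of(serverA, serverB))); // [A:exon1@500, B:snp7@700, A:exon2@900]
    }
}
```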

Difficult to represent nested subfeatures
Can’t annotate non-genomic references
Too narrowly focused on genomic data
Read-only protocol

Software & specifications
  http://www.biodas.org
  http://www.biojava.org
  http://www.bioxml.org
  http://www.sourceforge.net/projects/omnigene

www.sourceforge.net/projects/omnigene
devo.wi.mit.edu/~gilmanb/omnigene
www.uddi.org
www.w3c.org
xml.apache.org
www.biodas.org