BioXMash: XML data summaries for user-driven data integration.
Long-term goal: data integration for end-users.
XML-based. Selection/projection on trees. Interactive querying. Indexing is pseudo-off-line (e.g. index while at lunch, then have an interactive session). Works with multi-gigabyte XML files. Can also work with thousands of files (presumably small ones).
Fun statements like "we know that in GeneCards, each file is a gene" that say a lot about how they are thinking about data integration. From the indexes, find possibly relevant XML files/fragments, slurp them into memory, then blat the result out (with select/project) as the query answer.
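To make that pipeline concrete, here is a minimal sketch of how I understood it, not the BioXMash implementation. It assumes the index is a simple map from tag paths to the files that contain them; the function names (`build_path_index`, `query`), the path syntax and the sample documents are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: one small XML file per gene, values invented.
DOCS = {
    "gene_a.xml": "<gene><symbol>GENE_A</symbol>"
                  "<disorder><name>example disorder</name><code>X1</code></disorder>"
                  "</gene>",
    "gene_b.xml": "<gene><symbol>GENE_B</symbol></gene>",
}

def build_path_index(docs):
    """Off-line step: map each root-to-element tag path to the files containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        root = ET.fromstring(text)
        stack = [(root, root.tag)]
        while stack:
            elem, path = stack.pop()
            index[path].add(name)
            for child in elem:
                stack.append((child, path + "/" + child.tag))
    return index

def query(docs, index, select_path, project_tags):
    """Interactive step: use the index to pick candidate files, parse only those,
    select the elements at select_path and project them onto project_tags."""
    results = []
    for name in sorted(index.get(select_path, ())):
        root = ET.fromstring(docs[name])
        # findall() is relative to the root element, so drop the root tag.
        _, _, rest = select_path.partition("/")
        matches = root.findall(rest) if rest else [root]
        for elem in matches:
            results.append((name, {tag: elem.findtext(tag) for tag in project_tags}))
    return results

index = build_path_index(DOCS)
rows = query(DOCS, index, "gene/disorder", ["name"])
```

Only `gene_a.xml` is parsed at query time, because the index says `gene_b.xml` has no `gene/disorder` path. Note that this is purely structural: nothing here knows what a "disorder" means.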
No semantic integration. Very focussed upon structural metrics for source-document extraction. Could be helped by full-text indexing. These guys had to go and ask for an XML dump of the databases. Could do a RESTful XML export of source data, but this is firmly future work.
Quite fun, but seems a bit 'good idea' and not so 'just works'. It seems to be in the pilot phase.
VisGenome.