Data Linkage Graph: computation, querying and knowledge discovery of life science data.
Claim: databases in the sciences are structured and organised by domain. --note: of course, this is wrong on both counts - they are badly structured and have heavily overlapping, messy domains, although a single protein may have a 'core' concept of interest that doesn't overlap with others much.
Life-science databases are already heavily linked via xrefs - 87% of relevant databases have some sort of interlinks.
Database links are either explicit (HTML links, DBMS referential integrity, web-service parameters) or implicit (text mining, sequence alignment, pattern matching, ...). --note: I'd call these schematic and data-happenstance but who's counting?
Data warehouses - model a unified schema, import the data (crappy - format handling etc.), make complex queries (SQL, XQuery, ...), expose them through canned queries in the user interface.
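To make the four warehouse steps concrete, here is a minimal sketch using Python's built-in sqlite3. The two sources, their record shapes, the table names and the accessions are all made up for illustration; nothing here comes from the talk.

```python
import sqlite3

# Hypothetical records from two sources with different native formats.
SOURCE_A = [("P12345", "kinase A", "GENE:001")]         # tuples: accession, name, gene xref
SOURCE_B = [{"gene_id": "GENE:001", "symbol": "kinA"}]  # dict-shaped records


def build_warehouse():
    """Steps 1 and 2: create the unified schema, then import both sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT, gene_id TEXT);
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT);
        """
    )
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", SOURCE_A)
    conn.executemany("INSERT INTO gene VALUES (:gene_id, :symbol)", SOURCE_B)
    return conn


def gene_symbol_for_protein(conn, accession):
    """Steps 3 and 4: a cross-source join wrapped up as a canned query."""
    row = conn.execute(
        "SELECT g.symbol FROM protein p "
        "JOIN gene g ON g.gene_id = p.gene_id "
        "WHERE p.accession = ?",
        (accession,),
    ).fetchone()
    return row[0] if row else None
```

The user interface would only ever call something like `gene_symbol_for_protein(conn, "P12345")`; the join itself stays hidden, which is both the appeal and the limitation of canned queries.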
Virtual integrated DB - the first two steps (schema modelling and import) become lazy and distributed. Requires a lot of knowledge.
Query by navigation: data linkage graph - building a semantic network. E.g. make all xrefs in HTML forms into links. Make all sequence renderings link to their alignments. This, of course, is only useful to people.
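A rough sketch of what query by navigation amounts to once the links are machine-readable: a graph of typed edges and a search over it. The node identifiers and relation names are invented, and breadth-first search is my choice for illustration, not necessarily what the authors use.

```python
from collections import deque

# Hypothetical linkage graph: node -> list of (relation, target node).
LINKS = {
    "protein:P1": [("xref", "gene:G1"), ("has_sequence", "seq:S1")],
    "seq:S1": [("aligned_in", "alignment:A1")],
    "gene:G1": [],
    "alignment:A1": [],
}


def navigate(start, goal, links=LINKS):
    """Breadth-first search; returns the shortest list of (relation, node) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(relation, target)]))
    return None
```

Calling `navigate("protein:P1", "alignment:A1")` walks from the protein through its sequence to the alignment, which is the same click path a person would follow through the rendered pages.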
They make a distinction between inter-entity relationships (links) and inner-entity relationships (functions). The resulting schema is awesome - 318 nodes, 375 relations!!! And those are the ones they thought were interesting from just 5 data sources!!!
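My reading of the link/function distinction, as a minimal sketch: links point from one entity to other entities, while functions compute something from a single entity's own data. The `Entity` class, the `FUNCTIONS` table and the `sequence_length` example are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str
    ident: str
    attrs: dict = field(default_factory=dict)  # the entity's own data
    links: dict = field(default_factory=dict)  # relation name -> list of other entities


# Inner-entity relationships: values derived from one entity's own attributes.
FUNCTIONS = {
    "sequence_length": lambda e: len(e.attrs.get("sequence", "")),
}


def follow(entity, relation):
    """Inter-entity relationship: move from this entity to the entities it links to."""
    return entity.links.get(relation, [])


def compute(entity, function):
    """Inner-entity relationship: stay on this entity and derive a value from it."""
    return FUNCTIONS[function](entity)
```

With a protein that has `attrs={"sequence": "MKV"}` and `links={"encoded_by": [gene]}`, `follow(protein, "encoded_by")` returns the gene, while `compute(protein, "sequence_length")` stays on the protein and returns a number.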
This is all about the data. It fails to address the meaning of the data at all.
Having said that, it looks like it's been done right. I'd almost go as far as to say Ondex done right.