The BioConnect Platform
What is it?
BioConnect is a platform that supports an ecosystem of computational resources for biological research. In addition to connecting resources to one another within the ecosystem, BioConnect integrates with other platforms and ecosystems such as AnVIL, Terra, and EBI databases. This enables inter-ecosystem communication, sharing, search, and analysis in a "biome" of data science.
BioConnect is part digital library, part software hosting service, part data acquisition process, and part study curation and quality assurance system. While it does provide the means to host data, researchers can also index and link to resources hosted elsewhere.
In its capacity as a digital library of research data, tools and results (collectively we call these resources), BioConnect indexes resources both inside and outside the ecosystem. Indexing here means assigning a standard description of the resource such that all similar resources will have similar description semantics. In the ideal curation process, if the same resource were to be indexed twice, both descriptions would be the same.
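The property that indexing the same resource twice yields the same description can be illustrated with a minimal sketch. The field names (kind, name, location, keywords) and the describe function are hypothetical and are not BioConnect's actual schema or API; the point is only that normalizing inputs before building the description makes the result repeatable.

```python
import hashlib
import json


def describe(resource: dict) -> dict:
    """Return a normalized description of a resource (hypothetical schema)."""
    description = {
        # Normalize case and whitespace so equivalent inputs converge.
        "kind": resource["kind"].strip().lower(),  # e.g. "data", "tool", "result"
        "name": " ".join(resource["name"].split()),
        "location": resource["location"].strip(),
        "keywords": sorted({k.strip().lower() for k in resource.get("keywords", [])}),
    }
    # Content-derived identifier: equal descriptions give equal identifiers.
    canonical = json.dumps(description, sort_keys=True)
    description["id"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return description


# The same resource, submitted twice with superficial differences.
first = describe({
    "kind": "Data",
    "name": "Liver  RNA-seq",
    "location": "https://example.org/liver",
    "keywords": ["RNA-seq", "liver"],
})
second = describe({
    "kind": "data ",
    "name": "Liver RNA-seq",
    "location": "https://example.org/liver ",
    "keywords": ["liver", "rna-seq", "Liver"],
})
print(first == second)
```

In practice curation involves human judgment that no normalization function captures; this sketch only shows the target behavior for the mechanical part of the process.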
What is it not?
While the core system aims to support biological research, it is not itself a research tool. Think of the system as a hub that sits in the interstitial space between computational resources and supports indexing and integrating data, analyses, and results.
BioConnect is also not the definitive source of data models or ontologies. The system relies on the ISA common data model for describing investigations, studies, and assays. However, the extensible ISA model does not specify all attributes needed to define a given data set or tool. Data models for each of the myriad data objects (called types) are defined by bio-curators according to the standards of practice in the domain.
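As a rough illustration of how a curator-defined type might extend the ISA hierarchy, the sketch below models investigations, studies, and assays as nested records and checks an assay against the extra attributes its type requires. The class layout, the TYPE_DEFINITIONS table, the rna_seq_counts type, and its attribute names are all assumptions made for this example, not the ISA specification or BioConnect's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    name: str
    data_type: str  # name of a curator-defined type (hypothetical)
    attributes: dict = field(default_factory=dict)


@dataclass
class Study:
    title: str
    assays: list = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list = field(default_factory=list)


# Hypothetical curator-defined types: attributes required beyond the ISA core.
TYPE_DEFINITIONS = {
    "rna_seq_counts": {"genome_build", "strandedness"},
}


def missing_attributes(assay: Assay) -> set:
    """Return the curator-required attributes that the assay does not supply."""
    required = TYPE_DEFINITIONS.get(assay.data_type, set())
    return required - set(assay.attributes)


assay = Assay("liver expression", "rna_seq_counts", {"genome_build": "GRCm39"})
investigation = Investigation("Example investigation", [Study("Example study", [assay])])
print(missing_attributes(assay))
```

Keeping the type definitions in a separate table mirrors the division of labor described above: ISA supplies the common structure, while curators own the domain-specific attribute lists.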
It is also not a repository of all biological knowledge. While the core system does provide functions for indexing raw data, derived inferences, and resulting causal models, it does this to provide a default storage location, not a definitive one. Specialist repositories may still be the best place to store such data, in which case BioConnect can also link to the remote repository.
How does it work?
Describe the flow of data from acquisition through curation, QC, search, and selection, followed by tool search, data visualization, and artifact storage.
Background in the data governance landscape
The problems we are addressing with BioConnect are not new. Research teams around the world have been working on aspects of this problem for many years. This project prioritizes a mix of capabilities that are relevant to The Jackson Laboratory. Other projects have focused on different or more constrained challenges. Regardless of differences in scope, there are many excellent projects in the data governance and data science domain that we can learn from or incorporate. These are explored in the Background section.
Measuring success
How do we know if BioConnect is doing the right job, and doing it well?