About BioConnect

Vision

provide a digital index, like a library catalog, describing data sets, tools, and other research-relevant resources at JAX
provide a default process for storage, curation, and QC of assay data from core scientific services
provide a default environment for indexing or hosting scientific resources, such as biological databases or analysis services, to handle common technical needs
cut out hassles in common steps on the path toward computational results, e.g., finding key information describing an experiment or dataset, or bringing together data with tools for analysis
connect with external systems like AnVIL, terra.bio, Galaxy, Gen3, and the GA4GH network to align with NIH's data science strategy
develop a great user experience in all modules: allow researchers to focus on research with no distractions, and provide access to powerful tools for data visualization

A General Data Repository for The Jackson Laboratory

BioConnect provides a foundation of organizational and technical services:

register digital resources for indexing, and host them here if you like
annotate data, analysis tools, models, and anything else related to biology research according to a well-defined, common data model
support reproducibility and traceability with versioned annotations and automated processing documentation; all original work is attributed to its authors
build interoperability into the system from the foundation, with a data model that harmonizes semantic heterogeneity
provide technical services to scientific research databases and related web applications, for example security, logging, messaging, search, and a hosting environment

BioConnect supports reproducible research and FAIR¹ data principles by:

connecting assay results to well-curated metadata
connecting data and metadata to analysis results with process step descriptions
connecting analysis results to publications with digital object identifiers (DOI)
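
The three connections above form a chain that can be walked backward from a publication to the original sample metadata. The following is a minimal sketch of that idea, not BioConnect's actual data model: the `Record` class, its field names, the identifiers, and the placeholder DOI are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One node in a provenance chain (hypothetical, for illustration only)."""
    kind: str                                 # e.g. "metadata", "assay_result"
    identifier: str                           # local ID, or a DOI for a publication
    derived_from: Optional["Record"] = None   # the record this one was produced from
    process_step: str = ""                    # description of the step that produced it


def trace(record: Record) -> list[str]:
    """Walk back from a record to its origin, returning one line per link."""
    lines = []
    current: Optional[Record] = record
    while current is not None:
        step = f" (via {current.process_step})" if current.process_step else ""
        lines.append(f"{current.kind}: {current.identifier}{step}")
        current = current.derived_from
    return lines


# Placeholder identifiers; the DOI below is not a real one.
metadata = Record("metadata", "sample-0001")
assay = Record("assay_result", "assay-0001", metadata, "sequencing run")
analysis = Record("analysis_result", "analysis-0001", assay, "expression workflow")
paper = Record("publication", "doi:10.0000/example", analysis, "figure preparation")

for line in trace(paper):
    print(line)
```

Because each record points only at the record it was derived from, the chain stays intact as long as every step is registered with its process description.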

Who is BioConnect for?

BioConnect and this documentation serve both exploration of research resources² (data, methods, models) and integration of these resources for computational research.

Exploration

Research labs can use BioConnect to search for data, supplement research with data from other mouse cohorts, and use plugged-in tools and databases in analytical pipelines. The plugin system allows researchers to share resources with lower development effort and standardized security. BioConnect also supports alignment of these resources with NIH data management and sharing policies.

Check out the search and browse overviews from the Quick Start page for more information on the current version of the BioConnect application.

Integration

Biological analysis comes in many forms and depends on a researcher's ability to integrate resources by transforming formats and translating semantics.

This can be an arduous process when a dataset comes without description or an analysis method comes without documentation. The better the available description and documentation, the easier it is to focus on scientific discovery. First we take steps to prepare a research resource to make it FAIR (findable, accessible, interoperable, and reusable)¹. Then we can confidently integrate hypotheses, data, and models along the path from FAIR data to experimental results.

A second concern is the engineering connections that enable technical interoperability. The aim is to set up technical resources to be self-documenting, so that the interfaces between them can take advantage of this to support reproducibility.

Making data FAIR

To make data FAIR, in all aspects of the acronym, there must be a robust supporting process for: 1) curation of data, 2) validation of code in analysis tools, and 3) community organization³. If the goal of an organization is to treat its data as a valuable asset, then it must put in the requisite effort in each of these areas to realize the full value of the data.

The BioConnect ecosystem supports these three FAIR-ification processes with the technical tools needed by curators and software engineers. The plugin system allows researchers to share resources as modules in BioConnect. This lowers the "activation energy" required to create a new web application by providing a hosting environment, a standard process for deploying client-side and server-side code, hosted documentation, and fully audited security. BioConnect also supports alignment of these resources with NIH data management and sharing policies⁴, and it provides a common platform as a point of integration with external community resources.

Making use of FAIR data

To ensure reproducibility of computational analyses, immutable versions of data sets must be accessible, and the steps to process these data must likewise be encoded and stored in immutable versions. For example, Nextflow or WDL workflow descriptions can be stored as code, with all changes to a set of analysis steps recorded. Fixed versions of data sets can be much trickier to maintain. An unbroken chain of data history from subject and sample origin all the way through the publication of a figure seems to be rare in biological research; databases like SRA, dbGaP, GTEx, GEO, MPD, and GeneNetwork are exceptions to this rule.
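
One common way to pin a fixed version of a data set is to identify it by a cryptographic digest of its content. The sketch below illustrates that general technique with Python's standard `hashlib`; it is an assumption made for illustration, not a description of how BioConnect stores versions. The function names, record fields, sample data, and the `rev-1` revision label are all hypothetical.

```python
import hashlib
import json


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(data).hexdigest()


def run_record(dataset: bytes, workflow_source: str, workflow_revision: str) -> dict:
    """Describe one analysis run by the fixed versions of its inputs."""
    return {
        "dataset_sha256": content_digest(dataset),
        "workflow_sha256": content_digest(workflow_source.encode("utf-8")),
        "workflow_revision": workflow_revision,  # e.g. a version-control commit ID
    }


# Made-up sample data: two versions that differ in a single value.
dataset_v1 = b"sample_id,value\nS1,0.42\n"
dataset_v2 = b"sample_id,value\nS1,0.43\n"
workflow = "placeholder text standing in for a workflow description file"

record = run_record(dataset_v1, workflow, "rev-1")
print(json.dumps(record, indent=2))

# Any change to the data yields a different identifier, so a stored record
# keeps pointing at the exact version that was analyzed.
assert content_digest(dataset_v1) != content_digest(dataset_v2)
assert content_digest(dataset_v1) == record["dataset_sha256"]
```

A record like this can be stored next to an analysis result, so a later reader can check that the data and workflow in hand match the ones that produced it.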

See the Research Use Cases section for specific examples.

¹ See "The FAIR Guiding Principles for scientific data management and stewardship," https://www.nature.com/articles/sdata201618
² See also rrids.org for a broad-based community index of research resources.
³ See the Three Key Themes section of "NIH Workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse: Workshop Summary."
⁴ See the NIH Data Science Strategy and "Implementing the NIH Data Management and Sharing Policy."