
Background, alternatives, and the data governance landscape

The problems we are addressing with BioConnect are not new. Research teams around the world have been working on aspects of these problems for many years. This project prioritizes a mix of capabilities that are relevant to The Jackson Laboratory. Other projects have focused on different or more constrained challenges. Regardless of differences in scope, there are many excellent projects in the data governance and data science domains that we can learn from or incorporate.

| Project | Description | Comments |
| --- | --- | --- |
| https://usegalaxy.org | Web-based platform for data intensive biomedical research | Active since 2005. Many integrations, but some are broken. Metadata model is ad hoc. |
| https://www.synapse.org/ | Organize your digital research assets. Get credit for your research. Collaborate with your colleagues and the public. | |
| https://www.elucidata.io/ | Polly is a cutting-edge MLOps platform built on a modern tech stack for storing, curating and managing ML-ready biomolecular data. | |
| https://www.catalyticds.com/ | The Catalytic Platform integrates and scales the scientific workflows required to create new insights and achieve R&D milestones faster. | |
| https://irods.org | Integrated Rule-Oriented Data System provides data virtualization, discovery, workflows, and secure collaboration. | Active since 1996. Does not run in the cloud natively. |
| https://odpi.github.io/egeria-docs/ | Open metadata and governance for enterprises - automatically capturing, managing, and exchanging metadata between tools and platforms, no matter the vendor. | Under development by an IBM group in the UK. Similar in many ways to our aspirations. Not specifically created for biology. Uses the same technology stack as BioConnect. |
| https://metadatacenter.org/ | Center for Expanded Data Annotation and Retrieval (CEDAR) is an end-to-end process that enables community-based organizations to collaborate to create metadata templates. | This is a customizable curation platform for defining controlled vocabularies or schemas. It could be used, for example, to create a template that supplies ISA metadata or input to the Study Intake Process. |
| https://dataverse.org/ | Establish a research data management solution for your community. Federate with a growing list of Dataverse repositories worldwide for increased discoverability of your community’s data. Participate in the drive to set norms for sharing, preserving, citing, exploring, and analyzing research data. | This comes from the social sciences, but it is grounded in the same data governance principles as BioConnect. |
| https://app.terra.bio/# | Cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. | Provides access to data via connectors to public databases. Integrations with Jupyter, RStudio, Galaxy, and WDL workflows. Metadata is ad hoc. Also see: https://terra.bio/how-terra-fits-within-the-anvil-ecosystem/ |
| https://ga4gh.github.io/data-repository-service-schemas/ | The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standardized way regardless of where it’s stored or how it’s managed. | This system, part of the GA4GH Cloud Work Stream, is a standard way for different bioinformatic systems to communicate and share data. If BioConnect is exposed as a publicly accessible system, its users would benefit from this interface. |
| https://learning.cyverse.org | The Open Science Workspace for Collaborative Data-driven Discovery | Similar to Galaxy. |
| https://www.radc.rush.edu/ | Rush Alzheimer's Disease Center (RADC) Research Resource Sharing Hub | |
| https://www.sanger.ac.uk/science/ | Database of data and software tools for biology and genetics. | |
| https://zenodo.org/ | Community code and data repository | |
| https://core.ac.uk/services/ | Aggregate all open access research outputs from repositories and journals worldwide and make them available to the public | |
| https://codeocean.com/ | One place for an integrative computational research experience | |
| https://www.dnanexus.com/ | Platform for scientific collaboration and accelerated discovery | |
| https://genestack.com/products/omics-data-manager/ | Integrate, harmonize and search Life Science Data from multiple sources making it findable, accessible, interoperable and reusable. | |
| https://www.project-redcap.org/ | REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment (including compliance with 21 CFR Part 11, FISMA, HIPAA, and GDPR), it is specifically geared to support online and offline data capture for research studies and operations. | |
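
To make the DRS entry above more concrete, the sketch below shows how a hostname-based DRS URI might be resolved to the HTTPS endpoint that returns an object's metadata. It assumes the `/ga4gh/drs/v1/objects/{object_id}` path used by DRS v1. The host `drs.example.org` and the object ID are hypothetical, and the compact-identifier form of DRS URIs is deliberately not handled. This is an illustration only, not part of BioConnect.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Resolve a hostname-based DRS URI to an HTTPS metadata URL.

    Assumes the DRS v1 path layout; compact-identifier URIs
    (drs://prefix:accession) are rejected rather than resolved.
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")

    # The object ID is everything after the host, without the leading slash.
    object_id = parsed.path.lstrip("/")
    if not parsed.netloc or not object_id:
        raise ValueError(f"expected drs://<host>/<object_id>, got {drs_uri!r}")

    # Percent-encode the ID so it is safe to place in a URL path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


if __name__ == "__main__":
    # Hypothetical host and object ID, for illustration only.
    print(drs_uri_to_url("drs://drs.example.org/314159"))
```

A workflow system would then issue a GET request to the resulting URL and read the access methods from the JSON response. That step is omitted here because it depends on the server and its authentication setup.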

Alternative data models and comparison

| Short name | Long name | URL of definition | Notes |
| --- | --- | --- | --- |
| ISA | Investigation, Study, Assay | https://isa-specs.readthedocs.io/en/latest/isajson.html | |
| MIAME | Minimum Information About a Microarray Experiment | https://pubmed.ncbi.nlm.nih.gov/17087822/ | This paper is for MAGE-TAB, which is a predecessor to ISA. |
| CONSORT | Consolidated Standards of Reporting Trials | http://www.consort-statement.org/consort-2010 | |
| ARRIVE | Animal Research: Reporting of In Vivo Experiments | https://arriveguidelines.org/arrive-guidelines | |
| OMOP CDM | Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) | http://ohdsi.github.io/CommonDataModel/cdm54.html | See FACT_RELATIONSHIP. |
| FORCE11 | Future of Research Communications and e-Scholarship | https://www.force11.org/datacitationprinciples | FORCE11 is not a metadata standard. Rather, it is "a set of guiding principles for data within scholarly literature, another dataset, or any other research object." |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses | http://www.prisma-statement.org/ | For literature reviews and meta-analyses. |
| DataCite | DataCite Metadata Schema | https://schema.datacite.org/meta/kernel-4.4/ | Handles higher-level context. Could be an intermediate step between Dublin Core and ISA. |
| Bioschemas | Encourage the use of schema.org in biology | https://bioschemas.org/profiles/index | |
| GDC Data Model | National Cancer Institute Genomic Data Commons Data Model | https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?_top=1 | Excellent model. Human only. |
| TIM | Terra Interoperability Model | https://datamodel.terra.bio/TerraCoreDataModel.html | Not included because it is still a work in progress. |
| Human Cell Atlas Metadata | Human Cell Atlas Metadata | https://data.humancellatlas.org/metadata | Human and mouse, but cell-atlas specific. |
| ENCODE | Encyclopedia of DNA Elements | https://www.encodeproject.org/data-standards/ | |
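
As a rough illustration of the ISA entry above, the sketch below models the Investigation, Study, Assay nesting as Python dataclasses and serializes it to JSON. It is a heavily simplified subset: the field names (`identifier`, `title`, `studies`, `assays`, `filename`) follow the general shape of ISA-JSON, but the output is not a complete or validated ISA-JSON document, and all identifiers and file names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    # Simplified: a real ISA assay also carries measurement and technology details.
    filename: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_isa_like_json(investigation: Investigation) -> str:
    """Serialize the nested dataclasses to an ISA-like JSON string."""
    return json.dumps(asdict(investigation), indent=2)


# Hypothetical example: one investigation with one study and one assay.
example = Investigation(
    identifier="INV-1",
    title="Hypothetical investigation",
    studies=[
        Study(
            identifier="STU-1",
            title="Hypothetical study",
            assays=[Assay(filename="a_assay_1.txt")],
        )
    ],
)

if __name__ == "__main__":
    print(to_isa_like_json(example))
```

The point of the sketch is the containment hierarchy: an investigation groups studies, and each study groups the assays performed within it. The other models in the table organize their metadata differently.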