Background, alternatives, and the data governance landscape
The problems we are addressing with BioConnect are not new. Research teams around the world have been working on aspects of these problems for many years. This project prioritizes a mix of capabilities that are relevant to The Jackson Laboratory. Other projects have focused on different or more constrained challenges. Regardless of differences in scope, there are many excellent projects in the data governance and data science domains that we can learn from or incorporate.
Alternative platforms and related software
Project/Description | Comments |
---|---|
https://usegalaxy.org Web-based platform for data intensive biomedical research | Active since 2005. Many integrations, but some are broken. Metadata model is ad hoc. |
https://www.synapse.org/ Organize your digital research assets. Get credit for your research. Collaborate with your colleagues and the public. | |
https://www.elucidata.io/ Polly is a cutting-edge MLOps platform built on a modern tech stack for storing, curating and managing ML-ready biomolecular data. | |
https://www.catalyticds.com/ The Catalytic Platform integrates and scales the scientific workflows required to create new insights and achieve R&D milestones faster. | |
https://irods.org Integrated Rule-Oriented Data System provides data virtualization, discovery, workflows, and secure collaboration. | Active since 1996. Does not run in the cloud natively. |
https://odpi.github.io/egeria-docs/ Open metadata and governance for enterprises - automatically capturing, managing, and exchanging metadata between tools and platforms, no matter the vendor. | Under development by an IBM group in the UK. Similar in many ways to our aspirations. Not specifically created for biology. Uses the same technology stack as BioConnect. |
https://metadatacenter.org/ Center for Expanded Data Annotation and Retrieval (CEDAR) is an end-to-end process that enables community-based organizations to collaborate to create metadata templates. | This is a customizable curation platform for defining controlled vocabularies or schemas. It could be used, for example, to create a template that supplies ISA metadata or input to the Study Intake Process. |
https://dataverse.org/ Establish a research data management solution for your community. Federate with a growing list of Dataverse repositories worldwide for increased discoverability of your community’s data. Participate in the drive to set norms for sharing, preserving, citing, exploring, and analyzing research data. | This comes from the social sciences, but it is grounded on the same data governance principles as BioConnect. |
https://app.terra.bio/# Cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. | Provides access to data via connectors to public databases. Integrations with jupyter, RStudio, Galaxy, WDL workflows. Metadata is ad hoc. Also see: https://terra.bio/how-terra-fits-within-the-anvil-ecosystem/ |
https://ga4gh.github.io/data-repository-service-schemas/ The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standardized way regardless of where it’s stored or how it’s managed. | This system, part of the GA4GH Cloud Work Stream, is a standard way for different bioinformatic systems to communicate and share data. If exposed as a publicly accessible system, BioConnect users would benefit from this interface. |
https://learning.cyverse.org The Open Science Workspace for Collaborative Data-driven Discovery | Similar to Galaxy |
https://www.radc.rush.edu/ Rush Alzheimer's Disease Center (RADC) Research Resource Sharing Hub | |
https://www.sanger.ac.uk/science/ Database of data and software tools for biology and genetics. | |
https://zenodo.org/ Community code and data repository | |
https://core.ac.uk/services/ Aggregate all open access research outputs from repositories and journals worldwide and make them available to the public | |
https://codeocean.com/ One place for an integrative computational research experience | |
https://www.dnanexus.com/ Platform for scientific collaboration and accelerated discovery | |
https://genestack.com/products/omics-data-manager/ Integrate, harmonize and search Life Science Data from multiple sources making it findable, accessible, interoperable and reusable. | |
https://www.project-redcap.org/ REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment (including compliance with 21 CFR Part 11, FISMA, HIPAA, and GDPR), it is specifically geared to support online and offline data capture for research studies and operations. | |
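
To make the DRS entry in the table above more concrete, here is a minimal sketch of how a client might turn a hostname-based DRS URI into the HTTPS URL of the corresponding object record. It assumes the `drs://<hostname>/<id>` form and the `/ga4gh/drs/v1/objects/{id}` path from the GA4GH DRS specification. The function name `drs_uri_to_url` and the host `drs.example.org` are hypothetical and are not part of BioConnect. Compact-identifier DRS URIs, which need an external resolver, are not handled here.

```python
from urllib.parse import quote, urlsplit


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object record.

    Illustrative only: assumes the drs://<hostname>/<id> form. Compact
    identifier URIs (drs://<prefix>:<accession>) are out of scope.
    """
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")

    # Everything after the hostname is the object id.
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object id: {drs_uri!r}")

    # Percent-encode the id so that it is safe as a single path segment.
    encoded_id = quote(object_id, safe="")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{encoded_id}"


if __name__ == "__main__":
    # Hypothetical host and object id, used only for illustration.
    print(drs_uri_to_url("drs://drs.example.org/314159"))
```

A workflow system would then issue an HTTP GET against the returned URL to retrieve the object's metadata and access methods. That request is omitted here because it depends on the authentication scheme of the repository in question.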
Alternative data models and comparison
Short name | Long name | URL of definition | Notes |
---|---|---|---|
ISA | Investigation, Study, Assay | https://isa-specs.readthedocs.io/en/latest/isajson.html | |
MIAME | Minimum Information About a Microarray Experiment | https://pubmed.ncbi.nlm.nih.gov/17087822/ | The linked paper is for MAGE-TAB, which is a predecessor to ISA |
CONSORT | Consolidated Standards of Reporting Trials | http://www.consort-statement.org/consort-2010 | |
ARRIVE | Animal Research: Reporting of In Vivo Experiments | https://arriveguidelines.org/arrive-guidelines | |
OMOP CDM | Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) | http://ohdsi.github.io/CommonDataModel/cdm54.html | See FACT_RELATIONSHIP |
FORCE11 | Future of Research Communications and e-Scholarship | https://www.force11.org/datacitationprinciples | FORCE11 is not a metadata standard. Rather it is "a set of guiding principles for data within scholarly literature, another dataset, or any other research object." |
PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses | http://www.prisma-statement.org/ | For literature reviews and meta-analyses |
DataCite | DataCite Metadata Schema | https://schema.datacite.org/meta/kernel-4.4/ | Handles higher-level context. Could be an intermediate step between Dublin Core and ISA |
Bioschemas | Encourage the use of schema.org in biology | https://bioschemas.org/profiles/index | |
GDC Data Model | National Cancer Institute Genomic Data Commons Data Model | https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?_top=1 | Excellent model. Human only. |
TIM | Terra Interoperability Model | https://datamodel.terra.bio/TerraCoreDataModel.html | Not included because it is still a work in progress. |
Human Cell Atlas Metadata | Human Cell Atlas Metadata | https://data.humancellatlas.org/metadata | Human and mouse, but cell atlas specific |
ENCODE | Encyclopedia of DNA Elements | https://www.encodeproject.org/data-standards/ | |