Data Model
Work in progress
Thanks for stopping by. Work on the BioConnect platform is ongoing, and we are happy to share our progress in the open. Below you will find a mostly complete description of the ISA data model, which is the foundation of our organizational structure for search indexing. Please check back to see how we are doing, or visit our discussion site to provide feedback or ask questions.
The following sections outline in detail the standard data models that JAX BioConnect employs for describing research-related metadata.
Data generation metadata
For cross-species experimental and observational research data in BioConnect, we employ the Investigation, Study, Assay (ISA) data model for describing minimal metadata. The following sections outline the concept classes in the ISA model, their relationships, and their attributes. ISA is an extensible model: its structure is strict, but terms and comments can be added to describe parts of an experiment in a controlled way.
Biological data analysis databases, tools, libraries, and hand-crafted scripts are all part of the research process, and should also be documented to support reproducibility. The ISA model accounts for this in the Process section.
Component parts of ISA
Figure 1: The interconnected modules of study organization, physical properties, and processes connect to each other through their artifacts. These artifacts are all annotated according to ontological and terminological standards maintained by the global biomedical research community. The annotation framework also allows for extending the ISA model. Arrows mean "part of"; child entities point to parent entities.
Study organization
Each section below gives a definition for the entity followed by a list of attributes. All entities contain a unique identifier.
Figure 2: Study organization. Arrows mean "part of"; child entities point to parent entities.
Investigation
An investigation is composed of one or more Study entities and describes experiments that cross more than one cohort of test subjects. This entity is meant as a flexible way to group together studies under some conceptual relationship. It can be applied as investigators and curators see fit.
- Identifier
- External identifier (An identifier given from an external system)
- Title
- Description
- Submission date
- Public release date
- Contacts
- Publications
- Comments
Study
A study is composed of one or more Assay entities and describes one or more experiments involving one and only one cohort of test subjects. A study may belong to an Investigation, but does not have to.
- Identifier
- External identifier
- Investigation
- Title
- Description
- Submission date
- Public release date
- Design type (See Ontology Annotation. By default, this uses terms from the Ontology for Biomedical Investigations (OBI))
- Factor name
- Factor type (See Ontology Annotation.)
- Process sequence
- Publications
- Contacts
- Design descriptors (See Ontology Annotation.)
- Protocols
- Factors
- Characteristic categories (See Material Attribute.)
- Unit categories (See Ontology Annotation.)
- Samples
- Sources
- Other materials (See Material.)
- Comments
Assay
An assay is a measurement applied to one and only one cohort of test subjects (a set of sources, e.g., mice). The subjects are recorded as Materials derived from Samples, which are in turn derived from Source individuals. The output of an assay is a Data entity, and Data objects are generally gathered together as the output of an assay. Some assays will contain multiple measurements (e.g., metabolic cages).
- Identifier
- Study
- Filename
- Measurement type (OBI)
- Technology type
- Technology platform
- Characteristics (See Characteristic.)
- Units
- Process sequence (See Process.)
- Samples (See Sample.)
- Other materials (See Material.)
- Comments (See Comment.)
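The containment rules above (an investigation has one or more studies, a study has one or more assays) can be sketched in Python. This is a minimal illustration only, not the BioConnect implementation: the class layout, the reduced attribute set, and the `validate` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assay:
    identifier: str
    measurement_type: str  # an OBI term label in the described model


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)
    # A study may belong to an investigation, but does not have to.
    investigation_id: Optional[str] = None


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def validate(investigation: Investigation) -> List[str]:
    """Hypothetical helper: list violations of the 'one or more' rules."""
    problems = []
    if not investigation.studies:
        problems.append(f"{investigation.identifier}: needs at least one study")
    for study in investigation.studies:
        if not study.assays:
            problems.append(f"{study.identifier}: needs at least one assay")
    return problems
```

Calling `validate` on an investigation whose second study has no assays returns a single message naming that study, which makes the cardinality constraints easy to check before indexing.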
Contact
A contact is a person with knowledge of the entity that they are associated with. A contact entity is not a system user entity, even if they both refer to the same physical human.
- First name
- Last name
- Middle initial
- Email, phone, fax, address
- Affiliation (See Ontology Annotation.)
- Comments
- Roles (See Ontology Annotation.)
Publication
A publication is the primary artifact of academic research containing representations of knowledge gained from an experiment.
- PubMed identifier
- Document object identifier (DOI)
- Title
- Author list
- Status (See Ontology Annotation.)
Annotation and extension
The ISA model is extensible with additional annotation and extension attributes.
Figure 3: Annotation of elements and extending the model to cover additional concepts.
Factor
"A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly."
Factor type
A factor type is a class of variables for which we use a controlled vocabulary or ontology as a label. This is an attribute of a given factor, not a model element itself.
Factor Value
Where a Factor is a label for an aspect of an experiment under study, the value is a term that specializes the label. Both tend to be qualitative and selected from a controlled vocabulary.
- Sample
- Category (See Factor.)
- Value (See Ontology Annotation.)
- Value text
- Unit
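To make the label/value relationship concrete, here is a small Python sketch. The class and field names mirror the attribute list above, but the `label` method is hypothetical, and treating "value text" as a free-text fallback for the controlled term is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Factor:
    name: str         # e.g. "diet"
    factor_type: str  # controlled-vocabulary label for the class of variable


@dataclass(frozen=True)
class FactorValue:
    sample: str                       # name of the sample the value applies to
    category: Factor                  # the factor that this value specializes
    value: Optional[str] = None       # controlled term (ontology annotation label)
    value_text: Optional[str] = None  # assumed free-text fallback
    unit: Optional[str] = None

    def label(self) -> str:
        """Prefer the controlled term, fall back to free text, append any unit."""
        term = self.value if self.value is not None else self.value_text
        if term is None:
            raise ValueError("a factor value needs either value or value_text")
        text = f"{self.category.name}: {term}"
        return f"{text} {self.unit}" if self.unit else text
```

For example, a "diet" factor with the controlled term "high-fat diet" renders as `diet: high-fat diet`, while a numeric dose with a unit renders with the unit appended.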
Characteristic
Characteristics, characteristic categories, characteristic types, and categories are all attributes referenced in the Physical properties section. Each mapping is shown in the following list:
- Characteristic = material attribute value (in Material, Source, Sample)
- Characteristic category = material attribute (in Study, Assay)
- Characteristic type = ontology annotation (in Material Attribute)
- Category = material attribute (in Material Attribute Value)
Comment
Comments act like "tags," which are familiar from sites like Stack Overflow, and provide a low-fidelity way of adding attributes that are not included in the base ISA model. Comments allow for supplementary annotation of many other elements in the data model. These include: Source, Sample, Contact, Investigation, Study, and Assay.
- Name
- Value
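As a rough illustration of comments as name/value tags, the Python sketch below attaches comments to an arbitrary entity. The `Annotatable` class and its methods are hypothetical names introduced for this example; the model itself only specifies the Name and Value attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Comment:
    name: str
    value: str


@dataclass
class Annotatable:
    """Hypothetical base for entities that accept comments (Source, Sample, ...)."""
    comments: List[Comment] = field(default_factory=list)

    def add_comment(self, name: str, value: str) -> None:
        self.comments.append(Comment(name, value))

    def comment_values(self, name: str) -> List[str]:
        # The same name may appear more than once, as with tags.
        return [c.value for c in self.comments if c.name == name]
```

Because comments are free-form, nothing here restricts which names are allowed; that is what makes them a low-fidelity extension point compared with ontology annotations.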
Ontology Source Reference
Ontologies and other controlled vocabularies that are developed and maintained by the biomedical research community are the foundation of the unified indexing scheme of BioConnect and its broader ecosystem. The ISA model points to the URIs of these resources through ontology source references. The system also imports these resources so that curators can use them for term annotation.
- Description
- File
- Name
- Version
Ontology Annotation
Ontology annotations are terms drawn from the resources described under Ontology Source Reference above, and are defined by experts in biomedical terminology. Ontologies in general are organized in a graph-based hierarchy with no loops, i.e., a directed acyclic graph (DAG). BioConnect also uses this model to store controlled vocabularies that lack this graph-like structure, but nonetheless restrict the selection of term labels. Strain nomenclature from Mouse Genome Informatics (MGI) is one example of a controlled vocabulary.
- Annotation value
- Term source (See Ontology Source Reference.)
- Term accession (The identifier in the source ontology.)
- Term accession URL (A fully qualified URI or URL linking to the term in the source ontology.)
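The relationship between an annotation and its source reference can be sketched as follows. The field names follow the two attribute lists above; the `is_fully_qualified` check is a hypothetical helper that only covers URL-style identifiers with a scheme and host, and the ontology name, accession, and `example.org` addresses are placeholders, not real terms.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OntologySourceReference:
    name: str
    file: str  # URI of the ontology resource
    version: str
    description: str = ""


@dataclass(frozen=True)
class OntologyAnnotation:
    annotation_value: str  # the human-readable term label
    term_source: OntologySourceReference
    term_accession: str      # identifier within the source ontology
    term_accession_url: str  # fully qualified link to the term


def is_fully_qualified(url: str) -> bool:
    """Simple check: the URL has both a scheme and a network location."""
    parsed = urlparse(url)
    return bool(parsed.scheme and parsed.netloc)


# Placeholder values for illustration only.
source = OntologySourceReference("EX", "http://example.org/ex.owl", "1.0")
term = OntologyAnnotation(
    "example term", source, "EX:0000001", "http://example.org/EX_0000001"
)
```

Keeping the source reference as a separate object means many annotations can share one ontology name and version, so a version change is recorded in a single place.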
Physical properties
Figure 4: Organization of physical property concepts. Arrows mean "part of"; child entities point to parent entities.
Material
A material describes types of materials consumed or produced during the execution of an experiment. A material entity describes how a source (person, mouse) is associated with the sample material under assay (e.g., whole source for the "body weight" assay).
- Name
- Type (choices are "name" or "label name")
- Derives from (optional, references another Material entity)
- Characteristics (See Material Attribute Value.)
Material Attribute
Material attributes are also referred to as "characteristics" and describe physical entities.
- Characteristic type (See Ontology Annotation.)
Material Attribute Value
Similar to Factor Value entities, this value specializes the material entity type.
- Value (See Ontology Annotation.)
- Value text (non-controlled term value)
- Category (See Material Attribute.)
- Unit
Source
A source is a type of Material that represents the starting biological material under study, i.e., the individual donating the material.
- Name
- Characteristics (See Characteristic.)
- Comments (See Comment.)
Sample
Like Source, a sample is a type of Material. A sample is extracted from a Source according to procedures outlined in a Protocol. Think of a Material as a particular Sample drawn from a particular Source.
- Name
- Characteristics (See Ontology Annotation.)
- Derives from (See Source.)
- Comments
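The "derives from" links between materials, samples, and sources form a chain that can be walked back to the originating individual. The Python sketch below illustrates this under simplifying assumptions: a single `Material` class with a `kind` field stands in for the three entity types, and the `lineage` function and example names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Material:
    name: str
    kind: str = "material"  # "source", "sample", or "material"
    derives_from: Optional["Material"] = None


def lineage(material: Material) -> List[str]:
    """Walk derives_from links from a material back to its source."""
    chain: List[str] = []
    seen = set()
    current: Optional[Material] = material
    while current is not None:
        if id(current) in seen:
            raise ValueError("derivation chain contains a cycle")
        seen.add(id(current))
        chain.append(f"{current.kind}:{current.name}")
        current = current.derives_from
    return chain


mouse = Material("mouse-001", kind="source")
liver = Material("liver-001", kind="sample", derives_from=mouse)
rna = Material("rna-001", kind="material", derives_from=liver)
```

Here `lineage(rna)` lists the extract first, then the liver sample, then the mouse, which is the provenance trail an assay record relies on.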
Process components
Figure 5: Process components. Arrows mean "part of"; child entities point to parent entities.
Process
A process represents either a wet-lab protocol for material processing or a dry-lab protocol for processing derived data from primary sources.
- Name
- Execute protocol
- Performer (a person, but not necessarily a Contact.)
- Date
- Previous process (we do not use this attribute because it precludes representing a process as a DAG)
- Next process (See Process.)
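Because only the "next process" link is kept, a set of processes can branch and rejoin as a DAG. The sketch below shows one way to derive a valid execution order from those links using Python's standard `graphlib` module (Python 3.9 or later). The `Process` class here is a reduced stand-in, and allowing several next processes per process is an assumption made for the example.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


@dataclass
class Process:
    name: str
    next_processes: List[str] = field(default_factory=list)


def execution_order(processes: List[Process]) -> List[str]:
    """Order processes so each one appears before everything it feeds into."""
    # TopologicalSorter expects a mapping of node -> set of predecessors,
    # so the "next process" edges are inverted here.
    predecessors: Dict[str, Set[str]] = {p.name: set() for p in processes}
    for p in processes:
        for successor in p.next_processes:
            predecessors.setdefault(successor, set()).add(p.name)
    # static_order() raises CycleError if the links do not form a DAG.
    return list(TopologicalSorter(predecessors).static_order())


workflow = [
    Process("extract", ["sequence", "weigh"]),
    Process("sequence", ["analyze"]),
    Process("weigh", ["analyze"]),
    Process("analyze"),
]
```

For this workflow the order always starts with "extract" and ends with "analyze"; the two middle steps may appear in either order because neither depends on the other.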
Process Input
A process input can refer to either a physical Material, Source, or Sample, or a file-like Data object.
- Process (that consumes the input, see Process)
- Source (input artifact, see Source)
- Sample (input artifact, see Sample)
- Data (input artifact, see Data)
- Material (input artifact, see Material)
Process Output
A process output can refer to either a physical Material or Sample, or a file-like Data object.
- Process (that produced the output, see Process)
- Sample (output artifact, see Sample)
- Data (output artifact, see Data)
- Material (output artifact, see Material)
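The difference between the two lists is that a Source can be consumed by a process but is never produced by one. A minimal Python sketch of that rule follows; the single `ProcessArtifact` class, its `role` field, and the example names are assumptions made for illustration rather than part of the model.

```python
from dataclasses import dataclass

# Artifact kinds permitted on each side of a process, per the lists above.
INPUT_KINDS = frozenset({"material", "source", "sample", "data"})
OUTPUT_KINDS = frozenset({"material", "sample", "data"})


@dataclass(frozen=True)
class ProcessArtifact:
    process: str  # name of the process consuming or producing the artifact
    role: str     # "input" or "output"
    kind: str     # "material", "source", "sample", or "data"
    name: str

    def __post_init__(self) -> None:
        allowed = {"input": INPUT_KINDS, "output": OUTPUT_KINDS}.get(self.role)
        if allowed is None:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.kind not in allowed:
            raise ValueError(f"{self.kind!r} is not a valid process {self.role}")


# A source may feed a process, and a sample may come out of it.
consumed = ProcessArtifact("dissection", "input", "source", "mouse-001")
produced = ProcessArtifact("dissection", "output", "sample", "liver-001")
```

Trying to construct an output of kind "source" raises a `ValueError`, which catches one class of curation mistakes early.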
Process Parameter Value
When a variable input to a process is set to a particular value, it should be recorded as a process parameter value.
- Process (See Process)
- Category (See Protocol Parameter)
- Value (See Ontology Annotation)
- Unit (See Ontology Annotation)
Protocol
The physical procedural steps applied in the run of an Assay and the computational procedural steps applied to produce derived Data are the two examples of a protocol in the ISA model.
- Name
- Description
- Version
- URI
- Protocol type (See Ontology Annotation)
Protocol Parameter
The variables that are available as input to a protocol are protocol parameters. These are the labels for values that are specific to a process, i.e., to one execution of the protocol.
- Protocol (See Protocol)
- Protocol parameter name (See Ontology Annotation)
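A protocol declares parameter names, and each process that executes the protocol records values against those names. The Python sketch below illustrates the pairing with reduced, hypothetical classes; the `unknown_parameters` check and the example protocol are assumptions for illustration, not a documented BioConnect rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    name: str
    parameters: List[str] = field(default_factory=list)  # protocol parameter names


@dataclass
class ProcessParameterValue:
    category: str  # name of the protocol parameter being set
    value: str
    unit: Optional[str] = None


def unknown_parameters(
    protocol: Protocol, values: List[ProcessParameterValue]
) -> List[str]:
    """Return categories that the executed protocol does not declare."""
    declared = set(protocol.parameters)
    return [v.category for v in values if v.category not in declared]


centrifugation = Protocol("centrifugation", ["speed", "duration"])
run_values = [
    ProcessParameterValue("speed", "3000", "rpm"),
    ProcessParameterValue("temperature", "4", "degree Celsius"),
]
```

In this example `unknown_parameters(centrifugation, run_values)` reports "temperature", since the protocol only declares "speed" and "duration".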
Data
Data elements represent the values measured during an Assay activity and are the output of a computational protocol in a process. A data object is generally file-like, but could also be something like a database or directory of files.
- Name
- Type (See Ontology Annotation)
- Generated from (See Sample)
- Assay (See Assay)
- File type (See Ontology Annotation)
- File (Link to a file location)