Data Model

Work in progress

Thanks for stopping by. Work on the BioConnect platform is ongoing, and we are happy to share our progress in the open. Below you will find a mostly complete description of the ISA data model, which is the foundation of our organizational structure for search indexing. Please check back to see how we are doing, or visit our discussion site to provide feedback or ask questions.

The following sections outline in detail the standard data models that JAX BioConnect employs for describing research-related metadata.

Data generation metadata

For cross-species experimental and observational research data in BioConnect, we employ the Investigation, Study, Assay (ISA) data model for describing minimal metadata. The following sections outline the concept classes in the ISA model, their relationships, and their attributes. ISA is an extensible model: it has a strict structure, but terms and comments can be added to describe parts of an experiment in a controlled way.

Biological data analysis databases, tools, libraries, and hand-crafted scripts are all part of the research process, and should also be documented to support reproducibility. The ISA model accounts for this in the Process section.

Component parts of ISA

Figure 1: The interconnected modules of study organization, physical properties, and processes connect to each other through their artifacts. These artifacts are all annotated according to ontological and terminological standards maintained by the global biomedical research community. The annotation framework also allows for extending the ISA model. Arrows mean "part of"; child entities point to parent entities.

Study organization

Each section below gives a definition for the entity followed by a list of attributes. All entities contain a unique identifier.

Figure 2: Study organization. Arrows mean "part of"; child entities point to parent entities.

Investigation

An investigation is composed of one or more Study entities and describes experiments that cross more than one cohort of test subjects. This entity is meant as a flexible way to group together studies under some conceptual relationship. It can be applied as investigators and curators see fit.

  • Identifier
  • External identifier (An identifier assigned by an external system)
  • Title
  • Description
  • Submission date
  • Public release date
  • Contacts
  • Publications
  • Comments

Study

A study is composed of one or more Assay entities and describes one or more experiments involving one and only one cohort of test subjects. A study may belong to an Investigation, but does not have to.

Assay

An assay is a measurement applied to one and only one cohort of test subjects (a set of sources, e.g., mice). The subjects are recorded as Materials derived from Samples, which are in turn derived from Source individuals. The output of an assay is a Data entity, and the Data objects that an assay produces are generally gathered together as its output. Some assays contain multiple measurements (e.g., metabolic cages).

  • Identifier
  • Study
  • Filename
  • Measurement type (OBI)
  • Technology type
  • Technology platform
  • Characteristics (See Characteristic.)
  • Units
  • Process sequence (See Process.)
  • Samples (See Sample.)
  • Other materials (See Material.)
  • Comments (See Comment.)
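The containment rules above (an Investigation groups Studies, a Study groups Assays, and a Study covers one and only one cohort) can be sketched in Python. This is a minimal illustration, not the BioConnect implementation: the class layout, the string-valued `cohort` field, and all identifiers are assumptions made for the example, and most attributes are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Assay:
    identifier: str
    measurement_type: str  # would be an OBI term in the real model
    cohort: str            # hypothetical field naming the cohort measured


@dataclass
class Study:
    identifier: str
    cohort: str            # hypothetical: the single cohort this study covers
    assays: list[Assay] = field(default_factory=list)

    def add_assay(self, assay: Assay) -> None:
        # A study involves one and only one cohort, so reject other cohorts.
        if assay.cohort != self.cohort:
            raise ValueError(
                f"assay {assay.identifier} targets cohort {assay.cohort!r}, "
                f"but study {self.identifier} covers {self.cohort!r}"
            )
        self.assays.append(assay)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: list[Study] = field(default_factory=list)


# Made-up identifiers, for illustration only.
study = Study(identifier="S1", cohort="cohort-A")
study.add_assay(Assay("A1", "body weight", "cohort-A"))
investigation = Investigation("I1", "Example investigation", [study])
```

Because a Study does not have to belong to an Investigation, the sketch keeps the link on the Investigation side only; a Study can be created and used on its own.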

Contact

A contact is a person with knowledge of the entity that they are associated with. A contact entity is not a system user entity, even if they both refer to the same physical human.

Publication

A publication is the primary artifact of academic research containing representations of knowledge gained from an experiment.

  • PubMed identifier
  • Document object identifier (DOI)
  • Title
  • Author list
  • Status (See Ontology Annotation.)

Annotation and extension

The ISA model is extensible with additional annotation and extension attributes.

Figure 3: Annotation of elements and extending the model to cover additional concepts.

Factor

"A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly."

Factor type

A factor type is a class of variables for which we use a controlled vocabulary or ontology as a label. This is an attribute of a given factor, not a model element itself.

Factor Value

Where a Factor is a label for an aspect of an experiment under study, the value is a term that specializes the label. Both tend to be qualitative and selected from a controlled vocabulary.
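The relationship between a Factor, its type, and its values might be modeled as follows. This is a sketch under stated assumptions: the `ALLOWED_VALUES` table stands in for a real controlled vocabulary, and the factor name and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str         # the label for the independent variable
    factor_type: str  # an attribute of the factor, drawn from a vocabulary


@dataclass(frozen=True)
class FactorValue:
    factor: Factor
    value: str        # the term that specializes the factor label


# Hypothetical controlled vocabulary: factor name -> permitted values.
ALLOWED_VALUES = {"diet": {"high fat", "standard chow"}}


def make_factor_value(factor: Factor, value: str) -> FactorValue:
    # Only accept values that the controlled vocabulary lists for this factor.
    if value not in ALLOWED_VALUES.get(factor.name, set()):
        raise ValueError(f"{value!r} is not a permitted value for {factor.name!r}")
    return FactorValue(factor, value)


diet = Factor(name="diet", factor_type="dietary intervention")
high_fat = make_factor_value(diet, "high fat")
```

Keeping the factor type as a plain attribute of `Factor`, rather than a class of its own, mirrors the note above that it is not a model element itself.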

Characteristic

Characteristics, characteristic categories, characteristic types, and categories are all attributes referenced in the Physical properties section. Each mapping is shown in the following list:

  • Characteristic = material attribute value (in Material, Source, Sample)
  • Characteristic category = material attribute (in Study, Assay)
  • Characteristic type = ontology annotation (in Material Attribute)
  • Category = material attribute (in Material Attribute Value)

Comment

Comments act like "tags," familiar from sites like Stack Overflow, and provide a low-fidelity way of adding attributes that are not included in the base ISA model. Comments allow for supplementary annotation of many other elements in the data model, including Source, Sample, Contact, Investigation, Study, and Assay.

  • Name
  • Value
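A name/value comment attached to an entity could look like the following sketch. The `Annotated` class is a hypothetical stand-in for any entity that accepts comments, and allowing repeated names is an assumption of this example rather than something the model states.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Comment:
    name: str
    value: str


@dataclass
class Annotated:
    """Hypothetical stand-in for a Study, Assay, Sample, and so on."""
    identifier: str
    comments: list = field(default_factory=list)

    def comment_values(self, name: str) -> list:
        # Assumption: names may repeat, so return every value under this name.
        return [c.value for c in self.comments if c.name == name]


# Invented comment names and values, for illustration only.
study = Annotated("S1")
study.comments.append(Comment("funding source", "internal pilot grant"))
study.comments.append(Comment("housing", "group housed"))
```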

Ontology Source Reference

Ontologies and other controlled vocabularies that are developed and maintained by the biomedical research community are the foundation of the unified indexing scheme of BioConnect and its broader ecosystem. The ISA model points to the URIs of these resources through ontology source references. The system also imports these resources so that curators can use their terms for annotation.

  • Description
  • File
  • Name
  • Version

Ontology Annotation

Ontology annotations are terms drawn from the sources described in Ontology Source Reference above, and are defined by experts in biomedical terminology. Ontologies in general are organized as a graph-based hierarchy with no loops, i.e., a directed acyclic graph (DAG). BioConnect also uses this model to store controlled vocabularies that lack this graph-like structure but nonetheless restrict the selection of term labels. Strain nomenclature from Mouse Genome Informatics (MGI) is one example of a controlled vocabulary.

  • Annotation value
  • Term source (See Ontology Source Reference.)
  • Term accession (The identifier in the source ontology.)
  • Term accession URL (A fully qualified URI or URL linking to the term in the source ontology.)
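The link between an annotation and its source reference can be sketched as two small record types. Everything concrete here is made up for illustration: the "EX" ontology, its file location, the accession, and the `example.org` URLs are placeholders, not real ontology terms. The URL check is an assumption about how "fully qualified" might be enforced.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OntologySourceReference:
    name: str
    file: str
    version: str
    description: str = ""


@dataclass(frozen=True)
class OntologyAnnotation:
    annotation_value: str
    term_source: OntologySourceReference
    term_accession: str
    term_accession_url: str

    def __post_init__(self):
        # Assumed rule: a fully qualified URL has both a scheme and a host.
        parsed = urlparse(self.term_accession_url)
        if not (parsed.scheme and parsed.netloc):
            raise ValueError(f"not a fully qualified URL: {self.term_accession_url!r}")


# Placeholder ontology and term, not real identifiers.
example_source = OntologySourceReference(
    name="EX",
    file="http://example.org/ontologies/ex.owl",
    version="1.0",
    description="Placeholder ontology for illustration",
)
term = OntologyAnnotation(
    annotation_value="example term",
    term_source=example_source,
    term_accession="EX:0000001",
    term_accession_url="http://example.org/ontologies/EX_0000001",
)
```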

Physical properties

Figure 4: Organization of physical property concepts. Arrows mean "part of"; child entities point to parent entities.

Material

A material describes types of materials consumed or produced during the execution of an experiment. A material entity describes how a source (person, mouse) is associated with the sample material under assay (e.g., whole source for the "body weight" assay).

  • Name
  • Type (choices are "name" or "label name")
  • Derives from (optional, references another Material entity)
  • Characteristics (See Material Attribute Value.)

Material Attribute

Material attributes are also referred to as "characteristics" and describe physical entities.

Material Attribute Value

Similar to Factor Value entities, this value specializes the material entity type.

Source

A source is a type of Material that represents the starting biological material under study, i.e., the individual donating the material.

Sample

Like Source, a sample is a type of Material. A sample is extracted from a Source according to procedures outlined in a Protocol. Think of a Material as a particular Sample drawn from a particular Source.
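The "derives from" relationship forms a chain that can be walked back from any material to its originating Source. The sketch below assumes a single `Material` class with a hypothetical `kind` field to distinguish sources, samples, and other materials; the names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Material:
    name: str
    kind: str  # hypothetical discriminator: "source", "sample", or "material"
    derives_from: Optional[Material] = None


def trace_to_source(material: Material) -> list[str]:
    """Follow 'derives from' links and return names from leaf to origin."""
    chain = []
    seen = set()
    current = material
    while current is not None:
        # Guard against an accidental loop in the derivation links.
        if id(current) in seen:
            raise ValueError("derivation chain contains a cycle")
        seen.add(id(current))
        chain.append(current.name)
        current = current.derives_from
    return chain


# Invented example: an extract taken from a sample taken from one mouse.
mouse = Material("mouse-001", "source")
blood = Material("blood-draw-1", "sample", derives_from=mouse)
extract = Material("rna-extract-1", "material", derives_from=blood)

lineage = trace_to_source(extract)
```

Here `lineage` lists the extract first and the source individual last, which is the order a curator would follow when asking where a material came from.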

Process components

Figure 5: Process components. Arrows mean "part of"; child entities point to parent entities.

Process

A process represents either a wet-lab protocol for material processing or a dry-lab protocol for processing derived data from primary sources.

  • Name
  • Execute protocol
  • Performer (a person, but not necessarily a Contact.)
  • Date
  • Previous process (we do not use this attribute because it precludes representing a process as a DAG)
  • Next process (See Process.)
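Linking processes only through "next process" lets one step feed several later steps and lets several steps feed one, which is the shape of a DAG. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later) to derive a valid execution order from such links. The process names are hypothetical, and this is one possible reading of the attribute, not the BioConnect implementation.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical process names mapped to their "next process" links.
next_process = {
    "collect_blood": ["extract_rna"],
    "extract_rna": ["sequence", "qc"],
    "sequence": ["align_reads"],
    "qc": ["align_reads"],
    "align_reads": [],
}


def execution_order(next_links):
    """Return process names ordered so each runs after all its predecessors."""
    # TopologicalSorter expects node -> predecessors, so invert the links.
    predecessors = {name: set() for name in next_links}
    for name, successors in next_links.items():
        for nxt in successors:
            predecessors.setdefault(nxt, set()).add(name)
    # Raises CycleError if the links do not form a DAG.
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(next_process)
```

In this example `align_reads` has two predecessors, a case that a single "previous process" pointer could not express.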

Process Input

A process input can refer to either a physical Material, Source, or Sample, or a file-like Data object.

  • Process (that consumes the input, see Process)
  • Source (input artifact, see Source)
  • Sample (input artifact, see Sample)
  • Data (input artifact, see Data)
  • Material (input artifact, see Material)

Process Output

A process output can refer to either a physical Material or Sample, or a file-like Data object.

  • Process (that produced the output, see Process)
  • Sample (output artifact, see Sample)
  • Data (output artifact, see Data)
  • Material (output artifact, see Material)

Process Parameter Value

When a variable input to a process is set to a particular value, it should be recorded as a process parameter value.

Protocol

The physical procedural steps applied in the run of an Assay and the computational procedural steps applied to produce derived Data are the two kinds of protocol in the ISA model.

Protocol Parameter

The variables that are available as input to a protocol are protocol parameters. These are the labels for values that are specific to a process or protocol instance.
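The split between a protocol parameter (the label) and a process parameter value (the setting used in one run) can be sketched as follows. The class and method names, the rule that a process may only set parameters its protocol declares, and the example protocol are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Protocol:
    name: str
    parameters: set[str] = field(default_factory=set)  # protocol parameters


@dataclass
class Process:
    name: str
    executes_protocol: Protocol
    parameter_values: dict[str, str] = field(default_factory=dict)

    def set_parameter(self, parameter: str, value: str) -> None:
        # Assumed rule: only parameters declared by the protocol may be set.
        if parameter not in self.executes_protocol.parameters:
            raise KeyError(
                f"{parameter!r} is not a parameter of "
                f"protocol {self.executes_protocol.name!r}"
            )
        self.parameter_values[parameter] = value


# Invented protocol and values, for illustration only.
centrifugation = Protocol("centrifugation", {"speed", "duration"})
run = Process("spin sample 12", executes_protocol=centrifugation)
run.set_parameter("speed", "3000 rpm")
run.set_parameter("duration", "10 min")
```

Two processes that execute the same protocol share its parameter labels but can record different values, which is the distinction drawn in the two sections above.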

Data

Data elements represent the values measured during an Assay activity and are the output of a computational protocol in a process. A data object is generally file-like, but could also be something like a database or directory of files.

  • Name
  • Type (See Ontology Annotation)
  • Generated from (See Sample)
  • Assay (See Assay)
  • File type (See Ontology Annotation)
  • File (Link to a file location)