Use Cases
Research Use Cases
As a systems neuroscientist, I want to investigate how memories are encoded, consolidated and retrieved, so that I can identify the mechanisms by which they get disrupted in Alzheimer's disease.
As a computational researcher, I would like to find genotype, functional genomics, and cellular phenotype data from genetically diverse mice, so that I can study the influence of genetic variation on molecular traits and connect those traits to cellular phenotypes.
As a computational research scientist, I want to find genotype and phenotype data on male and female mice with variations in the NBPF1 gene and its regulatory elements, so that I can design experiments to assess whether the copy number of this gene impacts brain size.
Differential gene expression analysis
A user would like to conduct a differential gene expression (DGE) analysis of genes in specific neuronal regions to study the effect of a specific drug. The differentially expressed (DE) genes will be annotated according to the three Gene Ontology (GO) aspects: cellular component, biological process, and molecular function. The user will set up an RNA-Seq assay with the relevant biological samples and enter all the metadata associated with the assay.
Creating a study in the BioConnect Data Curation System
At the completion of the experiment, the user (and the authorized members of the study) will receive by email a unique ID and instructions for creating a study in the JAX Omics Data Curation Tool, which stores the metadata and data associated with the assay. The unique ID allows the user to associate additional data and metadata with an existing study, or to create a new study, in the BioConnect data curation system in an automated fashion.
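The ID-based association described above can be sketched as a small registry. This is a minimal sketch, assuming the unique ID is a random UUID and that studies are held in memory; the `StudyRegistry` class, its methods, and the membership check are hypothetical and do not describe the actual BioConnect curation system's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Study:
    study_id: str
    assays: list = field(default_factory=list)  # assay metadata records


class StudyRegistry:
    """Hypothetical registry mapping emailed study IDs to studies."""

    def __init__(self) -> None:
        self._members: dict[str, set[str]] = {}  # issued ID -> authorized users
        self._studies: dict[str, Study] = {}     # ID -> Study, created on first use

    def issue_id(self, members: Iterable[str]) -> str:
        # Assumption: a random UUID4 serves as the unique, hard-to-guess ID.
        study_id = str(uuid.uuid4())
        self._members[study_id] = set(members)
        return study_id

    def attach(self, study_id: str, user: str, assay: dict) -> tuple[Study, bool]:
        """Add assay metadata, creating the study the first time the ID is used.

        Returns (study, created), where created is True for a new study.
        """
        if user not in self._members.get(study_id, set()):
            raise PermissionError(f"{user!r} is not authorized for study {study_id}")
        created = study_id not in self._studies
        if created:
            self._studies[study_id] = Study(study_id)
        study = self._studies[study_id]
        study.assays.append(assay)
        return study, created
```

The first `attach` call with a freshly issued ID creates the study; later calls by any authorized member add to the same study, which mirrors the "existing or new study" behavior described in the text.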
Data QC and preprocessing
The curation tool user interface (UI) will provide the fields needed to quality-control and preprocess raw data from omics assays. In the use case above, once the data and metadata files are in the curation tool, the user will start an RNA-Seq workflow. The workflow consists of automated pipeline steps: QC of the raw reads in the FASTQ files to assess sequence read quality, preprocessing of the raw reads, mapping of the RNA-Seq reads onto a reference genome, QC of the mapped reads, and computation of read coverage for genes.
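The first workflow step, QC of raw reads, can be illustrated with a small Python sketch that decodes FASTQ quality strings and counts reads whose mean Phred score clears a threshold. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names and the threshold of 20 are illustrative choices, and a production pipeline would use established QC tools rather than this code.

```python
from typing import Iterable, Iterator


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records.

    Loads all lines into memory, which is fine for a sketch but not for large files.
    """
    records = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(records) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}: {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, decoding each ASCII character with the offset."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def qc_summary(lines: Iterable[str], min_mean_q: float = 20.0) -> dict[str, int]:
    """Count reads whose mean quality passes or fails the (illustrative) threshold."""
    passed = failed = 0
    for _, _, qual in read_fastq(lines):
        if mean_phred(qual) >= min_mean_q:
            passed += 1
        else:
            failed += 1
    return {"passed": passed, "failed": failed}
```

For example, `qc_summary("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n".splitlines())` counts one passing read (`I` decodes to Q40) and one failing read (`#` decodes to Q2). An open file handle can be passed directly, since iterating it yields lines.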
Analysis
The preprocessed RNA-Seq data can then be used for multiple downstream analyses, such as differential expression between samples, detection of allele-specific expression, and identification of expression quantitative trait loci (eQTLs), using different tools in the BioConnect Tool App system.
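To make the differential expression step concrete, the sketch below normalizes raw gene counts to counts per million (CPM) and computes a per-gene log2 fold change between a treated and a control group. It is a simplified illustration under the assumption that each sample is a dict of gene to raw count; real DGE tools also model count dispersion and test for significance, which this sketch deliberately omits.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale one sample's raw gene counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(
    treated: list[dict[str, int]],
    control: list[dict[str, int]],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """Per gene: log2((mean treated CPM + p) / (mean control CPM + p)).

    The pseudocount avoids division by zero for genes absent in one group.
    """
    t_norm = [cpm(s) for s in treated]
    c_norm = [cpm(s) for s in control]
    genes = set().union(*treated, *control)  # every gene seen in any sample
    result = {}
    for gene in sorted(genes):
        t_mean = sum(s.get(gene, 0.0) for s in t_norm) / len(t_norm)
        c_mean = sum(s.get(gene, 0.0) for s in c_norm) / len(c_norm)
        result[gene] = math.log2((t_mean + pseudocount) / (c_mean + pseudocount))
    return result
```

A positive value indicates higher expression in the treated group; the resulting gene list could then be annotated by GO aspect as described in the use case.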
Technical Use Cases
The BioConnect Ecosystem provides a standard compute environment and security in a single place that promotes the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The ecosystem aims to make deploying scientific software and data resources as quick and painless as possible. The following examples illustrate different use cases for this process.
Connecting into the ecosystem
A handful of capabilities are required for all web-accessible digital resources for computational biology. These include:
- An externally available point of entry, such as a website or API.
- Security: authentication, authorization, and access auditing.
- Monitoring for both security breaches and system failures.
- A continuous delivery process that deploys software to a compute environment with minimal manual effort.
- A search system that operates over all connected resources. These resources are curated and annotated to a common data model.
- Workflow orchestration for starting, monitoring, and providing status notifications for long-running processes.
- Data orchestration for sharing data objects between processes.
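Workflow orchestration, as listed above, can be pictured as a small state machine that tracks each long-running process and emits a notification on every status change. The states, the `WorkflowRun` class, and the `notify` callback below are hypothetical and only illustrate the idea; they do not describe BioConnect's actual orchestration service.

```python
from enum import Enum
from typing import Callable


class RunState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical legal transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    RunState.QUEUED: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: set(),
}


class WorkflowRun:
    """Tracks one long-running process and reports each status change."""

    def __init__(self, run_id: str, notify: Callable[[str, RunState], None]) -> None:
        self.run_id = run_id
        self.state = RunState.QUEUED
        self.history = [self.state]
        self._notify = notify  # e.g. send an email or post to a message queue

    def advance(self, new_state: RunState) -> None:
        """Move to new_state if the transition is allowed, then notify."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
        self._notify(self.run_id, new_state)
```

Rejecting illegal transitions keeps the reported status trustworthy, for example a run that has already failed cannot later be reported as running.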
Here we will describe a standard process for adopting and taking advantage of these capabilities in your scientific software project.
Sharing a workflow
- A research lab devises a new approach to connecting genetic variation in the mouse to genetic variation in human populations.
- Any input data required for the workflow that are already in BioConnect are annotated according to a well-documented standard, and an application programming interface (API) provides consistent access to the data in a computable format.
- A research scientist writes an initial workflow pipeline, e.g., in WDL or Nextflow, on her laptop. All references to data objects can use a file-like interface to the data orchestration system, organized by namespaces, which removes the need to track down specific file locations.
- Once the pipeline is complete and tested, it is pushed to a code repository that is connected to the ecosystem. From there, it is automatically deployed to the compute environment.
- The deployed code is then manually mapped in an API gateway. This gateway is the primary point of entry and documentation for all APIs. This new API endpoint is then annotated in the metadata service to describe its inputs and outputs in biological context.
- Now researchers beyond the originator, even those outside of JAX, can run the pipeline. They can select input data sets, e.g., mouse ATAC peaks and human ATAC peaks (either controlled-access or unrestricted), run the pipeline, and retrieve the finished results.
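The namespace-based, file-like access to data objects described in this use case could look roughly like the following Python sketch. The `bc://` URI scheme, the `NamespaceStore` class, and its in-memory backend are hypothetical stand-ins for the data orchestration system; the point is that workflow code names data by namespace and object name instead of by physical file location.

```python
import io
from contextlib import contextmanager
from typing import Iterator


class NamespaceStore:
    """Hypothetical in-memory stand-in for a namespaced data orchestration layer."""

    SCHEME = "bc://"  # assumed URI scheme, for illustration only

    def __init__(self) -> None:
        self._data: dict[str, dict[str, str]] = {}  # namespace -> {name -> text}

    def _split(self, uri: str) -> tuple[str, str]:
        if not uri.startswith(self.SCHEME):
            raise ValueError(f"expected a {self.SCHEME} URI, got {uri!r}")
        namespace, _, name = uri[len(self.SCHEME):].partition("/")
        if not namespace or not name:
            raise ValueError(f"URI needs a namespace and an object name: {uri!r}")
        return namespace, name

    @contextmanager
    def open(self, uri: str, mode: str = "r") -> Iterator[io.StringIO]:
        """Open a logical object as a text file; writes are saved on a clean close."""
        namespace, name = self._split(uri)
        if mode == "r":
            try:
                content = self._data[namespace][name]
            except KeyError:
                raise FileNotFoundError(uri) from None
            yield io.StringIO(content)
        elif mode == "w":
            buf = io.StringIO()
            yield buf  # if the with-body raises, nothing is stored
            self._data.setdefault(namespace, {})[name] = buf.getvalue()
        else:
            raise ValueError(f"unsupported mode {mode!r}")
```

One workflow step writes under a logical name such as `bc://mouse-atac/peaks.bed`, and a later step, possibly run by another researcher, reads the same name without knowing where the data physically live.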
Deploy a database of biological resources
- A research lab maintains an existing database as a community resource for biological data. The lab can hand over operation of the infrastructure to BioConnect while retaining control of the application and database.
- Handing over operations is a multi-step process with four key components to transition: the frontend, the backend, the database, and the site's URL.
- The client-side code, i.e., the website, is an Angular single-page application. It is developed as a "micro-frontend." This means that there is a base application, and research labs may write plugins that connect to their own backend services. The base application provides login, identity management, site navigation, and search (across plugged-in services). Plugins clearly indicate the responsible research lab and applicable grants for funding agencies to verify.
- Server-side code, which receives requests from the client and runs database queries, will transition to a containerized application. In the ecosystem, these applications will be accessed via well-documented API endpoints; see the "Sharing a workflow" use case above for additional details. This backend code will run in a container orchestration environment, such as Kubernetes, in the cloud. Security, access authorization, monitoring, and horizontal scaling are all centrally managed. API endpoints clearly indicate the responsible research lab and applicable grants for funding agencies to verify.
- Databases are the lifeblood of computational biology, so they must be treated as critical components. Transitioning to a cloud environment requires creating a full backup of the existing database and then standing up a new cloud instance. To avoid synchronization problems, this is done during a maintenance window in which updates are frozen. Databases that require a large persistent volume need special consideration.
- During the transition, the original database application will continue to run as usual, and users will access the site via its usual URL, e.g., https://original.jax.org. Meanwhile, the new site will be accessible only within the JAX firewall at an address such as https://connect.jax.org/original. Once the new site is ready to receive users, the https://original.jax.org address will be re-pointed to the new content by redirecting it to the BioConnect URL. This re-pointing may be applied to the whole site at once or to individual components as they become available.
- Operations and maintenance of the BioConnect Ecosystem are the responsibility of JAX institutional staff. Updates, bug fixes, enhancements, and user support are managed by CompSci and the research labs.
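The component-by-component re-pointing described above amounts to a redirect rule evaluated for each incoming request. In this sketch, the hosts follow the example addresses in the text, while the `redirect_target` function and the migrated path prefixes are hypothetical; paths that have not yet been migrated return `None` and continue to be served by the original application.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Example address from the text; the path layout under it is an assumption.
NEW_BASE = "https://connect.jax.org/original"


def redirect_target(url: str, migrated_prefixes: set[str]) -> Optional[str]:
    """Return the BioConnect URL for a migrated path, or None to serve locally.

    A prefix of "/" means the whole site has been migrated.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    for prefix in migrated_prefixes:
        # Match whole path segments, so "/search" does not capture "/searching".
        if prefix == "/" or path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            new = urlsplit(NEW_BASE)
            return urlunsplit((new.scheme, new.netloc, new.path + path,
                               parts.query, parts.fragment))
    return None
```

Adding prefixes to the migrated set one at a time re-points components as they become available, and adding `"/"` re-points the whole site; query strings are preserved so existing bookmarks and links keep working.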
Connecting an independent resource
- A research lab has a tool or database that they would like to maintain as an independent system. This could be either a new or an existing web application or API.
- Client-side (frontend) web applications can be described and annotated with tags, which allows them to be found through BioConnect search even when they are external to the ecosystem. The base ISA (Investigation-Study-Assay) model, which the metadata service employs, describes the components of biological investigations: studies, assays, processes, and samples. As such, it has no clear counterpart for client-side web applications. However, a reasonably constrained tagging system would provide a useful set of searchable descriptors for web applications. See https://stackoverflow.com/help/tagging for a good example of how Stack Overflow guides the use and creation of tags for questions; in our case, the tags would describe biological web applications.
- Server-side (backend) APIs that are external to the ecosystem can be annotated and mapped just like those in the "Deploy a database of biological resources" use case. API endpoint inputs and outputs can be assigned to an element, an element attribute, or an attribute value in the ISA model, which should cover many common API cases. Where it doesn't, we can either extend the ISA model (via the metadata service) or modify the endpoint. We assume that endpoints come in two varieties: list and detail. List endpoints would generally map to elements or attributes as classes, while detail endpoints would map to instances.
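The tagging and endpoint mapping described above could be recorded roughly as follows. Only the ISA element names come from the model discussed here; the tag vocabulary, the `EndpointAnnotation` record, and the rule that a trailing `{id}`-style path parameter marks a detail endpoint are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ISA_ELEMENTS = {"Investigation", "Study", "Assay", "Process", "Sample"}
# Hypothetical constrained vocabulary; curating it keeps tags consistent and searchable.
ALLOWED_TAGS = {"genome-browser", "mouse", "human", "gene-expression", "qtl", "variants"}


@dataclass(frozen=True)
class EndpointAnnotation:
    path: str
    kind: str                        # "list" maps to a class, "detail" to an instance
    isa_element: str
    attribute: Optional[str] = None  # optional ISA element attribute


def validate_tags(tags: set[str]) -> set[str]:
    """Reject tags outside the curated vocabulary instead of silently adding them."""
    unknown = tags - ALLOWED_TAGS
    if unknown:
        raise ValueError(f"tags not in vocabulary: {sorted(unknown)}")
    return tags


def annotate(path: str, isa_element: str, attribute: Optional[str] = None) -> EndpointAnnotation:
    """Map an endpoint to an ISA element, classifying it as list or detail."""
    if isa_element not in ISA_ELEMENTS:
        raise ValueError(f"unknown ISA element {isa_element!r}; extend the model first")
    # Assumed heuristic: a trailing "{...}" path parameter marks a detail endpoint.
    last = path.rstrip("/").rsplit("/", 1)[-1]
    kind = "detail" if last.startswith("{") and last.endswith("}") else "list"
    return EndpointAnnotation(path, kind, isa_element, attribute)
```

For instance, `annotate("/api/samples", "Sample")` yields a list annotation for the Sample class, while `annotate("/api/samples/{sample_id}", "Sample")` yields a detail annotation for a Sample instance. An unknown element raises an error, which corresponds to the choice in the text between extending the ISA model and modifying the endpoint.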