Technical Use Cases
Connecting into the ecosystem
In the course of gathering ideas for the Cube Initiative we found a handful of capabilities that every web-accessible digital resource for computational biology requires. These include:
- An externally available point of entry, which could be a website or an API.
- Security, including authentication, authorization, and access auditing.
- Monitoring for both security breaches and system failures.
- A continuous delivery process that deploys software to a compute environment with minimal manual effort.
- A search system that operates over all connected resources, which are curated and annotated against a common data model.
- Workflow orchestration for starting, monitoring, and providing status notifications for long-running processes.
- Data orchestration for sharing data objects between processes.
Here we will describe a standard process for adopting and taking advantage of these capabilities in your scientific software project.
Use cases
The BioConnect Ecosystem provides a standard compute environment and security in one place, in a way that promotes the FAIR principles. It aims to make deploying scientific software and data resources as quick and painless as possible. The following examples describe several use cases for this process.
Sharing a workflow
- A research lab devises a new approach to connecting genetic variation in the mouse to genetic variation in human populations.
- Any input data required for the workflow that are already in BioConnect are annotated according to a well-documented standard, and an application programming interface (API) provides consistent access to the data in a computable format.
- A research scientist writes an initial workflow pipeline, e.g., in WDL or Nextflow, on her laptop. All references to data objects can use a file-like interface to the data orchestration system, organized by namespaces. This removes the need to track down specific file locations (see the first sketch after this list).
- Once the pipeline is complete and tested, it is pushed to a code repository that is connected to the ecosystem. From there, it is deployed automatically.
- The deployed code is then manually mapped in an API gateway, which is the primary point of entry and documentation for all APIs. The new API endpoint is then annotated in the metadata service to describe its inputs and outputs in biological context (see the second sketch after this list).
- Now, researchers beyond the originator, even those outside of JAX, can run the pipeline: they can select input data sets, e.g., mouse ATAC peaks and human ATAC peaks (either controlled or non-controlled), run the pipeline, and retrieve the finished results.
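To make the namespace idea concrete, here is a minimal sketch of how pipeline code might read a data object through the data orchestration layer. Everything here is hypothetical: the namespace paths, the registry, and the `open_object` helper stand in for whatever file-like interface the ecosystem ultimately exposes; in the real system the lookup would be a service call, not a hard-coded dict.

```python
"""Illustrative only: a namespace-based, file-like interface to data objects."""

from contextlib import contextmanager
from pathlib import Path

# Hypothetical registry mapping namespace paths to physical locations.
# In the real system this lookup would be a call to the data
# orchestration service, not a hard-coded dict.
NAMESPACE_REGISTRY = {
    "mouse-genetics/atac/peaks.bed": Path("/shared/mouse/atac/peaks.bed"),
    "human-gwas/atac/peaks.bed": Path("/shared/human/atac/peaks.bed"),
}

@contextmanager
def open_object(namespace_path: str, mode: str = "r"):
    """Open a data object by namespace, hiding its physical location."""
    try:
        physical = NAMESPACE_REGISTRY[namespace_path]
    except KeyError:
        raise FileNotFoundError(f"no data object at {namespace_path!r}") from None
    with open(physical, mode) as fh:
        yield fh

# A pipeline task reads inputs by name; no file paths to track down.
if __name__ == "__main__":
    with open_object("mouse-genetics/atac/peaks.bed") as peaks:
        for line in peaks:
            ...  # process each ATAC peak record
```

Because resolution happens in one place, data objects can move between storage backends without touching pipeline code.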
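The annotation and invocation steps might look roughly like the following. The service URLs, the payload shape, and the `job_id` field are assumptions for illustration; the actual metadata service schema and gateway routes would be defined by the ecosystem.

```python
"""Illustrative only: annotating a new pipeline endpoint, then invoking it.

The URLs and payload fields are hypothetical stand-ins for the real
API gateway and metadata service.
"""

import requests

GATEWAY = "https://connect.jax.org/api"        # hypothetical gateway base URL
METADATA = "https://connect.jax.org/metadata"  # hypothetical metadata service

# Describe the endpoint's inputs and outputs in biological context so the
# search system can surface it alongside compatible data sets.
annotation = {
    "endpoint": f"{GATEWAY}/pipelines/variant-mapping/run",
    "inputs": [
        {"name": "mouse_peaks", "type": "ATAC peaks", "organism": "Mus musculus"},
        {"name": "human_peaks", "type": "ATAC peaks", "organism": "Homo sapiens"},
    ],
    "outputs": [{"name": "mapped_variants", "type": "variant table"}],
}
requests.post(f"{METADATA}/annotations", json=annotation, timeout=30).raise_for_status()

# A researcher outside the originating lab can now run the pipeline by
# selecting input data sets (namespace paths, as in the previous sketch).
job = requests.post(
    f"{GATEWAY}/pipelines/variant-mapping/run",
    json={
        "mouse_peaks": "mouse-genetics/atac/peaks.bed",
        "human_peaks": "human-gwas/atac/peaks.bed",
    },
    timeout=30,
)
job.raise_for_status()
print("job id:", job.json()["job_id"])  # poll the workflow orchestrator for status
```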
Deploying a database of biological resources
- A research lab maintains an existing database as a community resource for biological data. The lab can hand over operation of the infrastructure to BioConnect while retaining control of the application and database.
- Handing over operations is a multi-step process with four key components to be transitioned: the frontend, the backend, the database, and the site's URL.
- The client-side code, i.e., the website, is an Angular single-page application developed as a "micro-frontend": there is a base application, and research groups can write plugins that connect to their own backend services. The base application provides login, identity management, site navigation, and search across plugged-in services. Each plugin clearly indicates the responsible research lab and the applicable grants, so funding agencies can verify them (a hypothetical plugin manifest is sketched after this list).
- Server-side code, which receives requests from the client and runs database queries, will transition to a containerized application. In the ecosystem, these applications are accessed via well-documented API endpoints (see use case 1 for additional details). The backend code runs in a container orchestration environment such as Kubernetes in the cloud, where security, access authorization, monitoring, and horizontal scaling are all centrally managed. As with plugins, API endpoints clearly indicate the responsible research lab and applicable grants for funding agencies to verify (an example endpoint is sketched after this list).
- Databases are the lifeblood of computational biology and must be treated as critical components. Transitioning to a cloud environment requires creating a full backup of the existing database and then standing up a new cloud instance. To avoid synchronization headaches, this should be done during a maintenance window in which updates are frozen (see the backup-and-restore sketch after this list). Databases that require a large persistent volume need special consideration.
- During the transition, the original database application will continue to run as always, and users will access the site via its usual URL, e.g., https://original.jax.org. The new site will be accessible only within the JAX firewall, at an address like https://connect.jax.org/original. Once the new site is ready to receive users, https://original.jax.org will be re-pointed so that requests redirect to the BioConnect URL. This re-pointing may be applied to the whole site at once, or to individual components as they become available (a redirect sketch follows this list).
- Operations and maintenance of the BioConnect Ecosystem are the responsibility of JAX institutional staff. Updates, bug fixes, enhancements, and user support are managed by CompSci and the research labs.
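As an illustration of the micro-frontend contract, here is one shape a plugin's registration manifest could take. The field names, lab, and grant number are invented; the point is that each plugin declares its routes, its backend, and the lab and grants responsible for it.

```python
"""Illustrative only: the kind of information a micro-frontend plugin
might declare when registering with the base application."""

from dataclasses import dataclass, field

@dataclass
class PluginManifest:
    name: str              # display name in site navigation
    route_prefix: str      # where the plugin mounts, e.g. "/original"
    backend_base_url: str  # the plugin's own backend service
    responsible_lab: str   # shown so users know who maintains it
    grants: list[str] = field(default_factory=list)  # for funder verification

manifest = PluginManifest(
    name="Original Resource Browser",
    route_prefix="/original",
    backend_base_url="https://connect.jax.org/api/original",  # hypothetical
    responsible_lab="Example Lab",                            # hypothetical
    grants=["R01-XX-000000"],                                 # hypothetical
)
```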
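A minimal sketch of what a transitioned backend endpoint could look like, using FastAPI purely as an example framework (any containerizable web framework would do; the route, model, and in-memory "database" are hypothetical). FastAPI generates OpenAPI documentation automatically, which fits the well-documented-endpoints goal, and the health route gives the container orchestrator something to probe.

```python
"""Illustrative only: a containerized backend endpoint.

FastAPI is used as an example; the route, model, and data are hypothetical.
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Original Resource API")  # served behind the API gateway

class Gene(BaseModel):
    symbol: str
    organism: str

# A fake in-memory "database" standing in for the real one.
FAKE_DB = {"Trp53": Gene(symbol="Trp53", organism="Mus musculus")}

@app.get("/genes/{symbol}", response_model=Gene)
def get_gene(symbol: str) -> Gene:
    """Detail endpoint: look up one gene by symbol."""
    gene = FAKE_DB.get(symbol)
    if gene is None:
        raise HTTPException(status_code=404, detail=f"unknown gene {symbol!r}")
    return gene

@app.get("/health")
def health() -> dict:
    """Liveness probe for the container orchestrator (e.g., Kubernetes)."""
    return {"status": "ok"}
```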
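For the database transition itself, the backup-then-restore step could look like the sketch below, assuming a PostgreSQL database (the engine choice, hosts, and database names are all assumptions). The key point from the list above is that both commands run inside the update-frozen maintenance window.

```python
"""Illustrative only: back up an on-premise database and restore it into a
new cloud instance. Assumes PostgreSQL; hosts and names are hypothetical."""

import subprocess

DUMP_FILE = "original_db.dump"

def backup(host: str, dbname: str) -> None:
    # -Fc: custom archive format, suitable for pg_restore
    subprocess.run(
        ["pg_dump", "-h", host, "-Fc", "-f", DUMP_FILE, dbname],
        check=True,
    )

def restore(host: str, dbname: str) -> None:
    # --no-owner: don't require the original role to exist in the new instance
    subprocess.run(
        ["pg_restore", "-h", host, "-d", dbname, "--no-owner", DUMP_FILE],
        check=True,
    )

if __name__ == "__main__":
    # Run only inside the update-frozen maintenance window.
    backup("db.internal.jax.org", "original")    # hypothetical source host
    restore("db.cloud.example.org", "original")  # hypothetical target host
```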
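The per-component re-pointing could be implemented with plain HTTP redirects at the old address. The sketch below uses Flask as an example server; the migrated paths and target URLs are hypothetical, and `serve_legacy` is a placeholder for the original application.

```python
"""Illustrative only: redirect migrated components of the old site to their
new BioConnect URLs while unmigrated paths keep serving the old content."""

from flask import Flask, redirect

app = Flask(__name__)

# Components are re-pointed one at a time as they become available.
MIGRATED = {
    "search": "https://connect.jax.org/original/search",  # hypothetical
    "browse": "https://connect.jax.org/original/browse",  # hypothetical
}

@app.route("/<path:component>")
def route(component: str):
    target = MIGRATED.get(component.split("/")[0])
    if target is not None:
        return redirect(target, code=301)  # permanent redirect for migrated parts
    return serve_legacy(component)         # fall through to the old application

def serve_legacy(component: str) -> str:
    """Placeholder for the original application's handler."""
    return f"legacy content for /{component}"
```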
Connecting an independent resource
- A research lab has a tool or database that they would like to maintain as an independent system. This could be either a new or an existing web application or API.
- Client-side (frontend) web applications can be described and annotated with tags, which allows an application to be found through BioConnect search even if it lives outside the ecosystem. The base ISA model that the metadata service employs describes the components of biological investigations: studies, assays, processes, and samples. As such, it has no clear correlate for client-side web applications. However, a reasonably constrained tagging system would provide a useful set of searchable descriptors. See https://stackoverflow.com/help/tagging for a good example of how Stack Overflow guides the use and creation of tags for questions; in our case, the tags would describe biological web applications (a sketch follows this list).
- Server-side (backend) APIs that are external to the ecosystem can be annotated and mapped just like the internal backends in use cases 1 and 2. API endpoint inputs and outputs can be assigned to an element, an element attribute, or an attribute value in the ISA model, which should cover most common API cases. Where it doesn't, we can either extend the ISA model (via the metadata service) or modify the endpoint. We assume that endpoints come in two varieties: list and detail. List endpoints generally map to elements or attributes as classes; detail endpoints map to instances (see the mapping sketch after this list).
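Here is one way a constrained tagging system for client-side applications could look, following the Stack Overflow model linked above. The vocabulary, validation rule, and registry entry shape are all invented for illustration.

```python
"""Illustrative only: a small, curated tag vocabulary for registering
external web applications so they become searchable in BioConnect."""

# Curated vocabulary; new tags would go through a review step,
# much like Stack Overflow's tag-creation guidelines.
ALLOWED_TAGS = {
    "mouse", "human", "genome-browser", "variant", "expression",
    "atac-seq", "rna-seq", "phenotype", "qtl",
}

def register_app(name: str, url: str, tags: set[str]) -> dict:
    """Validate tags and build a searchable registry entry."""
    unknown = tags - ALLOWED_TAGS
    if unknown:
        raise ValueError(f"unknown tags {sorted(unknown)}; propose them for review")
    return {"name": name, "url": url, "tags": sorted(tags)}

entry = register_app(
    "Example QTL Viewer",                 # hypothetical application
    "https://qtlviewer.example.jax.org",  # hypothetical URL
    {"mouse", "qtl", "expression"},
)
```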
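The list-versus-detail mapping might be captured in the metadata service roughly as follows; the endpoint patterns, the `Sample` element name, and the registry shape are assumptions based on the description above.

```python
"""Illustrative only: mapping external API endpoints onto the ISA model.

List endpoints map to ISA elements or attributes as classes; detail
endpoints map to instances. Names and shapes are hypothetical.
"""

from dataclasses import dataclass
from typing import Literal

@dataclass
class EndpointMapping:
    url_pattern: str
    kind: Literal["list", "detail"]
    isa_target: str  # an ISA element, element attribute, or attribute value
    maps_to: Literal["class", "instance"]

MAPPINGS = [
    # A list endpoint returns all samples, so it maps to the class.
    EndpointMapping("/api/samples", "list", "Sample", "class"),
    # A detail endpoint returns one sample, so it maps to an instance.
    EndpointMapping("/api/samples/{id}", "detail", "Sample", "instance"),
    # Endpoints with no clean ISA correlate would trigger either a model
    # extension (via the metadata service) or a change to the endpoint.
]
```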
Using the Task and Messaging modules together
The Core Modules