Color’s mission is to democratize access to genetic testing. Achieving this requires running a cost-efficient, scalable service. From day one, Color was built as a software-driven company, with engineers working side by side with all functional teams to ensure the highest level of system automation, which in turn enables fast-paced product innovation, development and iteration.
Once a client purchases a test and a physician orders it, the sample goes through a lifecycle of shipping, receiving, lab sequencing, data processing, variant classification, sample interpretation, report generation and genetic counseling. Over the past two years, our engineering team has built an end-to-end solution that manages this lifecycle with a high level of automation and lets teams coordinate efficiently. Specifically, we have built the following five major systems, all fully integrated.
Laboratory Information Management System (LIMS)
Color operates a state-of-the-art CLIA-certified wet lab to process client samples. LIMS is the software used to manage the lifecycle of physical samples in the lab in an auditable and automated way. While we initially evaluated third-party enterprise LIMS software, our need for flexibility and fast iteration led us to build our own LIMS in-house. Today, Color LIMS is used by multiple teams that handle lab samples, including the lab, quality control and support teams.
The bioinformatics pipeline processes the DNA sequence data (reads) generated in our lab. Reads are first aligned to a human reference genome and then compared against it to find potential variants for further analysis. The process combines data from multiple tools and algorithms to identify and classify variants. Historically, companies have found it extremely challenging to run a bioinformatics pipeline reliably at scale because of the computational complexity and the broad diversity of tools and algorithms involved. At Color, we have invested heavily in a framework to harness that complexity. Today, the full pipeline runs without manual intervention and scales easily with sample volume.
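As a toy illustration of the "align, then compare" idea, the sketch below takes a read that is assumed to be already aligned at a known offset and lists the positions where it differs from the reference. The function name and the sequences are hypothetical; production pipelines rely on dedicated aligners and variant callers, and this is not a description of Color's implementation.

```python
from typing import List, Tuple


def candidate_mismatches(reference: str, read: str, offset: int) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, read_base) for each base where the
    aligned read differs from the reference. Assumes an ungapped alignment."""
    mismatches = []
    for i, read_base in enumerate(read):
        ref_base = reference[offset + i]
        if read_base != ref_base:
            mismatches.append((offset + i, ref_base, read_base))
    return mismatches


# Hypothetical data: the read aligns at offset 2 and differs at one base.
reference = "ACGTACGTAC"
read = "GTATGT"
print(candidate_mismatches(reference, read, offset=2))  # [(5, 'C', 'T')]
```

A real variant caller also has to handle insertions, deletions, base quality and evidence from many overlapping reads, which is where most of the computational complexity comes from.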
Sample Control System
This internal system manages all sample state transitions after lab and pipeline processing. It supports multiple access levels so that different teams can collaborate. Example roles and the functionality this service supports for them:
- The quality control team reviews samples and approves or rejects them based on quality metrics computed at various stages of processing;
- Variant scientists review and classify potential variants as benign, pathogenic, or of uncertain significance;
- Pathologists interpret each client sample with all variant and health profile information, compute risk scores and generate personalized reports to release to clients;
- Genetic counselors review all client reports;
- The lab director signs off on client reports;
- Genetic counselors follow up and conduct in-person counseling sessions with clients to go through their reports.
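One way to picture role-gated state transitions is a small lookup table keyed by the current state and the acting role. The sketch below is only an assumption-laden illustration: the state names, role names and the ordering of steps are hypothetical and are inferred loosely from the list above, not taken from Color's system.

```python
from enum import Enum


class SampleState(Enum):
    # Hypothetical states, loosely following the roles listed above.
    AWAITING_QC = "awaiting_qc"
    REJECTED = "rejected"
    AWAITING_CLASSIFICATION = "awaiting_classification"
    AWAITING_INTERPRETATION = "awaiting_interpretation"
    AWAITING_COUNSELOR_REVIEW = "awaiting_counselor_review"
    AWAITING_SIGNOFF = "awaiting_signoff"
    RELEASED = "released"


# (current state, role) -> states that role may move the sample into.
ALLOWED_TRANSITIONS = {
    (SampleState.AWAITING_QC, "quality_control"): {
        SampleState.AWAITING_CLASSIFICATION,
        SampleState.REJECTED,
    },
    (SampleState.AWAITING_CLASSIFICATION, "variant_scientist"): {
        SampleState.AWAITING_INTERPRETATION,
    },
    (SampleState.AWAITING_INTERPRETATION, "pathologist"): {
        SampleState.AWAITING_COUNSELOR_REVIEW,
    },
    (SampleState.AWAITING_COUNSELOR_REVIEW, "genetic_counselor"): {
        SampleState.AWAITING_SIGNOFF,
    },
    (SampleState.AWAITING_SIGNOFF, "lab_director"): {SampleState.RELEASED},
}


def transition(current: SampleState, role: str, target: SampleState) -> SampleState:
    """Return the new state, or raise PermissionError if this role is not
    allowed to make this move from the current state."""
    allowed = ALLOWED_TRANSITIONS.get((current, role), set())
    if target not in allowed:
        raise PermissionError(
            f"{role} may not move a sample from {current.value} to {target.value}"
        )
    return target


state = transition(SampleState.AWAITING_QC, "quality_control",
                   SampleState.AWAITING_CLASSIFICATION)
print(state.value)  # awaiting_classification
```

Keeping the rules in one table makes it straightforward to audit who can do what, and each team only needs to know about the transitions it owns.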
The web-based, client-facing product manages the entire client experience, including kit purchase and status tracking, physician ordering, activation, health history entry, final reporting and counseling scheduling. To help clients navigate the testing process, this service monitors client status and proactively communicates important updates, which keeps clients informed without overwhelming them. The service also supports various types of promotions, including the employer benefit programs we recently launched.
Our data science team tackles a range of challenges whose impact reaches across the company, from machine learning and optimization, to bringing state-of-the-art data visualization to genomics, to broad analyses that inform strategy. Topics we’re currently working on include:
- Laboratory and process optimization to reduce turnaround time and lower costs, including partnering with our laboratory scientists to optimize assay design.
- Population analysis to gain novel insights on prevalence of variants across diverse subpopulations and inform our product, growth and partnership strategies.
- Developing statistical methods to increase the power of our variant classification process.
- Implementing new bioinformatics analyses to maximize the amount of information we gain from genetic data.
The nature of the data we process, the high accuracy requirements and health industry regulations all pose unique challenges and requirements for us in building our software platform.
High Quality Results
We strive to provide our clients with results of the highest quality. There is no room for error in any phase of sample processing. For example, variant finding requires both high recall (few true variants missed) and high precision (few false calls), which poses interesting challenges in algorithm design and in the overall data analysis process.
To provide our clients with the most trustworthy, stable and reproducible results, every release of the bioinformatics pipeline software goes through a formal result validation process and requires official sign-off by our lab director. Every sample is processed with a specific validated software version, and that version is recorded, which makes the entire processing auditable at each step. All changes require extensive testing before any new version of the production pipeline is released. Accordingly, we have developed an automated validation framework that makes it convenient and efficient to run extensive validation testing of the entire pipeline.
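To make the two metrics concrete, here is a minimal sketch that computes precision and recall from a set of called variants and a set of known true variants. Representing a variant as a (chromosome, position, reference allele, alternate allele) tuple is an assumption for illustration, and the example values are made up.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt); hypothetical encoding


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = true positives / all calls; recall = true positives / all true variants."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: one correct call, one false call, one missed variant.
truth = {("chr17", 100, "A", "G"), ("chr13", 200, "C", "T")}
called = {("chr17", 100, "A", "G"), ("chr2", 50, "G", "A")}
print(precision_recall(called, truth))  # (0.5, 0.5)
```

Tuning a caller to miss fewer variants tends to produce more false calls and vice versa, which is why requiring both metrics to be high is demanding.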
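The core of a release validation check can be sketched as a comparison between the variant calls a candidate pipeline version produces on reference samples and the calls that were previously validated. The function, sample IDs and variant labels below are hypothetical; the source does not describe how Color's validation framework is implemented.

```python
from typing import Dict, Set, Tuple


def validate_release(
    expected: Dict[str, Set[str]], observed: Dict[str, Set[str]]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return {sample_id: (missing_calls, unexpected_calls)} for every
    reference sample whose observed calls differ from the expected ones."""
    failures = {}
    for sample_id, expected_calls in expected.items():
        observed_calls = observed.get(sample_id, set())
        missing = expected_calls - observed_calls
        unexpected = observed_calls - expected_calls
        if missing or unexpected:
            failures[sample_id] = (missing, unexpected)
    return failures


# Made-up reference results and candidate-version output.
expected = {"ref-1": {"v1", "v2"}, "ref-2": {"v3"}}
observed = {"ref-1": {"v1", "v2"}, "ref-2": {"v4"}}
print(validate_release(expected, observed))  # {'ref-2': ({'v3'}, {'v4'})}
```

An empty result means the candidate version reproduced every expected call on the reference set; any entry is a discrepancy to investigate before sign-off.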
As a software platform operating in the health space, Color systems are fully compliant with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Respecting and securing the privacy of our clients’ personal health data is built into all system design decisions. All production access logs are consistently monitored. We also conduct periodic security reviews and tests to stay proactive.
As explained earlier, each client sample goes through a complex lifecycle of state transitions, involving processing by different teams and system components. Our system design needs to minimize dependencies among teams so they can work in parallel as much as possible, while still ensuring the correct level of quality assurance. Client results can vary in distinct, subtle ways, all of which require flexible and scalable system design. Building a fully integrated system has enabled us to track and connect every detail about each sample, which ensures a quality experience and lets us address issues efficiently when they arise.
Genetic sequencing can produce millions of reads and gigabytes of data. Matching and comparing this amount of data against the reference genome to find abnormalities is both CPU and memory intensive. Proper algorithm and system design is required to generate accurate results quickly and efficiently.
The Color Difference
Color prides itself on its mission to democratize access to genetic testing. Our team consists of a diverse set of roles, many of which are uncommon in traditional tech companies. Our engineering team is dedicated to integrating modern software engineering principles with advances in bioinformatics, next-generation sequencing and health care by working closely with the best designers, scientists, lab staff, genetic counselors and physicians, all toward the same end goal: providing our clients with accurate and actionable results in the most secure manner.