Open Source at Color: A Statistical Risk Model for Cancer
Inherited genetic mutations explain 10–15% of most cancers, in women and men alike. That is why we built the Color Test, which analyzes 30 genes that are clinically relevant to the most common hereditary cancers.
Separate from genetics, a person might still be at higher risk of cancer based on their family history. Researchers have developed and published a number of statistical models that assess an individual's breast cancer risk from family and personal history. Here at Color, we combine these models with genetic results to provide personalized screening guidelines, in line with NCCN Guidelines.
One widely adopted model was developed by Dr. Elizabeth B. Claus and first published in 1994. The Claus Risk Model is built on population-level survey data from 4,730 women who developed breast cancer between ages 20 and 54. It estimates risk from a patient’s family history, including each affected family member’s relationship to the patient and the age at which that family member’s cancer began.
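To make those inputs concrete, here is a minimal sketch of how a family history entry might be represented and used as a table key. Everything in it is a hypothetical illustration rather than the API of our published code: the names `Relationship`, `AffectedRelative`, `onset_decade`, and `lookup_risk` are invented for this example, bucketing onset age by decade is an assumption, and the risk table is deliberately left empty because its values have to come from the published model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Relationship(Enum):
    """Hypothetical relationship categories, for illustration only."""
    MOTHER = "mother"
    SISTER = "sister"
    MATERNAL_AUNT = "maternal_aunt"
    PATERNAL_AUNT = "paternal_aunt"


@dataclass(frozen=True)
class AffectedRelative:
    """One family member with a breast cancer diagnosis."""
    relationship: Relationship
    age_of_onset: int  # age at which this relative's cancer began


# Maps (relationship, onset decade) to a risk estimate.
# Assumption: the caller fills this in from the published tables.
RiskTable = Dict[Tuple[Relationship, int], float]


def onset_decade(age_of_onset: int) -> int:
    """Bucket an onset age into its decade, e.g. 43 -> 40 (an assumed bucketing)."""
    return (age_of_onset // 10) * 10


def lookup_risk(table: RiskTable, relative: AffectedRelative) -> float:
    """Return the table entry for this relative, or raise if the table has none."""
    key = (relative.relationship, onset_decade(relative.age_of_onset))
    if key not in table:
        raise KeyError(f"no table entry for {key}")
    return table[key]
```

Keeping the table separate from the lookup logic means the code can be tested without embedding any risk values, and the values themselves can be checked against the publication on their own.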
Numerous software applications implement the Claus Risk Model, but surprisingly, no public open-source implementation existed. We’ve fixed that by implementing and validating an open-source Claus Risk Model in Python, including a few dozen automated test cases. To validate the model, we manually entered 8,065 individuals into a reference implementation and compared its output with ours. We found zero errors or disagreements, apart from a handful of human data-entry errors made during the validation, which were later corrected.
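A comparison like this can be sketched in a few lines. The snippet below is a hedged illustration, not our actual validation script: `find_disagreements` is a hypothetical helper, the recorded reference outputs are assumed to be available as plain numbers paired with each case, and the absolute tolerance is an assumed default.

```python
import math
from typing import Callable, Iterable, List, Tuple, TypeVar

Case = TypeVar("Case")


def find_disagreements(
    recorded: Iterable[Tuple[Case, float]],
    candidate: Callable[[Case], float],
    abs_tol: float = 1e-9,  # assumed tolerance; tune to the reference's precision
) -> List[Tuple[Case, float, float]]:
    """Compare a candidate model against recorded reference outputs.

    `recorded` pairs each input case with the value the reference
    implementation produced for it. Returns (case, candidate_value,
    reference_value) for every case where the two differ by more
    than `abs_tol`.
    """
    disagreements = []
    for case, reference_value in recorded:
        candidate_value = candidate(case)
        if not math.isclose(
            candidate_value, reference_value, rel_tol=0.0, abs_tol=abs_tol
        ):
            disagreements.append((case, candidate_value, reference_value))
    return disagreements
```

An empty result means the two implementations agree on every recorded case. A non-empty result lists exactly which cases to re-check, which is how data-entry mistakes on the reference side tend to surface.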
This code allows us to easily scale and quickly apply risk models to all of our clients. Perhaps more importantly, future researchers and software developers won’t need to re-invent the wheel, as our source code is publicly available on GitHub for all to use. Contributions and improvements welcome!
Much of our software at Color relies on open source packages and binaries, which helps us move faster by building on the work of those who came before us. Our Claus Risk Model code is just one small contribution back to the community; we’ve also published a structural variant simulator and a few other handy tools which we hope will be useful to researchers and companies alike.