Using Low Coverage Whole Genome Sequencing (lcWGS) to Calculate Accurate Polygenic Scores Across…

Over the past year, we have worked closely with Drs. Sekar Kathiresan and Amit V. Khera to launch the Color Genetic Score Study, a research effort that makes polygenic scores available to Color participants for the first time. Recently, the peer-reviewed journal BMC Genome Medicine published our paper describing our unique approach to calculating polygenic scores that uses a technology called low coverage whole genome sequencing (lcWGS).

A polygenic score is a new kind of genetic analysis that integrates information from numerous common variants in DNA to assess a person’s inherited risk for a specific trait or disease. Polygenic scores and other applications of statistical genetics, such as genetic ancestry, are traditionally calculated using a technology called genotyping arrays. Genotyping arrays are considered the current gold standard in the field and are relatively inexpensive and time-efficient. However, they are limited because they capture only a predetermined set of variants in DNA, most of which were identified in studies of people of non-Hispanic, European ancestry.
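To make the idea concrete, a polygenic score is commonly computed as a weighted sum: each variant contributes its effect weight multiplied by the number of copies of the effect allele a person carries. The sketch below is only an illustration under that assumption, not the pipeline described in our paper. The function name, variant IDs, weights, and dosages are all hypothetical.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    dosages: variant ID -> copies of the effect allele (0 to 2; imputed
             values may be fractional). Hypothetical input format.
    weights: variant ID -> per-allele effect weight from a published score.
    Variants missing from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical example values, for illustration only.
weights = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.4}

score = polygenic_score(dosages, weights)
print(round(score, 4))
```

Real scores sum over many thousands to millions of variants, which is why the number and quality of variants a technology can measure or impute matters.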

lcWGS is just as inexpensive and time-efficient as genotyping arrays but doesn’t have these inherent biases. To determine whether lcWGS data can be used to calculate polygenic scores, we developed and validated a bioinformatics pipeline for imputation. Imputation is a process that takes data from lcWGS or genotyping arrays and uses statistical models to predict common variants in DNA that were not directly measured.

We first simulated lcWGS using publicly available data from Genome in a Bottle and the 1000 Genomes Project, two independent research efforts to sequence people of different ancestries around the world. In this experiment, we selected 10 samples representing diverse ancestries, including Ashkenazi Jewish, Chinese, Northern and Western European, Colombian, Gujarati Indian, Italian, and Nigerian. We found that our bioinformatics pipeline can reliably detect and predict (r2 > 0.90) common DNA variants in these samples.

We then did a direct comparison of polygenic scores calculated from genotyping arrays and those calculated from lcWGS in 184 people of European ancestry. Specifically, we calculated polygenic scores for three conditions (coronary artery disease, breast cancer, and atrial fibrillation) that were previously published. As expected, we found lcWGS can calculate polygenic scores as well as genotyping arrays (r2 = 0.98, 0.93, and 0.97, respectively).

GPS, genome-wide polygenic score. CAD, coronary artery disease. BC, breast cancer. AF, atrial fibrillation. Adapted from Homburger et al. 2019, Figure 3.
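For readers who want to see how an agreement figure like those above can be computed, here is a minimal sketch. It assumes r2 means the squared Pearson correlation between the two sets of scores for the same people. The function name and the example values are hypothetical and are not data from the study.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must not be constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical scores for the same five people from two technologies.
array_scores = [0.12, 0.40, 0.33, 0.81, 0.55]
lcwgs_scores = [0.10, 0.43, 0.30, 0.79, 0.58]

print(round(r_squared(array_scores, lcwgs_scores), 3))
```

A value close to 1 means the two technologies rank and space people's scores almost identically, which is the sense in which the scores are interchangeable.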

We also performed lcWGS (real, not simulated this time) on 116 samples from people of Chinese, Yoruba, Indian, African-American, Mexican, and Puerto Rican ancestry. We again found that our bioinformatics pipeline can reliably detect and predict (r2 > 0.90) common variants in DNA and accurately calculate polygenic scores when compared to known data from the 1000 Genomes Project (r2 = 0.98 for coronary artery disease, 0.91 for breast cancer, and 0.98 for atrial fibrillation).

CHB, Han Chinese. GIH, Gujarati Indian. YRI, Yoruba (Nigeria). ASW, African-American. MXL, Mexican. PUR, Puerto Rican. GPS, genome-wide polygenic score. CAD, coronary artery disease. BC, breast cancer. AF, atrial fibrillation. 1KGP, 1000 Genomes Project. Adapted from Homburger et al. 2019, Figure 4.

Finally, we calculated polygenic scores for coronary artery disease, breast cancer, and atrial fibrillation using lcWGS in 11,502 people of European ancestry and found that they can identify people with an increased risk of disease. Importantly, these results are consistent with previous studies that used genotyping arrays to calculate polygenic scores.

Our paper demonstrates that lcWGS can reliably detect and predict common variants in DNA as well as accurately calculate polygenic scores, regardless of a person’s genetic ancestry. This is an important step forward in health equity. The use of lcWGS may also help drive polygenic score adoption in health systems: lcWGS can be performed on the same equipment as targeted multi-gene panels, a sequencing technology already used by many clinical laboratories, including Color, to identify rare, single variants in DNA that increase monogenic risk for disease, such as variants in BRCA1.

We, with others in the research community, are now working to understand the combined effect of monogenic and polygenic risk and whether they can be used in combination to give us a more comprehensive understanding of genetic risk for complex diseases. Color’s mission is to help everyone live a healthier life, and this research gets us that much closer.

About Color
Color is the leader in delivering precision healthcare through cutting-edge technology. Color makes data-driven health programs such as clinical genetics accessible, convenient, and cost-effective for everyone. Color partners with leading health systems, premier employers, and national health initiatives around the world including the million-person All of Us program by the National Institutes of Health. For more information about Color, visit
