Uncovering colorectal cancer risk factors lurking in the genome

From the Peters and Hsu Groups, Public Health Sciences Division

Colorectal cancer (CRC) is one of the most common and deadly cancers worldwide. In the U.S., it is the third-leading cause of cancer death in both men and women. Sporadic CRC, which develops in the absence of high-risk genetic mutations, accounts for the majority of cases. It is well established that genetic, lifestyle, and environmental factors together contribute to an individual’s overall risk for the disease. Genetic factors are estimated to explain about a third of that risk, yet most of the specific genetic signals that confer it remain unknown. Fred Hutch researchers recently published a study in the journal Nature Genetics in which they report the identification of forty new genetic signals that are significantly associated with risk for sporadic CRC. The study was led by Drs. Ulrike Peters and Li Hsu, members of the Public Health Sciences Division, and members of their research groups.

Prior to this new study, approximately 60 genetic loci associated with CRC risk were known. To uncover new signals, the authors combined genetic sequencing, imputation, genotyping, and statistical analyses to carry out the largest genome-wide association study (GWAS) meta-analysis for CRC to date. This tour de force required substantial coordinated effort across many studies and research teams, as the authors used data from forty-five studies conducted in numerous countries around the world.

To start, the authors conducted whole-genome sequencing of 2,159 individuals (1,439 CRC cases and 720 controls) and analyzed over 14 million genetic variants in the dataset. These data, together with sequencing data from 32,488 individuals in the Haplotype Reference Consortium, served as references for subsequent imputation analyses. Genetic imputation is a statistical method that allows researchers to infer missing genetic sequences using a reference population that has more complete sequencing data. The authors then conducted a GWAS meta-analysis in which they imputed data for nearly 64,000 individuals. Results from this “stage 1” analysis were used to inform the design of a custom genotyping panel with an emphasis on potential CRC risk variants. Genotyping data were generated for nearly 24,000 individuals using this custom-designed panel. In the next step, the authors conducted a “stage 2” meta-analysis in which the newly genotyped data were combined with existing GWAS data for a total of approximately 61,500 individuals.
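To make the idea of imputation concrete, the following is a deliberately simplified sketch, not the method used in the study. It assumes a tiny, made-up reference panel of haplotypes and fills each untyped site (marked `?`) with the most common allele among reference haplotypes that agree with the typed sites. Real imputation software relies on probabilistic models rather than exact matching; the panel, the function name `impute`, and the data are all hypothetical.

```python
from collections import Counter

# Hypothetical reference panel: each haplotype is a string of alleles at
# four neighboring variant sites (made-up data for illustration only).
REFERENCE_HAPLOTYPES = [
    "ACGT",
    "ACGT",
    "ACGA",
    "GTCA",
    "GTCA",
    "GTCT",
]


def impute(observed: str, reference: list[str]) -> str:
    """Fill '?' sites using reference haplotypes that match the typed sites."""
    # Keep reference haplotypes consistent with every typed (non-'?') site.
    matches = [
        hap for hap in reference
        if all(o == "?" or o == h for o, h in zip(observed, hap))
    ]
    if not matches:
        return observed  # nothing to borrow from; leave the gaps in place
    filled = []
    for i, allele in enumerate(observed):
        if allele == "?":
            # Most common allele at this site among the matching haplotypes.
            allele = Counter(hap[i] for hap in matches).most_common(1)[0][0]
        filled.append(allele)
    return "".join(filled)


imputed = impute("AC?T", REFERENCE_HAPLOTYPES)
```

In this toy panel, only the two `ACGT` haplotypes agree with the typed alleles in `AC?T`, so the missing site is filled from them. A larger and more complete reference panel gives more haplotypes to borrow from, which is why the study supplemented its own sequencing data with the Haplotype Reference Consortium.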

Figure: Combined colorectal cancer risk genome-wide association study meta-analysis. Association results of genetic variants organized by chromosomal location. Newly identified genetic loci are in orange; previously identified loci are in green. Genome-wide significance level is denoted by the orange dotted line. Image provided by Dr. Jeroen Huyghe.

The researchers then performed a combined GWAS meta-analysis that included cases and controls from the stage 1 and stage 2 analyses. This combined analysis of more than 125,000 individuals led to the identification of thirty new genetic loci associated with CRC risk (see Figure). One of the most exciting findings was the discovery of a rare variant that was strongly associated with reduced risk for CRC. This variant is located near CHD1 and RGMB, two genes that have previously been implicated in cancer risk. The other signals identified implicate Krüppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs, and somatic drivers, and support a role for immune function. Conditional analyses, which test whether a signal is independent of other nearby signals, identified an additional ten new loci.
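As an illustration of how a meta-analysis pools evidence for a single variant across studies, here is a minimal sketch of inverse-variance fixed-effect weighting, a commonly used approach. The article does not say which model the authors used, so this is an assumption, and the per-study effect sizes below are invented. The threshold of 5e-8 is the value conventionally used for genome-wide significance.

```python
import math

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS threshold


def fixed_effect_meta(estimates):
    """Combine (beta, standard_error) pairs by inverse-variance weighting."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical per-study log odds ratios and standard errors for one variant.
studies = [(0.10, 0.02), (0.08, 0.02), (0.12, 0.03)]
beta, se, z, p = fixed_effect_meta(studies)
is_significant = p < GENOME_WIDE_SIGNIFICANCE
```

The pooled standard error is smaller than that of any single study, which is why combining many cohorts can reveal variants with modest effects that no individual study could detect on its own.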

The potential clinical and public health impact of the now ~100 known genetic risk factors was explored by constructing a polygenic risk score (PRS) from the genetic loci. The authors calculated the recommended age at which individuals should begin CRC screening as a function of their PRS. The risk threshold was set to the average 10-year CRC risk of a 50-year-old individual. The recommended starting age for individuals in the highest and lowest 1% of risk differed by 18 years for men and 24 years for women.
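The logic of a risk-based screening age can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' model: the PRS is taken to be a weighted sum of risk-allele counts, an individual's relative risk is assumed to scale the average 10-year risk multiplicatively at every age, and the baseline risk curve, variant weights, and population mean score are all invented.

```python
import math


def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele counts (0, 1, or 2) across variants."""
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))


def screening_start_age(relative_risk, baseline_risk_by_age, reference_age=50):
    """Earliest age at which 10-year risk reaches the average at reference_age."""
    threshold = baseline_risk_by_age[reference_age]
    for age in sorted(baseline_risk_by_age):
        if relative_risk * baseline_risk_by_age[age] >= threshold:
            return age
    return None  # threshold never reached within the tabulated ages


# Hypothetical average 10-year risk by age, rising smoothly (not real data).
BASELINE = {age: 0.001 * math.exp(0.08 * (age - 30)) for age in range(30, 81)}

# Hypothetical per-variant log odds ratios and one person's allele counts.
weights = [0.10, 0.05, -0.20]
person = [2, 1, 0]
prs = polygenic_risk_score(person, weights)
population_mean_prs = 0.15  # assumed value, for illustration only
relative_risk = math.exp(prs - population_mean_prs)
age = screening_start_age(relative_risk, BASELINE)
```

Under these assumptions, a person at average genetic risk (relative risk of 1) reaches the threshold at age 50 by construction, while higher or lower scores shift the starting age earlier or later.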

An important point to note, both for the polygenic risk score calculation and for the study as a whole, is that the study population is predominantly of European descent, so the results may not translate to populations of other races and ethnicities. Planned future studies will address this issue by including individuals from more diverse ethnic backgrounds.

Fred Hutch/UW Cancer Consortium members Ulrike Peters, Li Hsu, Christopher Carlson, William Grady, Charles Kooperberg, Christopher Li, Polly Newcomb, Ross Prentice, Catherine Tangen, Emily White, and John Potter contributed to this research.

This research was supported by the National Institutes of Health.

Huyghe JR, Bien SA, Harrison TA, et al. 2018. Discovery of common and rare genetic risk variants for colorectal cancer. Nature Genetics. doi: 10.1038/s41588-018-0286-6