A major goal in cancer research is to understand the genomic changes that characterize cancer cells. Somatic mutations and other DNA aberrations are well-recognized mechanisms that allow cancer cells to sustain proliferation. At the molecular level, somatic copy number aberrations (changes in the number of copies of a gene) and epigenetic modifications such as DNA methylation (the attachment of methyl groups to cytosines, mostly at CpG sites, where a cytosine is followed by a guanine) regulate gene expression and are implicated in the development and progression of cancer. Dr. Wei Sun from the Public Health Sciences Division led a recent study, published in Nucleic Acids Research, that analyzed the associations among copy number aberration, DNA methylation, and gene expression in cancer.
The Cancer Genome Atlas (TCGA) was a National Cancer Institute project designed to gain insight into the genomic alterations that occur in cancer. It covered thirty-three types of cancer and generated large-scale genomic data with several experimental techniques, including DNA and RNA sequencing, proteomic profiling, and DNA methylation profiling. Because these data were produced with powerful -omic technologies, TCGA has become a major resource for the cancer research community. “Multiple types of -omic data can be measured in the same set of samples. As this type of datasets become more popular, integrative analysis of different types of -omic data are attracting more research interest,” said Dr. Sun. In the new study, the researchers set out to determine the functional role of DNA methylation in gene expression in cancer while also taking gene copy number into account. A complication in this analysis is that tumor samples removed from patients also contain non-cancerous cells. Besides cancer cells, which harbor mutations, copy number changes, and altered DNA methylation patterns, surgical samples contain normal cells such as lymphocytes, fibroblasts, and endothelial cells that lack these changes. The presence of these normal cells in tumor samples can therefore skew the analysis.
Dr. Sun and colleagues focused on TCGA data from six cancer types: breast cancer, colon cancer, prostate cancer, glioblastoma, lower-grade glioma, and acute myeloid leukemia. The authors used computational methods to integrate three -omic datasets: somatic copy number variation, gene expression, and DNA methylation. They tested the association between each pair of data types while accounting for the third. All analyses also controlled for experimental effects (the site where tumor tissue was collected and assay batches), demographic variables (age, gender, and population stratification), and, when possible, cancer subtype. The results discussed here are from breast cancer, although the overall findings were largely consistent across the tumor types studied.
As the authors expected, somatic copy number variation was positively associated with gene expression. More surprising was the finding that the association between somatic copy number variation and DNA methylation is more variable and depends on the location of the CpG sites. For example, when methylation and copy number are negatively associated, the CpGs tend to be located in CpG islands, stretches of DNA with a high density of CpG sites that are often at or near promoters. When methylation and copy number are positively associated, the CpGs tend to be in CpG "open seas", regions with a low density of CpG sites that lie far from CpG islands.
When assessing the relationship between DNA methylation and gene expression, the authors found a large number of associations. However, many of these associations disappeared once tumor purity, estimated from the somatic copy number data, was taken into account. Dr. Sun summarized this major finding: “Our results demonstrate that both gene expression and DNA methylation data are very informative to study the tumor purity and underlying cell type composition.” It is therefore crucial to control for the confounding effect of the heterogeneous mix of cell types that makes up a tumor. The authors then developed a new statistical model that substantially improved the estimation of methylation-gene expression associations by estimating and removing the confounding effects of tumor purity and cell type composition (see Figure). Dr. Sun emphasized the impact of this work: “Without correcting for such confounding effect, 99.9% of associations between gene expression and DNA methylation are false positives.”
The authors also found that, in general, DNA methylation near transcription start sites tends to be negatively associated with gene expression, while methylation within the gene body tends to be positively associated with it. Further analyses of all three -omic datasets suggested that the associations between somatic copy number variation and gene expression are likely not mediated by DNA methylation.
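One common way to ask whether methylation mediates the effect of copy number on expression is a product-of-coefficients mediation analysis. The sketch below uses that generic technique; it is not necessarily the method used in the paper, and the simulated data are hypothetical. The total effect of copy number on expression is split into an indirect part that flows through methylation (the copy number → methylation slope times the methylation → expression slope, holding copy number fixed) and a direct part. In this toy setup copy number shifts methylation slightly, but methylation has no effect on expression, so the indirect effect is close to zero.

```python
import random
import statistics

def slope(y, x):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)

def residualize(y, x):
    """Remove the linear effect of x from y."""
    b = slope(y, x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

def mediation(cn, meth, expr):
    """Split total effect of cn on expr into indirect (via meth) and direct parts."""
    total = slope(expr, cn)                  # c: expr ~ cn
    a = slope(meth, cn)                      # a: meth ~ cn
    # b: meth -> expr holding cn fixed (Frisch-Waugh-Lovell residual regression)
    b = slope(residualize(expr, cn), residualize(meth, cn))
    indirect = a * b
    return total, indirect, total - indirect  # for OLS, direct = c - a*b exactly

rng = random.Random(7)
n = 1000
# Hypothetical copy number changes (losses, neutral, gains) with measurement noise.
cn = [rng.choice([-1, 0, 0, 1, 2]) + rng.gauss(0, 0.2) for _ in range(n)]
meth = [0.1 * c + rng.gauss(0, 0.3) for c in cn]  # copy number nudges methylation
expr = [1.0 * c + rng.gauss(0, 0.5) for c in cn]  # methylation has no effect here
total, indirect, direct = mediation(cn, meth, expr)
print(f"total = {total:.2f}, indirect = {indirect:.3f}, direct = {direct:.2f}")
```

A near-zero indirect effect alongside a large direct effect is the pattern that would indicate copy number acts on expression largely without going through methylation. A real analysis would also adjust for purity and the other covariates described above.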
Asked about next steps, Dr. Sun described related research under way: “Our group is working on a new method to estimate tumor infiltrating immune cell composition by combining gene expression and DNA methylation data. Such immune cell composition estimates can be very informative in clinical settings, for example, to predict cancer patients’ response to immunotherapy.” Because the tumor microenvironment, made up of non-tumor cells, also contributes significantly to tumor growth, this is an important next step toward understanding the mechanisms that drive cancer progression.
This research was supported by the National Institutes of Health.
Sun W, Bunn P, Jin C, Little P, Zhabotynsky V, Perou CM, Hayes DN, Chen M, Lin D-Y. 2018. The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Research. doi:10.1093/nar/gky131.