‘Biggest broadest brush’
Established in 1973, SEER is only one of the country’s national registries. In the early ’90s, the U.S. government established a sister program, the National Program of Cancer Registries, sponsored by then-freshman Rep. Bernie Sanders of Vermont. The NPCR, operated through the Centers for Disease Control and Prevention, covers about 97% of the country: 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Island jurisdictions. This registry is also trying to enhance its state databases.
SEER serves more as a microcosm of the U.S. population, with around 20 carefully selected state and regional registries representing our blended nation. SEER and the NPCR are both part of a larger umbrella organization, the North American Association of Central Cancer Registries, or NAACCR, which promotes uniform standards, provides education and much more, including interactive online tools for viewing cancer stats.
Etzioni openly “sings the praises” of cancer registries but also realizes they’re not perfect. Registries like SEER are still extremely valuable, she said, for tracking progress against cancer, identifying changes that require attention, identifying new cancer causes, generating ideas to reduce the risk of cancer and illuminating disparities between various groups.
“Cancer registry data are the biggest broad brush we have to understand cancer in the population,” she said. “We’re trying to get a broad unbiased snapshot of the population and it’s incredibly useful in understanding the state of cancer in the nation.”
One recent study using SEER data found breast cancer deaths in women under 40 have stopped declining, which researchers believe is related to the rapidly rising distant-stage breast cancer rates in the same age group.
“SEER is the most well-respected cancer registry program in the world and is the gold standard of cancer registries out there,” Li said. “And the CSS is the gold standard within that. We have an outstanding staff with a wealth of experience in cancer registration and they take their work very seriously.”
Digging into the data using AI
One reason CSS data are so valuable is the working agreements, or linkages, that the registry struck early on with all of the major pathology providers serving its 13-county catchment area. These linkages allow for “very rapid identification and ascertainment of data” on patients when they’re diagnosed, Li said.
“The linkages provide us with complete data much more rapidly,” he said. “That was our former Director of Information Services Mary Potts’ vision, to take advantage of electronic data. That’s made CSS a model for other registries.”
The quality of CSS data has also made it an attractive target for scientists eager to mine the registry for additional secrets.
In a 2018 study, then-Hutch physician-scientist Dr. Bernardo Goulart and colleagues used natural language processing, or NLP, to delve into the CSS and capture information on two mutations often found in non-small cell lung cancer, or NSCLC. Targeted oral therapies known as tyrosine kinase inhibitors or TKIs have been life-changers for many patients who carry common mutations in the genes ALK and EGFR, and these drugs are currently offered as a first-line treatment for patients diagnosed with stage 4 NSCLC.
But not all patients were being tested, or getting accurate tests, before their treatment. Goulart was able to tease this information out from the CSS. Another of his studies used NLP to scour the CSS for patients who’d been prescribed TKIs for their ALK- and EGFR-driven cancers, pairing those records with insurance claims data to determine the financial impact of this treatment on cancer patients.
He and colleagues from HICOR, the Hutchinson Institute for Cancer Outcomes Research, found that the higher the TKI out-of-pocket costs, the more patients cut back or quit taking the medications, with Medicare patients faring worse by a significant margin.
Moving forward, Goulart said AI methods could be used to look for other actionable mutations or interventions.
“These studies could serve as a prototype to look at molecular mutations in other tumors,” he said. “We did it in lung cancer, but there’s no reason to believe you can’t do this in other tumor settings.”
New linkages, new opportunities
Schwartz, who has worked at the Hutch for over 30 years and partnered with Goulart on one of his studies, said the CSS is also an extremely valuable source for disparities research.
“We use it internally within the [Fred Hutch/University of Washington] Cancer Consortium to help identify parts of our catchment area where there are populations that might have a high burden of a particular cancer,” he said.
It can also pinpoint problems with access to care or structural bias. A recent HICOR report found that where a person lived in Washington state often determined whether they survived a cancer diagnosis.
“There’s been a major effort to expand the different ways the SEER data is being used,” Li said. “Historical linkages — like the ones between SEER and Medicare data — have been used in many, many studies. More recently, though, there have been linkages with different commercial pharmacies to try and get information on prescription medications relating to cancer or other disease. And there’s interest in trying to expand the geospatial data — linking information on addresses with neighborhood characteristics, like exposures to pollutants and other environmental exposures. That’s another opportunity.”
Li is currently collaborating with Microsoft to interrogate electronic medical record data to capture information about metastatic recurrence. He and others are also testing the accuracy of these new linkages and data extraction methods.
“We’ve found that the accuracy depends on your source of data,” he said. “With a pathology report you get so far. With radiology, you get more. A lot of recurrences are identified by imaging and there may never be a biopsy or pathology report. It does seem like it would be a substantial improvement to have pathology and radiology reports linked to a registry.”
Li said he and his collaborators are currently writing up preliminary results, and the process might provide a model for filling in all recurrence data gaps.
“We started with breast cancer because we had gold standard data from thousands of patients showing who recurred and who did not,” he said. “This could definitely be used as a model moving forward.”
Etzioni is also using workarounds to collect recurrence data. In a paper published last year, she and colleagues found that data mining of medical claims “holds promise for the streamlining of cancer registry operations” to collect metastatic recurrence data. The project also explored using patient self-report about recurrence histories, which they found to be a highly accurate source of information (published results are forthcoming).
Schwartz, both cancer survivor and scientist, said cancer registries will always be an incredibly valuable resource for patients, clinicians, and researchers, even if they haven’t yet revealed all their secrets.
“You’d be hard pressed to find any kind of regular data collection effort that was trying to maintain consistency with the past and stay relevant to the current situation that was also, with constricted funding, able to collect everything you want,” he said.
As for cancer registries’ potential in future research?
“In some senses, it’s only limited by people’s creativity,” he said.