Health services research using United States cancer databases

Here is everything you want to know about @theNCI SEER, @AmericanCancer @AmCollSurgeons NCDB, and newer claims databases for clinical research in oncology

🧵
First, many thanks to these great people for helping me with the material. [image]
Retrospective databases are ideal for certain types of questions related to epidemiology, staging, rare diseases, quality, prognostication, prediction, and some "real world evidence / data".
However, we should be cautious in using these databases for (1) comparative effectiveness research, and (2) comparing outcomes of patients today vs a prior era.
(1) These databases are not meant for comparative effectiveness research, ie evaluating tx A vs B.

If you're considering it, send your data to me and @wedney2017 and we will show you how you can get any answer you want: A>B, B>A, A=B😅

(2) These databases are not meant to compare outcomes (via KM plots) over major eras.

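The KM plots are often affected by the Will Rogers phenomenon (stage migration): when better imaging or new staging rules move borderline patients into a higher stage, survival in both the lower and the higher stage appears to improve, even though no individual patient lives longer.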

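A toy sketch of the Will Rogers effect, with made-up numbers. The survival times, group names, and the `migrate` helper are all hypothetical and not drawn from SEER or NCDB; the point is only that reclassifying one patient raises the average in both stages while the overall average stays put.

```python
from statistics import mean

# Hypothetical survival times in months; not real registry data.
stage_early = [60, 55, 50, 20]  # the 20-month patient has undetected spread
stage_late = [12, 10, 8]


def migrate(early, late, survival_months):
    """Reclassify one patient from the early-stage to the late-stage group."""
    new_early = list(early)
    new_early.remove(survival_months)
    return new_early, late + [survival_months]


before = (mean(stage_early), mean(stage_late))
new_early, new_late = migrate(stage_early, stage_late, 20)
after = (mean(new_early), mean(new_late))

overall_before = mean(stage_early + stage_late)
overall_after = mean(new_early + new_late)

print("early stage:", before[0], "->", after[0])
print("late stage: ", before[1], "->", after[1])
print("overall:    ", overall_before, "->", overall_after)
```

The same logic is why stage-specific KM curves from different eras are hard to compare: the stage labels themselves have shifted.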
Here are the trends in publications using these databases.

SEER and NCDB make up the majority of oncology health services research. [image]
@theNCI SEER contains information on ~1/3 of cancer cases in the US, with data going back to 1973. The data come from geographic areas selected to enrich for minority populations.

You can get data here:
seer.cancer.gov/data/access.ht…

Do the tutorials here:
seer.cancer.gov/seerstat/tutor…
SEER has awesome data, includes US census info (so proportions and risks can be calculated), and continues to evolve.

Two great papers about strengths and weaknesses of SEER from @HenryParkMD @jamesbyu
pubmed.ncbi.nlm.nih.gov/22481006/
pubmed.ncbi.nlm.nih.gov/22481009/
1, you can calculate incidence and mortality for specific cancers going back to the 1970s

2, you can evaluate risk of death from a particular cause of death, eg stroke

@NatureComms

3, you can evaluate epidemiology of a particular disease state, eg metastasis
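
Because SEER carries census denominators, rates are just cases over population. A minimal sketch of a crude rate and a directly age-standardized rate per 100,000 follows; the counts, populations, and standard weights are invented for illustration, and the function names are mine, not SEER*Stat's.

```python
def crude_rate(cases, population, per=100_000):
    """Cases divided by population at risk, scaled to `per` people."""
    return cases / population * per


def age_adjusted_rate(strata, per=100_000):
    """Direct standardization.

    strata: list of (cases, population, standard_weight) tuples,
    where the standard weights sum to 1.
    """
    return sum(weight * cases / pop for cases, pop, weight in strata) * per


# Hypothetical age strata: (cases, population, weight in the standard population)
strata = [
    (10, 500_000, 0.6),   # younger
    (90, 300_000, 0.3),   # middle
    (200, 200_000, 0.1),  # older
]

total_cases = sum(cases for cases, _, _ in strata)
total_pop = sum(pop for _, pop, _ in strata)

crude = crude_rate(total_cases, total_pop)
adjusted = age_adjusted_rate(strata)
print(f"crude: {crude:.1f} per 100,000; age-adjusted: {adjusted:.1f} per 100,000")
```

The two numbers differ because the made-up study population is older than the made-up standard; that gap is what age adjustment is meant to remove.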

On the other hand, SEER has limitations.

For example, there are no data on the #1 diagnosed cancer in the US, basal/squamous cell skin ca. Most of these cancers are extirpated, frozen, or desiccated by PCPs and dermatologists, so we can't get a reliable numerator/denominator on cases.
Some have questioned coding reliability, and there have been years where coding changes impacted the database, though these were corrected.

RT has been taken out of the core variable pack bc it is difficult to track: after surgery, a pt may go on to get RT closer to home, where it is not captured.
Generally, SEER has high-quality data. It undergoes QA and audits by qualified professionals, adhering to 2 basic principles:
1. auditing high-volume data (eg, breast ca)
2. auditing high-risk data (eg, new staging system)

seer.cancer.gov/qi/tools/casef…
You can put in a separate data request to access the treatment variables, eg radiation, chemo.

SEER has you sign a separate form stating that you understand the limitations of these variables. For context, >85% of cases have correct treatment info.

seer.cancer.gov/data-software/…
SEER also excels because it provides ICD cause of death, which is not present in NCDB (or any? claims database). However, coding cause of death is difficult.

For James Bond, you only live twice.
For SEER, you only die once (i.e., there is only 1 cause).
seer.cancer.gov/codrecode/1969…
Cause of death comes from the death certificate, which is completed by the physician caring for the patient at the time of death.

Here is a blurb from @StoltzfusKelsey paper:
ncbi.nlm.nih.gov/pmc/articles/P…
ncbi.nlm.nih.gov/pmc/articles/P…
ncbi.nlm.nih.gov/pmc/articles/P… [image]
When you access SEER, there are different "sessions" you can use.

"Case listing" is the session that most people will be familiar with.
To run the session:
1. File > New
2. Data: choose the SEER registry you want (registries differ in years and variables covered)
3. Selection: select the specific cancer pts
4. Table: select the variables you want as columns. More is better than less here
5. Click the lightning bolt to execute
Here are the data.

Ctrl-C: copy cells, then paste into another program, eg Excel.
Ctrl-R: copy session info, ie the exclusion/inclusion criteria (EC/IC). Paste this in another tab in Excel.

Most journals want to know the EC/IC so others can replicate your work.

Save the .SL + .SLM files too, in case you want to reopen the session in SEER*Stat.
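
If you refine the cohort further after export, it helps to log how many cases survive each inclusion/exclusion step. Here is a rough sketch of that bookkeeping; the CSV text, column names, and criteria are all invented for illustration and do not mirror real SEER variable names.

```python
import csv
import io

# Hypothetical export; column names and values are illustrative only.
EXPORT = """patient_id,age,site,stage,survival_months
1,67,Prostate,Localized,80
2,17,Prostate,Localized,40
3,72,Prostate,Unknown,12
4,58,Breast,Regional,60
5,63,Prostate,Distant,
"""

# Each criterion is a label plus a function that returns True to keep the row.
CRITERIA = [
    ("site is prostate", lambda r: r["site"] == "Prostate"),
    ("age >= 18", lambda r: int(r["age"]) >= 18),
    ("known stage", lambda r: r["stage"] != "Unknown"),
    ("survival time recorded", lambda r: r["survival_months"] != ""),
]


def apply_criteria(rows, criteria):
    """Apply criteria in order and record the cohort size after each step."""
    log = [("all exported cases", len(rows))]
    for label, keep in criteria:
        rows = [r for r in rows if keep(r)]
        log.append((label, len(rows)))
    return rows, log


rows = list(csv.DictReader(io.StringIO(EXPORT)))
cohort, attrition = apply_criteria(rows, CRITERIA)
for label, n in attrition:
    print(f"{n:>3}  {label}")
```

The printed attrition list is roughly what a reviewer expects to see in a cohort flow diagram.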
You can do the same with an SIR session to get observed/expected events, 95% CIs, person-years at risk, and mean age at event.

SIR (standardized incidence ratio) = observed / expected events, so it reads like a relative risk. The expected count (the denominator) comes from the general US population (cancer + non-cancer pts).
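
To make the ratio concrete, here is a minimal sketch of observed/expected with an approximate 95% CI. It uses Byar's approximation to the Poisson interval, which is my choice for a stdlib-only example and is not necessarily what SEER*Stat computes; the observed and expected counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def sir_with_ci(observed, expected, level=0.95):
    """Observed/expected ratio with Byar's approximate Poisson confidence limits."""
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    sir = observed / expected
    lower = observed * (1 - 1 / (9 * observed) - z / (3 * sqrt(observed))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * sqrt(o1))) ** 3 / expected
    return sir, lower, upper


# Hypothetical counts: 30 observed events where 20 were expected.
sir, lower, upper = sir_with_ci(observed=30, expected=20.0)
print(f"SIR = {sir:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A CI that excludes 1 suggests the cohort's event rate differs from the reference population's, subject to all the caveats above.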
Here is how to get SIR data for a specific cause of death (COD):

1, New, MP-SIR
2, Database selection
3, Rates, Selection: you can probably leave as is
4, Parameters: select follow-up time latencies
5, Events: what COD do you want?
6, Statistic: leave alone
7, Table: what do you want the table to look like?
8, Lightning bolt
9, Getting data...
10, Completed analysis

If your worksheet comes up with all 0s, it's bc you didn't select a COD in the dropdown on the last screen.
Questions you can and cannot answer with @theNCI SEER: [images]
SEER can be linked to different databases.
SEER Medicare is a popular option.
Here they are juxtaposed. [images]
Thank you to the @ACS_Research @AmericanCancer @AmColSurgCancer for providing this amazing resource.
NCDB is like a collection of the case listing files you would have seen in SEER. Each file is specific to a disease site. You apply for select sites and they are sent to you. For larger questions, you can merge files PRN.
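Merging here mostly means stacking the site-specific files and remembering which file each row came from. A small sketch under that assumption; the file contents and the `case_id`, `facility_id`, and `age` columns are hypothetical stand-ins, not the real NCDB layout.

```python
import csv
import io
from collections import Counter

# Hypothetical site-specific files, held as text so the example is self-contained.
SITE_FILES = {
    "breast": "case_id,facility_id,age\nB1,F10,54\nB2,F11,61\n",
    "prostate": "case_id,facility_id,age\nP1,F10,70\n",
}


def stack_site_files(site_files):
    """Concatenate per-site files, tagging each row with its source file."""
    merged = []
    for site, text in site_files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["site_file"] = site
            merged.append(row)
    return merged


merged = stack_site_files(SITE_FILES)
cases_per_facility = Counter(row["facility_id"] for row in merged)

print(len(merged), "cases across", len(SITE_FILES), "site files")
print(dict(cases_per_facility))
```

Once stacked, cross-site questions (eg, cases per facility across all sites) become a single pass over one table.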
NCDB is focused on treatment quality.
NCDB has much more information than SEER about treatment, including surgery, systemic therapy, and radiation therapy.
NCDB states the data are hospital-based, not population-based. The SEER processes to ensure representation of minorities are not necessarily in place.
Data come from CoC-accredited facilities, which together capture ~70% of newly diagnosed US cancer cases. Other caveats similar to those for SEER apply.
One concern w NCDB is that many patients have missing data, and patients with missing data may have worse outcomes.
@JAMANetworkOpen
jamanetwork.com/journals/jaman…
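Before running a complete-case analysis, a quick check is to compare outcomes between pts with and without missing fields. The sketch below does that on invented records; the field names and values are hypothetical, and this is a simple illustration rather than the method used in the paper above.

```python
# Hypothetical records; None marks a missing value. dead: 1 = died, 0 = alive.
records = [
    {"stage": "I", "grade": "1", "dead": 0},
    {"stage": "II", "grade": "2", "dead": 0},
    {"stage": "II", "grade": "3", "dead": 1},
    {"stage": "III", "grade": "2", "dead": 0},
    {"stage": None, "grade": "2", "dead": 1},
    {"stage": "III", "grade": None, "dead": 1},
    {"stage": None, "grade": None, "dead": 1},
    {"stage": "I", "grade": None, "dead": 0},
]


def split_by_completeness(rows, required):
    """Separate rows with every required field present from rows missing any."""
    complete = [r for r in rows if all(r[f] is not None for f in required)]
    incomplete = [r for r in rows if any(r[f] is None for f in required)]
    return complete, incomplete


def death_fraction(rows):
    """Proportion of rows with dead == 1."""
    return sum(r["dead"] for r in rows) / len(rows)


complete, incomplete = split_by_completeness(records, required=("stage", "grade"))
print(f"complete:   n={len(complete)}, died={death_fraction(complete):.0%}")
print(f"incomplete: n={len(incomplete)}, died={death_fraction(incomplete):.0%}")
```

If the two groups differ this much in real data, dropping incomplete cases would bias the cohort toward better outcomes.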
Questions you can and cannot answer with NCDB:

One of my favorite projects: [images]
For reference, here is what the data in NCDB look like.

One variable you will notice immediately is the facility ID, ie, the place where the pt was treated. It is neither possible nor permitted to decode it to a specific facility name. [images]
Part III of this tweetorial: comparing SEER vs NCDB

SEER has greater focus on epidemiology, incidence, mortality, cause of death.
NCDB has greater focus on surveillance, treatment, quality.
The data dictionaries for SEER and NCDB are available online:

seer.cancer.gov/analysis/
facs.org/-/media/files/…
SEER and NCDB have several variables in common.

These common variables inspired our STARS staging system for metastatic cancer.
@IntJCanc @uicc @AJCCancer @NCCN

We developed the system in one database and validated it in the other.

SEER and NCDB also have site specific factors, which provide more detailed information about a particular cancer.

seer.cancer.gov/seerstat/datab…
naaccr.org/SSDI/SSDI-Manu…
The availability of SSFs allows for validation of new staging systems, eg, @AJCCancer 8th vs 7th ed for oropharyngeal cancer, integrating HPV status.
Work from @TedTeknosMD
pubmed.ncbi.nlm.nih.gov/28939068/
#HNCSM

It would be great if SEER and NCDB could next integrate these variables, many of which are already commonly collected at time of consultation with an oncologist. [image]
Part IV: Claims databases for health services research
One of the most popular new claims databases is MarketScan, which includes ICD, CPT, and HCPCS codes.
MarketScan covers >80M patients, is not specific to oncology, and includes privately insured pts (ie, pts < 65 yo).
MarketScan + SEER allow us to estimate the cost of cancer care in the United States

ja.ma/3leArMs Via
@JAMANetworkOpen @JAMA_current

Here are some other projects you can and cannot do with MarketScan.

One of my favs: classification of common human diseases derived from shared genetic and environmental determinants
@NatureGenet
nature.com/articles/ng.39… [images]
Similarly, TriNetX is a claims database that can be used in oncology.

Thanks to @AVnishKatoch @PennStateCTSI @PennStHershey for the information.
Here is a comparison of NCDB, SEER, SEER Medicare, MarketScan, and TriNetX.

Table adapted from Dan Boffa, @mafacktor work @JAMAOnc:
pubmed.ncbi.nlm.nih.gov/28241198/ [image]
Data collection in SEER, NCDB, and hospital databases has "classical" formatting: basically 1 time point (at diagnosis) with covariates, plus a variable giving time until last follow-up and vital status. A ton of data are missing.
Claims databases may provide many more time points with data. Soon, we may also be able to integrate text, images, etc.

These databases are ideal for analysis with AI/ML.
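
To picture the difference in data shape, here is a small sketch contrasting one registry-style row per pt with claims-style rows per event. The class names, fields, and codes are hypothetical placeholders, not the actual SEER, NCDB, or MarketScan schemas.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegistryRecord:
    """Registry style: one row per pt, fixed at diagnosis, plus follow-up."""
    patient_id: str
    age_at_dx: int
    stage: str
    months_followup: int
    alive: bool


@dataclass
class ClaimEvent:
    """Claims style: one row per billed event, so many rows per pt."""
    patient_id: str
    service_date: date
    code: str  # placeholder label, not a real ICD/CPT/HCPCS code


registry = RegistryRecord("A", age_at_dx=66, stage="II", months_followup=48, alive=True)

claims = [
    ClaimEvent("A", date(2020, 1, 31), "FOLLOW_UP_VISIT"),
    ClaimEvent("A", date(2020, 1, 1), "DIAGNOSTIC_VISIT"),
    ClaimEvent("B", date(2020, 2, 10), "DIAGNOSTIC_VISIT"),
    ClaimEvent("A", date(2020, 1, 15), "TREATMENT"),
]


def timeline(events, patient_id):
    """Return one pt's events in chronological order."""
    return sorted(
        (e for e in events if e.patient_id == patient_id),
        key=lambda e: e.service_date,
    )


events_a = timeline(claims, "A")
span_days = (events_a[-1].service_date - events_a[0].service_date).days
print([e.code for e in events_a], "spanning", span_days, "days")
```

The registry row answers "what was true at diagnosis and at last contact"; the event list can answer "what happened, and in what order".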

Thread by Nicholas Zaorsky, MD MS (@NicholasZaorsky)
