Nicholas Zaorsky, MD MS
Feb 28, 2022, 49 tweets
Health services research using United States cancer databases

Here is everything you want to know about @theNCI SEER, @AmericanCancer @AmCollSurgeons NCDB, and newer claims databases for clinical research in oncology

🧵 ImageImage
First, many thanks to these great people for helping me with the material Image
Retrospective databases are ideal for certain types of questions related to epidemiology, staging, rare diseases, quality, prognostication, prediction, and some "real world evidence / data" Image
However, we should be cautious in using these databases for (1) comparative effectiveness research, and (2) comparing outcomes of patients today vs a prior era Image
(1) These databases are not meant for comparative effectiveness research, ie evaluating tx A vs B.

If you're considering it, send your data to me and @wedney2017 and we will show you how you can get any answer you want: A>B, B>A, A=B😅

(2) These databases are not meant to compare outcomes (via KM plots) over major eras.

The KM plots are often affected by the Will Rogers phenomenon (stage migration).

ImageImage
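For anyone who has not seen the Will Rogers phenomenon in numbers, here is a minimal sketch in Python. The survival times are made up for illustration and are not from SEER or NCDB. The assumption is that better imaging in the newer era restages the two sickest "localized" pts as metastatic. Mean survival then rises in both stage groups, even though no pt lives a day longer and the overall mean is unchanged.

```python
from statistics import mean

# Hypothetical survival times in months (illustrative only, not registry data).
localized_old = [20, 40, 60, 80, 100]
metastatic_old = [5, 10, 15]

# Newer imaging finds occult metastases in the two sickest "localized" patients,
# so they are restaged as metastatic. Nobody's survival time changes.
migrated = [20, 40]
localized_new = [t for t in localized_old if t not in migrated]
metastatic_new = metastatic_old + migrated


def summarize(label, localized, metastatic):
    """Print and return mean survival by stage group and overall."""
    overall = mean(localized + metastatic)
    print(f"{label}: localized={mean(localized):.1f}, "
          f"metastatic={mean(metastatic):.1f}, overall={overall:.2f}")
    return mean(localized), mean(metastatic), overall


old = summarize("old era", localized_old, metastatic_old)
new = summarize("new era", localized_new, metastatic_new)
```

This is why a stage-by-stage KM comparison across eras can show "improvement" that is only relabeling.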
Here are the trends in publications using these databases.

SEER and NCDB make up the majority of oncology health services research. Image
@theNCI SEER contains information on ~1/3 of cancer cases in the US, with data going back to 1973. Data come from minority-enriched geographic areas.

You can get data here:
seer.cancer.gov/data/access.ht…

Do the tutorials here:
seer.cancer.gov/seerstat/tutor… ImageImageImage
SEER has awesome data, includes US census info (so proportions, risks can be calculated), and it continues to evolve

Two great papers about strengths and weaknesses of SEER from @HenryParkMD @jamesbyu
pubmed.ncbi.nlm.nih.gov/22481006/
pubmed.ncbi.nlm.nih.gov/22481009/ ImageImage
1, you could calculate incidence and mortality data on specific cancers since the 1970s

2, you can evaluate risk of death from a particular cause of death, eg stroke

@NatureComms

3, you can evaluate epidemiology of a particular disease state, eg metastasis
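Because census denominators are available, rates can be computed directly. Below is a rough sketch in Python of a crude incidence rate and a directly age-standardized rate. The age bands, case counts, populations, and standard weights are all hypothetical; a real analysis would use a published standard population rather than these made-up weights.

```python
# Hypothetical counts per age band: (cases, population at risk, standard weight).
# The weights are invented for illustration and sum to 1.
strata = {
    "0-49": (50, 600_000, 0.70),
    "50-69": (300, 300_000, 0.20),
    "70+": (400, 100_000, 0.10),
}

PER = 100_000  # rates are conventionally reported per 100,000


def crude_rate(strata):
    """Total cases divided by total population, per 100,000."""
    cases = sum(c for c, _, _ in strata.values())
    population = sum(p for _, p, _ in strata.values())
    return cases / population * PER


def age_adjusted_rate(strata):
    """Direct standardization: weight each age-specific rate by the standard share."""
    return sum(w * (c / p) for c, p, w in strata.values()) * PER


print(f"crude: {crude_rate(strata):.1f} per 100,000")
print(f"age-adjusted: {age_adjusted_rate(strata):.1f} per 100,000")
```

The two numbers differ whenever the study population's age mix differs from the standard, which is the reason to adjust before comparing registries or years.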

On the other hand, SEER has limitations.

For example, there are no data on the #1 diagnosed cancer in the US, basal/squamous cell skin ca. Most of these cancers are extirpated, frozen, or desiccated by PCPs and dermatologists, so we can't get a reliable numerator/denominator on cases. Image
Some have questioned coding reliability, and there have been years where coding changes impacted the database, though these were corrected.

RT has been taken out of the core variable pack bc it is difficult to capture when the pt has surg at one center and then goes on to get RT closer to home. Image
Generally, SEER has high quality data. It undergoes QA and audits by qualified professionals, adhering to 2 basic principles:
1, auditing high quantity data (eg, breast ca)
2, auditing high risk data (eg, new staging system)

seer.cancer.gov/qi/tools/casef… Image
You can put in a separate data request to access the treatment variables, eg radiation, chemo.

SEER has you sign a separate form stating that you understand the limitations of these variables, eg, that treatment info is correct in >85% of cases.

seer.cancer.gov/data-software/… Image
SEER also excels because it provides ICD cause of death, which is not present in NCDB (or any? claims database). However, coding cause of death is difficult.

For James Bond, you only live twice.
For SEER, you only die once (i.e., there is only 1 cause).
seer.cancer.gov/codrecode/1969… Image
Cause of death comes from death certificates, from the physician caring for the patient at time of death.

Here is a blurb from @StoltzfusKelsey paper:
ncbi.nlm.nih.gov/pmc/articles/P…
ncbi.nlm.nih.gov/pmc/articles/P…
ncbi.nlm.nih.gov/pmc/articles/P… Image
When you access SEER, there are different "sessions" you can use.

"Case listing" is the session that most people would be familiar with. ImageImage
To run the session:
1. file, new
2. Data: SEER registry you want (some have diff yrs, variables)
3. Selection: select specific cancer pts
4. Data: select the variables you want for columns. More is better than fewer here
5. Lightning bolt executes ImageImageImageImage
Here are data.

Ctrl-C: copy cells, then paste into program, eg Excel.
Ctrl-R: copy session info (EC/IC). Paste this in another tab in Excel.

Most journals want to know EC/IC so others can replicate your work.

Save the .SL + .SLM files too, in case you want to reopen in SEER. ImageImageImage
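Once the pasted cells are saved as a CSV, the EC/IC can also be applied and counted in code, which makes them easy to report. This is only a sketch: the column headers ("Age", "Site", "Survival months"), the rows, and the criteria themselves are hypothetical. Real headers depend on the variables you picked in the session.

```python
import csv
import io

# Stand-in for a case listing saved as CSV; headers and rows are hypothetical.
EXPORT = """Patient ID,Age,Site,Survival months
1,64,Prostate,120
2,17,Prostate,30
3,70,Prostate,Unknown
4,58,Breast,88
"""


def apply_criteria(rows):
    """Return (included rows, exclusion counts) so the EC/IC can be reported."""
    included = []
    excluded = {"wrong site": 0, "age < 18": 0, "unknown survival": 0}
    for row in rows:
        # Criteria are checked in order; each patient is counted once.
        if row["Site"] != "Prostate":
            excluded["wrong site"] += 1
        elif int(row["Age"]) < 18:
            excluded["age < 18"] += 1
        elif not row["Survival months"].isdigit():
            excluded["unknown survival"] += 1
        else:
            included.append(row)
    return included, excluded


rows = list(csv.DictReader(io.StringIO(EXPORT)))
included, excluded = apply_criteria(rows)
print(f"included: {len(included)} of {len(rows)}")
print(f"excluded: {excluded}")
```

Keeping the exclusion counts alongside the session info gives you the numbers for a flow diagram without redoing the selection by hand.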
You can do the same with an SIR (standardized incidence ratio) session to get observed/expected events, 95% CIs, person years at risk, mean age of event

SIR is similar to relative risk. The denominator (expected) comes from the general US population (cancer + non-cancer pts). Image
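To make the observed/expected idea concrete, here is a hedged sketch in Python. Expected events are person-years at risk times the reference rate, summed over strata. The 95% CI uses Byar's approximation to the Poisson interval; that choice is mine for illustration and I am not claiming it is what SEER*Stat computes internally. The strata, rates, and counts are hypothetical.

```python
import math


def expected_events(person_years, reference_rates):
    """Expected events = sum over strata of person-years x reference rate."""
    return sum(person_years[s] * reference_rates[s] for s in person_years)


def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) using Byar's approximation for the CI."""
    sir = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper


# Hypothetical cohort: person-years at risk and general-population event rates
# (events per person-year) by age band.
person_years = {"50-69": 10_000, "70+": 5_000}
reference_rates = {"50-69": 0.001, "70+": 0.004}

expected = expected_events(person_years, reference_rates)
sir, lower, upper = sir_with_ci(observed=45, expected=expected)
print(f"O=45, E={expected:.1f}, SIR={sir:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

If the interval excludes 1, the cohort has more (or fewer) events than the general population would be expected to have over the same person-years.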
Here is how to get SIR data for specific cause of death

1, New, MP-SIR
2, Database selection
3, Rates, Selection you can probably leave as is
4, Parameters: select follow up time latencies
5, Events: what COD do you want? ImageImageImageImage
6, Statistic: leave alone
7, Table: what do you want table to look like?
8, Lightning bolt
9, Getting data...
10, Completed analysis

If your worksheet comes up with all 0s, it's bc you didn't select COD in the dropdown on the last screen. ImageImageImageImage
Questions you can and cannot answer with @theNCI SEER ImageImage
SEER can be linked to different databases.
SEER Medicare is a popular option.
Here they are juxtaposed. ImageImageImage
Thank you to the @ACS_Research @AmericanCancer @AmColSurgCancer for providing this amazing resource. Image
NCDB is like a collection of case listing files that you would have seen in SEER. Each file is specific to a disease site. You apply for select sites and they are sent to you. For larger questions, you can merge files PRN. Image
NCDB is focused on treatment quality.
NCDB has much more information than SEER about treatment, including surgery, systemic therapy, radiation therapy. ImageImageImageImage
NCDB states the data are hospital-based, not population-based. The SEER processes to ensure representation of minorities are not necessarily in place.
Data come from CoC-accredited facilities, which capture ~70% of newly diagnosed US cancer cases. Other caveats re data similar to SEER exist. Image
One concern w NCDB is that many patients have missing data, and patients with missing data may have worse outcomes.
@JAMANetworkOpen
jamanetwork.com/journals/jaman… Image
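A quick check worth running before any complete-case analysis: how much is missing, and do pts with missing data do worse? Here is a minimal sketch in Python. The records and field names ("stage", "grade", "dead") are hypothetical and are not actual NCDB variable names.

```python
# Hypothetical records; None marks a missing value.
patients = [
    {"stage": "I", "grade": 2, "dead": False},
    {"stage": None, "grade": 3, "dead": True},
    {"stage": "III", "grade": None, "dead": True},
    {"stage": "II", "grade": 1, "dead": False},
    {"stage": None, "grade": None, "dead": True},
    {"stage": "IV", "grade": 3, "dead": True},
]
COVARIATES = ("stage", "grade")


def missing_fraction(records, field):
    """Share of records where the field is missing."""
    return sum(r[field] is None for r in records) / len(records)


def death_rate(records):
    """Crude proportion dead; NaN if the group is empty."""
    if not records:
        return float("nan")
    return sum(r["dead"] for r in records) / len(records)


complete = [r for r in patients if all(r[c] is not None for c in COVARIATES)]
incomplete = [r for r in patients if any(r[c] is None for c in COVARIATES)]

for field in COVARIATES:
    print(f"{field}: {missing_fraction(patients, field):.0%} missing")
print(f"death rate, complete cases: {death_rate(complete):.0%}")
print(f"death rate, any missing:    {death_rate(incomplete):.0%}")
```

If the two death rates differ a lot, dropping incomplete cases changes who is in the cohort, and the results may not generalize.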
Questions you can and cannot answer with NCDB:

One of my favorite projects: ImageImage
For reference, here is what the data in NCDB looks like.

One variable you will notice immediately is the facility ID, ie, the place where the pt was treated. It is neither possible nor allowed to decode it to a specific facility name. ImageImageImage
Part III of this tweetorial: comparing SEER vs NCDB

SEER has greater focus on epidemiology, incidence, mortality, cause of death.
NCDB has greater focus on surveillance, treatment, quality. ImageImage
The data dictionaries for SEER and NCDB are available online:

seer.cancer.gov/analysis/
facs.org/-/media/files/… Image
SEER and NCDB have several variables in common.

These common variables inspired our STARS staging system for metastatic cancer.
@IntJCanc @uicc @AJCCancer @NCCN

We developed the system in one database and validated it in the other.

Image
SEER and NCDB also have site specific factors, which provide more detailed information about a particular cancer.

seer.cancer.gov/seerstat/datab…
naaccr.org/SSDI/SSDI-Manu… ImageImageImage
The availability of SSFs allows for validation of new staging systems, eg, @AJCCancer 8th vs 7th ed for oropharyngeal cancer, integrating HPV status.
Work from @TedTeknosMD
pubmed.ncbi.nlm.nih.gov/28939068/
#HNCSM

It would be great if SEER and NCDB could next integrate these variables, many of which are already commonly collected at time of consultation with an oncologist. Image
Part IV: Claims databases for health services research ImageImage
One of the most popular new claims databases is MarketScan, which includes ICD, CPT, HCPCS codes. ImageImageImageImage
MarketScan covers >80M patients, is not specific to oncology, and includes private insurance (ie, pts < 65 yo). ImageImageImageImage
MarketScan + SEER allow us to estimate the cost of cancer care in the United States

ja.ma/3leArMs via @JAMANetworkOpen @JAMA_current

Here are some other projects you can and cannot do with MarketScan.

One of my favs: classification of common human diseases derived from shared genetic and environmental determinants
@NatureGenet
nature.com/articles/ng.39… ImageImageImage
Similarly, TriNetX is a claims database that can be used in oncology.

Thanks to @AVnishKatoch @PennStateCTSI @PennStHershey for the information. ImageImageImageImage
Here is a comparison of NCDB, SEER, SEER Medicare, MarketScan, and TriNetX.

Table adapted from Dan Boffa, @mafacktor work @JAMAOnc:
pubmed.ncbi.nlm.nih.gov/28241198/ Image
Data collection in SEER, NCDB, hospital databases has "classical" formatting. There is basically just 1 time point (at diagnosis) with covariates. There is a variable that provides time until last follow up and vital status. A ton of data are missing. Image
Claims databases may provide many more time points with data. Soon, we may also be able to integrate text, images, etc.

These databases are ideal for analysis with AI/ML. Image
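To show the difference in shape, here is a small sketch in Python. A registry row is one record per pt, fixed at diagnosis; claims are many dated events per pt, from which intervals can be derived. The class names, fields, and the codes "DX-EXAMPLE" and "RT-EXAMPLE" are hypothetical placeholders, not real ICD/CPT/HCPCS codes or any database's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegistryRow:
    """Classical format: one row per patient, anchored at diagnosis."""
    patient_id: str
    diagnosis_date: date
    months_followed: float
    dead: bool


@dataclass
class Claim:
    """Claims format: one row per billed event, many per patient."""
    patient_id: str
    service_date: date
    code: str  # placeholder for an ICD, CPT, or HCPCS code


def timeline(claims, patient_id):
    """Return one patient's claims in chronological order."""
    own = (c for c in claims if c.patient_id == patient_id)
    return sorted(own, key=lambda c: c.service_date)


def days_from_first(claims, patient_id, start_code, end_code):
    """Days from the first start_code claim to the first end_code claim on or after it."""
    events = timeline(claims, patient_id)
    start = next((c.service_date for c in events if c.code == start_code), None)
    if start is None:
        return None
    end = next((c.service_date for c in events
                if c.code == end_code and c.service_date >= start), None)
    return None if end is None else (end - start).days


registry = RegistryRow("p1", date(2020, 1, 10), months_followed=24.0, dead=False)
claims = [
    Claim("p1", date(2020, 6, 1), "FOLLOWUP-EXAMPLE"),
    Claim("p1", date(2020, 1, 10), "DX-EXAMPLE"),
    Claim("p1", date(2020, 2, 9), "RT-EXAMPLE"),
]
print(days_from_first(claims, "p1", "DX-EXAMPLE", "RT-EXAMPLE"))
```

The registry row cannot answer "how long from diagnosis to RT" unless that interval was abstracted as its own variable, while the claims timeline lets you derive it.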

