@bengoldacre

OUR NEW PAPER.

Factors associated with COVID-19-related hospital death in the linked electronic health records of 17 MILLION adult NHS patients.

Largest study of its kind ever, anywhere.

The power of UK / NHS data, realised.

opensafely.org/outputs/2020/0…

medrxiv.org/cgi/content/sh…

HERE IS THE PRESS RELEASE FOR OUR PAPER.

This is the largest study to date, analysing NHS health data from 17.4 MILLION UK adults between 1 February and 25 April 2020.

It gives the strongest evidence to date on risk factors associated with COVID-19 death.

opensafely.org/press-releases…

Among the 17.4 million adults in the sample, there were 5,707 deaths in hospitals attributed to COVID-19.

Key factors related to COVID-19 death included being male, older age, uncontrolled diabetes and severe asthma.

Deprivation was also found to be a major risk factor, and this too was only partially attributable to other clinical risk factors.

A very big and important finding is on ethnicity, where we shed important new light on previous findings.

People of Asian and Black ethnic origin were found to be at a higher risk of death. Previously, commentators and researchers have reasonably speculated that this might be due to a higher prevalence of medical problems, e.g. cardiovascular disease or diabetes, among BME communities.

Our study is the first to examine the full, detailed pseudonymised health records of patients who died, and 17 million adults at risk of death, to account for those other background medical factors.

We found the higher risk among BME people is NOT attributable to other factors.

This is extremely important.

We also found very substantially higher risk of death from COVID-19 for older people, and for people with higher BMI. This is EVEN AFTER you take into account whether they have heart disease and other medical problems.

This is very important indeed.

God I need a sandwich. I will explain in a moment how we did it. This is the hardest, fastest, most amazing project I've ever been a part of, across many organisations. Must eat. Barely stopped for six weeks. Back in one mo.

opensafely.org/press-releases…

medrxiv.org/content/10.110…

I'm back.

Here are the risks of death from COVID-19 for each disease category and demographic, each fully adjusted for the others (i.e. taking the others into account), from our unprecedented cohort study in the full detailed pseudonymised health records of 17 MILLION adults.

Busy day. I will be back online at 15:30 UK time to talk more about how we built an unprecedented, open source, highly secure analytics platform running across 24 million patients' full pseudonymised health records, in wide collaboration between academia and the EHR industry...

For now our paper is here:

medrxiv.org/content/10.110…

Press release here:

opensafely.org/press-releases…

Sorry, back at 16:30 to tell the story of our heroic team @ehr_lshtm @alexjohnwalker @drchrisbates @DarthCTR @Roxytonin @dr_c_morton @helencebm @sebbacon @_EvansD @inglesp @AnnaTheresia @Ladyroho @TPP_SystmOne @jonnycockburn @drjohnparry @StatsFizz @LiamSmeeth1

and @richiecroker @HenryMDrysdale @wjchulme @jessRmorley @ndevito1 @SamPDHarper

and not forgetting the heroic @sebbacon

So, our paper out today on risk factors for death from COVID-19. The largest analysis ever done, on 17.4 MILLION adults' full pseudonymised records. UK data in action! How did we do it? Innovation, collaboration, and listening...

opensafely.org/press-releases…

medrxiv.org/cgi/content/sh…

In just 6 weeks we built opensafely.org, an entirely new secure analytics platform for electronic health records analytics in the NHS, created to deliver urgent results during the COVID-19 emergency.

Our OpenSAFELY.org uses a new model for enhanced security and timely access to data: we *don't* transport large volumes of potentially disclosive pseudonymised patient data off-site....

... instead, we take the analytics to the records. Trusted analysts can run large scale computation across live pseudonymised patient records inside the data centre of the electronic health records software company, the very place where those records already reside for usual care.

This is huge. All data that carries any privacy risk (even a theoretical risk, and even when pseudonymised) remains within the secure data centre of the electronic health record vendor, where it already resides. This also means that all activity is logged for independent review.

All processing takes place in the same secure data centre, where the patients’ electronic records were already stored. The only information to ever leave the data centre is summary tables (with low numbers suppressed) from statistical models.
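
As a rough illustration of what "low numbers suppressed" can mean in practice, here is a minimal Python sketch. It is not the OpenSAFELY code: the function name, the threshold of 5, and the choice to leave zero counts visible are all assumptions made for the example, since the thread does not state the rule actually used.

```python
from typing import Dict, Optional

# Hypothetical cut-off: the thread only says "low numbers suppressed"
# and does not give the threshold actually used.
SUPPRESSION_THRESHOLD = 5


def suppress_low_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Replace every count from 1 to `threshold` (inclusive) with None.

    Zero counts are left as they are (an assumption of this sketch).
    """
    return {
        group: (None if 0 < n <= threshold else n)
        for group, n in counts.items()
    }


# Made-up summary table, not real study output.
summary = {"group_a": 1200, "group_b": 3, "group_c": 0}
released = suppress_low_counts(summary)
print(released)
```

Only the `released` table would leave the secure environment in this sketch; the small cell for `group_b` is blanked so it cannot point to a handful of individuals.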

Within the data centre, all pseudonymised data is stored in a tiered system of increasingly less disclosive data stores tailored to each analysis. The pseudonymised "event-level" store carries the highest *theoretical* re-identification risk. This is rapidly transformed...

... into a data store that is one row per patient, with mostly binary or categorical variables. So very rich, detailed patient data is transformed, using incredibly precise disease or exposure (e.g. prescription) definitions, into much less disclosive variables.
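
The event-level to one-row-per-patient step can be sketched as follows. This is a minimal illustration only, assuming a made-up `Event` record and made-up codes ("D1", "A1" and so on); none of these names come from the real platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Event:
    """One pseudonymised event-level record (hypothetical shape)."""
    patient_id: int
    code: str
    when: date


def one_row_per_patient(
    events: Iterable[Event],
    patient_ids: Iterable[int],
    codelists: Dict[str, Set[str]],
) -> Dict[int, Dict[str, bool]]:
    """Collapse event-level records into one row of binary flags per patient."""
    # Start every patient with all flags False, so patients with no
    # matching events still get a row.
    rows = {pid: {name: False for name in codelists} for pid in patient_ids}
    for ev in events:
        if ev.patient_id not in rows:
            continue
        for name, codes in codelists.items():
            if ev.code in codes:
                rows[ev.patient_id][name] = True
    return rows


# Made-up codes and events, purely for illustration.
codelists = {"diabetes": {"D1", "D2"}, "asthma": {"A1"}}
events = [
    Event(1, "D1", date(2019, 3, 1)),
    Event(1, "Z9", date(2019, 4, 1)),
    Event(2, "A1", date(2018, 1, 5)),
    Event(2, "D2", date(2019, 9, 9)),
]
rows = one_row_per_patient(events, [1, 2, 3], codelists)
print(rows)
```

The output keeps only "has this condition: yes or no" per patient; the individual codes and dates, which carry more re-identification risk, are dropped.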

These are bespoke to each study. They are defined in the analytic environment. I will share more about how that works later, suffice to say it is magnificent. Every action on the data is logged. Of course, access to the database is locked right down, MAC addresses, VPN, etc...

Only a tiny number of us have highly restricted access (in addition to the TPP staff who already work in the patient records data centre) and we are working on behalf of NHS England in the context of a global emergency.

Working inside the data centre where the records are already held carries other huge benefits. Huge data extracts are difficult, expensive, and typically intermittent. In a global health emergency, you need to be running analytics across CURRENT data. This is easy with our model.

But to be clear, our platform is not wedded to the data residing inside the TPP patient records database. We have written fully portable code, portable by design, to run against any data store the NHS has today, or that the NHS might create in the future.
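
One common way to get that kind of portability is to write the study logic against a small interface and plug in a different backend for each data store. The sketch below shows the idea only; the `EventStore` interface, both backends, and the `events` table layout are hypothetical and are not taken from the OpenSAFELY codebase.

```python
import sqlite3
from typing import Iterable, List, Protocol, Set, Tuple


class EventStore(Protocol):
    """Anything that can yield (pseudonymised patient id, clinical code) pairs."""

    def events(self) -> Iterable[Tuple[int, str]]:
        ...


class InMemoryStore:
    """Backend holding events in a plain Python list."""

    def __init__(self, rows: List[Tuple[int, str]]) -> None:
        self._rows = rows

    def events(self) -> Iterable[Tuple[int, str]]:
        return iter(self._rows)


class SqliteStore:
    """Backend reading events from a SQL table named `events`."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def events(self) -> Iterable[Tuple[int, str]]:
        return self._conn.execute("SELECT patient_id, code FROM events")


def patients_with_any(store: EventStore, codes: Set[str]) -> Set[int]:
    """Study logic: it depends only on the interface, not on the backend."""
    return {pid for pid, code in store.events() if code in codes}


# Made-up data loaded into both backends.
data = [(1, "X1"), (2, "Y7"), (3, "X2"), (3, "Y7")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (patient_id INTEGER, code TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", data)

from_memory = patients_with_any(InMemoryStore(data), {"X1", "X2"})
from_sql = patients_with_any(SqliteStore(conn), {"X1", "X2"})
print(from_memory, from_sql)
```

The same `patients_with_any` function runs unchanged against either store, which is the property being described: swap the backend, keep the analysis.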

Now, let me tell you a little about our code, our software. Everything is open source. You can read absolutely everything that creates the cohort, the codelists, runs the analysis, sanity checks the data, and more, all on GitHub, all beautiful: opensafely.org/code/

Here is the full code repository for our paper that came out today on GitHub. As you can see, this is not your everyday old fashioned epidemiology project. You will not be importing an excel file and pressing buttons. github.com/ebmdatalab/ope…

This is massively collaborative computational data science using open tools. This is not just collaboration in the sense of "having some meetings" (tho we love to hang out). This is collaboration through pull requests and merges back to the master.

This is a technical platform with open collaboration built in by design.

And there's more. If you look a little more closely, you will also see that, as we go, we have been making generalisable tools, to share with the whole community.

For example: where do we store our codelists? (Oh you will love this, epidemiology researchers!). Codelists are part of beautiful data scalpels for precisely identifying a group of patients who match a disease definition, a drug exposure, or BMI, or BP, or any number of things...

(I say "a part of" because it's not just a codelist, it'll be "these codes, between these dates" or "these ones but not those ones, between those dates" and so on).
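
To make "these ones but not those ones, between those dates" concrete, here is a minimal Python sketch. The `CodelistRule` class and the codes are invented for illustration, and the reading of "not those ones" (a patient is excluded if any excluded code appears in the window) is only one possible interpretation.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class CodelistRule:
    """A codelist plus the conditions around it (hypothetical structure)."""
    include: FrozenSet[str]
    exclude: FrozenSet[str] = frozenset()
    start: date = date.min
    end: date = date.max

    def matches(self, events: Iterable[Tuple[str, date]]) -> bool:
        """True if an included code, and no excluded code, falls in the window."""
        in_window = [
            code for code, when in events if self.start <= when <= self.end
        ]
        has_included = any(code in self.include for code in in_window)
        has_excluded = any(code in self.exclude for code in in_window)
        return has_included and not has_excluded


# Made-up codes and dates.
rule = CodelistRule(
    include=frozenset({"X1", "X2"}),
    exclude=frozenset({"X9"}),
    start=date(2019, 1, 1),
    end=date(2020, 2, 1),
)

in_window = rule.matches([("X1", date(2019, 6, 1))])
too_early = rule.matches([("X1", date(2018, 6, 1))])
excluded = rule.matches([("X2", date(2019, 6, 1)), ("X9", date(2019, 7, 1))])
print(in_window, too_early, excluded)
```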

We don't just copy and paste those into the analysis scripts. No... We have built a beautiful generalised open framework for working with codelists for electronic health records research, led by our mighty, glorious @inglesp. Early days, but here it is: github.com/ebmdatalab/ope…

So codelists and algorithms are called from here. Want to use it yourself, for your own analyses, against EHR data you already hold? Go ahead. Share feedback, your information requirements, we have a great roadmap and we're all ears for new requests.

You can see all the code on the github link above. You can see some but not all the functionality in this live service, the very one we call, here:

codelists.opensafely.org

Go on, pick one at random. You can see our GitHub discussions on the contents, and links to the papers (often by our team) that created the codelists, or validated them. For example here... codelists.opensafely.org/codelist/opens…

And the team. Oh the team. Here, I might get weepy. This is without question the greatest single project I have ever worked on in my life. A phenomenal combination of talents. Depth. Breadth. And most of us have never met in meatspace, only in video chat. So who are we???

In short, the broad, perfect collaboration. Firstly, my group @EBMDataLab in Oxford, a mixed team of software developers, traditional academics, and clinicians, all pooling skills for years to build tools like openprescribing.net as well as pure academic papers.

That gives you @alexjohnwalker @dr_c_morton @helencebm @sebbacon @_EvansD @inglesp @richiecroker @HenryMDrysdale @wjchulme @jessRmorley @ndevito1 and more.

Next up, the Electronic Health Records research group at LSHTM (home of epidemiology, Bradford Hill, and the rest). @ehr_lshtm are amazeballs, led by @LiamSmeeth1 (now Dean of Epi, fgs). This group has *phenomenally* deep knowledge around the strengths and weaknesses of GP data.

That gets you @DarthCTR @Roxytonin @ladyroho @AnnaTheresia @StatsFizz and many more, the mighty Ian Douglas, the amazing Stephen Evans (check out his book on research fraud), Krishnan Bhaskaran and more.

Next up, and remember, this was a full, deep, wide collaboration across industry and academia: @TPP_SystmOne, the company that makes SystmOne GP software, used in 40% of NHS GP practices. They bring @drchrisbates @jonnycockburn @drjohnparry @SamPDHarper

I have never seen such a phenomenal meeting of minds. It feels like we have bridged some vital historic divides, in particular between EHR delivery and EHR research. Our teams speak the same language. This wasn't by magic. In some cases it was hard. This is a NEW WAY OF WORKING.

So epidemiologists who are decades-long ninjas with Stata and R had to dive in and learn about Python, Docker, SQL, GitHub. Reader, they rose to this with *panache* and great smarts. We are rapidly expanding our "developer-epidemiologist" crossover group. It makes me weep w/ joy

And of course, we are all doing this on behalf of NHS England, and NHSX, in the context of a global health emergency. I'm not going to give a political broadcast here, but there have been a lot of people hitting on govt, NHS, and so on. I see smart human people working hard.

NHS E, X, and all, have been amazing to work with, a phenomenally positive experience, and they have moved mountains. Like them, we are here to serve. If you have analytic queries, we've a long list of priorities from various key national stakeholders (as they say) but open ears.

I will end this mega-ramble soon, but let me just add one thing, about the scale of data. It is important.

People often talk about the "power of NHS data", in quite hypothetical terms. In this project, we have proven it, by delivering a platform at scale. The questions we are answering are of global importance for clinicians, policymakers and patients around the world.

These answers can only be delivered by analysing large datasets, handled securely, on the scale that we have assembled. The UK, with the NHS, is, I would argue, the only country on the planet with the scale of data needed to deliver these analyses.

We have built this project because we believe the UK has a responsibility to the global community to make good use of this data, securely, and to the highest scientific standards.

To do it securely, we had to step outside the norm. We had to develop an entirely new model. We had to do security to the highest level. We had to take the analytics to the place where the data already resides.

Because you cannot answer the questions we have answered today, about who is most at risk of death from COVID, about how risks interact, about the possible effects of drugs, or any of the other analyses in our roadmap, with smaller scale data.

opensafely.org

We have also, btw, delivered this unprecedented platform with NO specific funding for the project to date, despite *all* our best efforts.

So, there it is. We took a chance, we bet the farm, we have risked wasting our time... and we have delivered. We have a huge pipeline, and today's paper is just the beginning. The team are heroic.

One more tiny thing. This is just the beginning. We are expanding, and I hope we will have some extraordinary positive news to share next week. All I can say for now is that COVID, among all the tragedy, has brought out the absolute very best in some people, and organisations.

Our first paper
medrxiv.org/cgi/content/sh…

Our press release
opensafely.org/press-releases…

Our code
opensafely.org/code/

Our platform
opensafely.org
