Can you detect COVID-19 using Machine Learning? 🤔

You have an X-ray or CT scan and the task is to detect if the patient has COVID-19 or not. Sounds doable, right?

None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!

Let's see why 👇

Researchers from Cambridge took all papers on the topic published from January to October 2020.

▪️ 2212 papers
▪️ 415 after initial screening
▪️ 62 chosen for detailed analysis
▪️ 0 with potential for clinical use

healthcare-in-europe.com/en/news/machin…

There are important lessons here 👇

Small datasets

Getting medical data is hard because of privacy concerns, and at the beginning of the pandemic there was simply not much data available.

Many papers used very small datasets, often collected from a single hospital - not enough for a real evaluation.
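
To see why a small test set is not enough, here is a minimal sketch that puts a confidence interval around a reported accuracy. It uses the Wilson score interval (my choice of method, not something from the paper), and the counts are made-up numbers for illustration.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Hypothetical results: the same 90% accuracy on 50 vs. 5000 test images
small = wilson_interval(45, 50)
large = wilson_interval(4500, 5000)

print(f"50 images:   {small[0]:.3f} .. {small[1]:.3f}")
print(f"5000 images: {large[0]:.3f} .. {large[1]:.3f}")
```

With 50 test images the interval is well over 15 percentage points wide, while with 5000 it is under 2. The same headline number carries very different evidence.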

👇

Biased datasets 🧒🧑‍🦲

Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults... 🤷‍♂️
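
A quick sanity check for this kind of bias is to cross-tabulate the label against the image metadata before training. This is only a sketch: the `age_group` and `label` field names and the records are hypothetical, and a real dataset may not have such metadata at all.

```python
from collections import Counter


def label_share_by_group(records, group_key, label_key="label"):
    """For each (group, label) pair, return the share of that label within the group."""
    counts = Counter((r[group_key], r[label_key]) for r in records)
    totals = Counter(r[group_key] for r in records)
    return {
        (group, label): n / totals[group]
        for (group, label), n in counts.items()
    }


# Hypothetical metadata: all non-COVID images are from children, all COVID images from adults
records = (
    [{"age_group": "child", "label": "non-covid"}] * 100
    + [{"age_group": "adult", "label": "covid"}] * 100
)

shares = label_share_by_group(records, "age_group")

# If every group maps to exactly one label, the group alone predicts the label perfectly
confounded = all(share == 1.0 for share in shares.values())
print(shares, confounded)
```

If a metadata field like this predicts the label on its own, the model can score well without looking at the lungs at all.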

👇

Training and testing on the same data ❌

OK, you just never do that! Never!
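
Overlap can also sneak in when several scans of the same patient end up on both sides of the split. Below is a minimal sketch of a patient-level split, assuming each image comes with a patient identifier (the `patient_id` values and file names are hypothetical).

```python
import random


def split_by_patient(samples, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) pairs so that no patient lands in both sets."""
    patients = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [s for s in samples if s[0] not in test_patients]
    test = [s for s in samples if s[0] in test_patients]
    return train, test


# Hypothetical data: 10 patients with 3 scans each
samples = [(f"patient{p}", f"scan_{p}_{i}.png") for p in range(10) for i in range(3)]

train, test = split_by_patient(samples)

# Leakage check: the two sets must not share any patient
overlap = {pid for pid, _ in train} & {pid for pid, _ in test}
assert not overlap, "patient leakage between train and test"
```

The split is done on patients rather than on images, so the final assertion fails loudly if the same patient ever appears in both sets.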

👇

Unbalanced datasets ⚖️

There are many more non-COVID scans than real COVID cases, but not all papers managed to adequately balance their datasets to account for that.
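
Here is a small sketch of why imbalance matters at evaluation time, comparing plain accuracy with balanced accuracy (the mean of per-class recall). The 95/5 class ratio and the "always non-COVID" model are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so that each class counts equally."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical test set: 95 non-COVID scans (0) and 5 COVID scans (1)
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts "non-COVID"
y_pred = [0] * 100

plain = accuracy(y_true, y_pred)              # 0.95
balanced = balanced_accuracy(y_true, y_pred)  # 0.5
print(plain, balanced)
```

The model never finds a single COVID case, yet plain accuracy reports 95%. Balanced accuracy drops to 50%, which is what you would get from guessing.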

Check out this thread for more details on how to deal with imbalanced data:


👇

Unclear evaluation methodology ⁉️

Many papers failed to disclose how much data they were tested on or important aspects of how their models work, leading to poor reproducibility and biased results.

👇

The problem is in the data 💽

The big problem for most methods was the lack of high-quality data and of a deep understanding of the problem - many papers didn't even consult with radiologists.

👇

A high-quality and diverse dataset is more important than your fancy model!

References 🗒️

Full article in Nature: nature.com/articles/s4225…
More detailed coverage: statnews.com/2021/06/02/mac…
Source for X-Ray image: bmj.com/content/370/bm…

I agree with all your points! I'm sure the available data is much better now, and I've seen much larger datasets.

I think the important point is that rushing to publish results based on small, poor-quality datasets undermines the credibility of ML...

There are already much better datasets and I'm sure this problem will get solved at some point. There is also a promising Kaggle challenge:

kaggle.com/c/siim-covid19…

The point is that it needs to be done right, and the Nature paper gives some guidance on how.

Yes, this is a common problem with such datasets/competitions. At some point people start figuring out how to fine-tune on the test set...

One should keep adding data to the test set!

There is certainly pressure to publish in academia.

I think many of the papers really wanted to help with getting results out quickly. However, there are some minimum standards that need to be kept. Such results undermine the credibility of ML...

They analyzed papers using both CXR and CT. I would argue that you first need to fix the dataset and then optimize your model. You can usually get much better results addressing things in this order.

An interesting point for people asking what the benefit of detecting COVID-19 in X-ray scans is at all.



Thread by Vladimir Haltakov (@haltakov)

