Getting medical data is hard because of privacy concerns, and at the beginning of the pandemic there was simply not much data available at all.
Many papers used very small datasets, often collected from a single hospital - not enough for a real evaluation.
π
Biased datasets π§π§βπ¦²
Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults... π€·ββοΈ
π
Training and testing on the same data β
OK, you just never do that! Never!
π
Unbalanced datasets βοΈ
There are many more non-COVID scans than real COVID cases, but not all papers managed to balance their datasets adequately to account for that.
Check out this thread for more details on how to deal with imbalanced data:
Many papers failed to disclose how much data their models were tested on, or important aspects of how the models work, leading to poor reproducibility and biased results.
π
The problem is in the data π½
The big problem for most methods was the lack of high-quality data and of a deep understanding of the problem - many papers didn't even consult radiologists.
A high-quality and diverse dataset is more important than your fancy model!
Let's talk about a common problem in ML - imbalanced data βοΈ
Imagine we want to detect all pixels belonging to a traffic light in a self-driving car's camera image. We train a model that achieves 99.88% accuracy. Pretty cool, right?
Actually, this model is useless β
Let me explain π
The problem is that the data is severely imbalanced - the ratio between background pixels and traffic light pixels is 800:1.
If we don't take any measures, our model will learn to classify each pixel as background, giving us 800/801 β 99.88% accuracy. But it's useless!
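To see the trap in numbers, here is a minimal NumPy sketch of that all-background "model" at the 800:1 ratio from the example (toy labels, not real camera data):

```python
import numpy as np

# Hypothetical 800:1 background-to-traffic-light pixel ratio from the example.
n_background = 800
n_traffic_light = 1

labels = np.array([0] * n_background + [1] * n_traffic_light)

# A "model" that predicts background for every single pixel.
predictions = np.zeros_like(labels)

accuracy = (predictions == labels).mean()
print(f"Accuracy: {accuracy:.2%}")  # 99.88% - yet it never finds a traffic light

# Recall on the traffic light class exposes the failure.
recall = (predictions[labels == 1] == 1).mean()
print(f"Traffic light recall: {recall:.0%}")  # 0%
```

High accuracy, zero recall on the class we actually care about - which is why the choice of evaluation metric matters so much here.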
What can we do? π
Let me tell you about 4 ways of dealing with imbalanced data:
βͺοΈ Choose the right evaluation metric
βͺοΈ Undersampling your dataset
βͺοΈ Oversampling your dataset
βͺοΈ Adapting the loss
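The last three techniques can be sketched in a few lines of NumPy. The dataset and the 900:100 split below are made-up illustrative numbers, and the inverse-frequency weighting shown is just one common way to adapt the loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy imbalanced dataset: 900 negatives, 100 positives (illustrative numbers).
X = rng.normal(size=(1000, 2))
y = np.array([0] * 900 + [1] * 100)

neg_idx = np.where(y == 0)[0]
pos_idx = np.where(y == 1)[0]

# 1. Undersampling: randomly drop negatives until both classes match.
keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
under_idx = np.concatenate([keep_neg, pos_idx])
print(np.bincount(y[under_idx]))  # [100 100]

# 2. Oversampling: duplicate positives until both classes match.
extra_pos = rng.choice(pos_idx, size=len(neg_idx) - len(pos_idx), replace=True)
over_idx = np.concatenate([neg_idx, pos_idx, extra_pos])
print(np.bincount(y[over_idx]))  # [900 900]

# 3. Adapting the loss: weight each class inversely to its frequency,
#    so errors on the rare class count ~9x more here.
class_weights = len(y) / (2 * np.bincount(y))
print(class_weights)
```

Undersampling throws away data, oversampling risks overfitting on duplicated samples, and loss weighting keeps all the data - each has trade-offs, so the right choice depends on your dataset size.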
The creator and lead dev of the popular NFT exchange Hic Et Nunc on the Tezos blockchain decided to shut down the project. He pulled the plug on the whole website and the official Twitter account.
Yet, the damage is not fatal π
How come?
β NFTs are fine - they are stored on the blockchain
β NFT metadata is fine - stored on IPFS
β Exchange backend code is fine - it is in an immutable smart contract
β The website is back online - it is open-source, so a clone was deployed by the community fast
π
Of course, this is a dramatic event and the quick recovery was only possible because of the immense effort of the community. But it is possible and it took basically 1 day.
Imagine the damage that the creator and lead dev could do if they wanted to destroy a Web 2.0 company!
How I made $3000 in 3 weeks selling AI-generated art π°
Last week I showed you how you can use VQGAN+CLIP to generate interesting images based on text prompts.
Now, I'll tell you how I sold some of these as NFTs for more than $3000 in less than 3 weeks.
Let's go π
Background
I've been interested in NFTs for 2 months now and one collection I find interesting is @cryptoadzNFT. What's special about it is that the creator @supergremplin published all of the art in the public domain. This spurred the creation of many derivative projects.
π
The Idea π‘
My idea was to use VQGAN+CLIP to create interesting versions of the CrypToadz. So, I started experimenting with my own toad #6741.
I took the original NFT image as a start and experimented a lot with different text prompts. The results were very promising!
In their latest paper, they introduce the so-called verifiers. The generative model generates 100 solutions, but the verifiers select the one that has the highest chance of being factually correct.
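The sample-then-rank idea can be sketched generically. Note that `generate_solution` and `verifier_score` below are hypothetical stand-ins, not the paper's actual models - a real system would sample from a generative model and score with a trained verifier:

```python
import random

random.seed(0)

def generate_solution(problem: str) -> str:
    # Stand-in for sampling one candidate solution from a generative model.
    return f"candidate-{random.randint(0, 999)} for: {problem}"

def verifier_score(problem: str, solution: str) -> float:
    # Stand-in for a trained verifier estimating P(solution is correct).
    return random.random()

def best_of_n(problem: str, n: int = 100) -> str:
    # Sample n candidates, return the one the verifier rates highest.
    candidates = [generate_solution(problem) for _ in range(n)]
    return max(candidates, key=lambda s: verifier_score(problem, s))

print(best_of_n("What is 12 * 7?"))
```

The key design point: generating is cheap relative to being right, so spending compute on many samples and letting a separate model pick the winner can beat a single greedy answer.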