How does GPT-4 do in the medical domain?

I got to play around with its multimodal capabilities on some medical images!

Plus a recent Microsoft paper examined its text understanding and reported SOTA results on USMLE medical exam questions!

A quick thread ↓
As I showed earlier, I had the chance last week to play around with GPT-4's multimodal capabilities:
I also tried some medical images! Here I started with histopathology: I passed in an H&E image of prostate cancer and asked GPT-4 to describe it. It knew it was an H&E image of glandular tissue but was unable to identify it as low-grade prostate cancer.
Here I passed in an image of invasive lobular carcinoma with its characteristic single-file lines of tumor nuclei. It fails to notice this, unfortunately, no matter how hard I try.
Here is an example of a glioblastoma (an aggressive brain tumor). It again has a characteristic feature suggesting the glioblastoma diagnosis (pseudopalisading necrosis), but GPT-4 fails to notice it. It does recognize the presence of what look like tumor nuclei.
This image shows an H&E stain of basal cell carcinoma (a skin cancer). GPT-4 notices that the tissue is skin but cannot identify the pathology.
Overall though, GPT-4 mostly refuses to provide anything resembling a diagnosis. Here is one such example with an X-ray image.
My conclusion on the multimodal side is that GPT-4 is an impressive first step towards multimodal medical understanding, but its understanding right now is fairly rudimentary, and there is a lot of room to improve here.
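For anyone who wants to try this kind of query themselves: here's a minimal sketch using the OpenAI Python SDK, assuming a vision-capable model is reachable through the public API (my early access went through a different interface, and the model name and image path here are placeholders).

```python
# Minimal sketch: ask a vision-capable GPT-4 model to describe a histopathology image.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY in the environment;
# the model name and local image path are placeholders.
import base64
from openai import OpenAI

client = OpenAI()

with open("prostate_he.png", "rb") as f:  # hypothetical local H&E image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder vision-capable model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this H&E-stained slide. What tissue is it, and do you see any pathology?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```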
On the text side of things, however, the situation is different. In a recent paper from Microsoft Research, "Capabilities of GPT-4 on Medical Challenge Problems", GPT-4 obtains SOTA on the USMLE (the US medical licensing exams), significantly outperforming GPT-3.5.
Other benchmark datasets were tested as well, with GPT-4 again reaching SOTA for most of them.
This was all done without any sophisticated prompting techniques.
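To give a feel for how simple the prompting is: the prompt is essentially just the question plus its answer choices. The exact template is in the paper; this hypothetical sketch only shows the general flavor.

```python
# Rough sketch of a simple zero-shot prompt for a USMLE-style multiple-choice
# question. The paper's exact template may differ; this is illustrative only.
def build_prompt(question: str, options: dict[str, str]) -> str:
    lines = ["The following is a multiple-choice question from a medical exam.", ""]
    lines.append(question)
    lines.append("")
    for letter, text in sorted(options.items()):
        lines.append(f"{letter}. {text}")
    lines += ["", "Answer:"]
    return "\n".join(lines)

prompt = build_prompt(
    "A 55-year-old man presents with chest pain. What is the most likely diagnosis?",
    {"A": "Option one", "B": "Option two", "C": "Option three", "D": "Option four"},
)
print(prompt)  # sent as-is to the model; the reply should begin with a letter
```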
One may worry the high performance is due to data contamination. Interestingly, the paper performed a memorization analysis, and its detector didn't flag any of the tested USMLE questions as memorized (though that doesn't definitively rule out memorization).
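The rough idea, as I understand it: feed the model the first part of a question and check whether its continuation reproduces the true remainder; near-verbatim matches would suggest the question was memorized. A minimal sketch, with a hypothetical generate() standing in for the model call:

```python
# Rough sketch of a memorization probe in the spirit of the paper's analysis.
# `generate` is a hypothetical stand-in for a model call; the paper's actual
# detector may differ in its details.
from difflib import SequenceMatcher

def memorization_score(question: str, generate, frac: float = 0.5) -> float:
    cut = int(len(question) * frac)
    prefix, true_rest = question[:cut], question[cut:]
    completion = generate(prefix)  # model's continuation of the prefix
    # Similarity in [0, 1]; values near 1 suggest verbatim memorization.
    return SequenceMatcher(None, completion[: len(true_rest)], true_rest).ratio()
```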
Plus, the USMLE material is behind a paywall, so it's probably unlikely to be in the GPT-4 training set anyway.
Overall, it seems text-only GPT-4's medical understanding is significantly improved, while multimodal GPT-4's understanding is still rudimentary.
Many more experiments should be done to study GPT-4's medical knowledge and reasoning. Some previous studies using GPT-3 concluded that domain- or task-specific fine-tuned models are better, and I wonder if that conclusion changes now with GPT-4.

#MedTwitter #PathTwitter
If you like this thread, please share!

Consider following me for AI-related content! → @iScienceLuvr
