Christian Bluethgen
Attending Radiologist @Unispital_USZ | Research @StanfordAIMI | Ignite @StanfordGSB | @EuSoMII | tweets my own, RT ≠ endorsement

Oct 4, 2023, 6 tweets

Wondering whether my job was jeopardized by AI this week or if we’re still good, I read a new paper evaluating #GPT4V - a #GPT4 version that handles image and text inputs. It produces *impressive* radiology reports. But let’s delve deeper into some of the results... #radiology #AI

Here, GPT4V correctly identified a fracture of the 5th metatarsal bone. However, this is not a Jones fracture (which is in the proximal part of the bone and sometimes doesn’t heal well, requiring more aggressive management). Almost correct ≠ Correct, esp. in medicine.

Here, the model correctly identified a suspicious pulmonary nodule but misdescribed its location and outright hallucinated its size. It also inferred the absence of pathologically enlarged lymph nodes, which is impossible to determine from a single slice.

This is a sagittal slice from a knee MRI exam. The model correctly picks up the joint effusion and a likely meniscal tear, but also states that the cruciate ligaments are intact, which is not possible to infer from this slice alone.

Finally, the model correctly identifies signs of small bowel obstruction on this abdominal x-ray. To fulfill some clichés, #GPT4V casually threw in some advice to "correlate clinically". Debatable! (paging @becker_rad)

Radiology workflows are inherently multi-modal, so large multi-modal models (#LMMs) are an exciting development. But hallucinations look like they may become even harder to spot, and domain expertise is currently more valuable than ever.

📝arxiv.org/abs/2309.17421
