Worrying whether my job was jeopardized by AI this week or whether we’re still good, I read a new paper evaluating #GPT4V, a #GPT4 variant that handles both image and text inputs. It produces *impressive* radiology reports. But let’s delve deeper into some of the results... #radiology #AI
Here, GPT4V correctly identified a fracture of the 5th metatarsal bone. However, this is not a Jones fracture (which is in the proximal part of the bone and sometimes doesn’t heal well, requiring more aggressive management). Almost correct ≠ Correct, esp. in medicine.
Here, the model correctly identified a suspicious pulmonary nodule but incorrectly described its location and outright hallucinated its size. Additionally, it inferred a lack of pathologically enlarged lymph nodes, which is impossible to determine from just one slice.
This is a sagittal plane slice from a knee MRI exam. The model correctly picks up the joint effusion and a likely meniscal tear, but also states that the cruciate ligaments are intact, which is not possible to infer from this slice alone.
Finally, the model correctly identifies signs of small bowel obstruction on this abdominal x-ray. To fulfill some clichés, #GPT4V casually threw in some clinical correlation advice. Debatable! (paging @becker_rad)
Radiology workflows are inherently multi-modal; large multi-modal models (#LMMs) are an exciting development. It looks like it may become even harder to spot hallucinations, and that domain expertise is currently more valuable than ever.
🎉Introducing RoentGen, a generative vision-language foundation model based on #StableDiffusion, fine-tuned on a large chest x-ray and radiology report dataset, and controllable through text prompts!
#RoentGen is able to generate a wide variety of radiological chest x-ray (CXR) findings with high fidelity and a high level of detail. Of note, this is without being explicitly trained on class labels.
Building on previous work, #RoentGen is a fine-tuned latent diffusion model based on #StableDiffusion. Free-form medical text prompts are used to condition the denoising process, resulting in high-fidelity yet diverse CXRs, improving on a typical limitation of GAN-based methods.
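The text-conditioned denoising loop can be illustrated with a toy numpy sketch. This is not RoentGen’s actual code: the `toy_noise_model` stands in for the real U-Net noise predictor (which attends to the text embedding via cross-attention), and all shapes and names are illustrative. It only shows the classifier-free-guidance update that blends conditional and unconditional noise predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_noise_model(latent, text_emb):
    # Stand-in for the U-Net noise predictor; a real model conditions
    # on the text embedding via cross-attention. Purely illustrative.
    return 0.1 * latent + 0.01 * text_emb.mean()

def guided_denoise_step(latent, cond_emb, uncond_emb,
                        guidance_scale=7.5, step_size=0.5):
    # Classifier-free guidance: blend the conditional and unconditional
    # noise predictions, then take one (simplified) denoising step.
    eps_cond = toy_noise_model(latent, cond_emb)
    eps_uncond = toy_noise_model(latent, uncond_emb)
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return latent - step_size * eps

latent = rng.standard_normal((4, 64, 64))  # SD-style latent shape
cond = rng.standard_normal(768)            # embedding of a prompt, e.g. "large right pleural effusion"
uncond = np.zeros(768)                     # embedding of the empty prompt

for _ in range(10):
    latent = guided_denoise_step(latent, cond, uncond)
```

Raising `guidance_scale` pushes the sample harder toward the prompt at the cost of diversity, which is exactly the knob that makes prompt control work at inference time.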
#StableDiffusion is a #LatentDiffusionModel: it performs its generative task efficiently on low-dimensional representations of high-dimensional training inputs. SD’s VAE latent space preserves the relevant information contained in CXRs, which can be reconstructed with high fidelity.
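To make the efficiency gain concrete, here is the back-of-the-envelope arithmetic for Stable Diffusion v1’s VAE, which maps a 512×512 RGB image to a 64×64×4 latent (8× spatial downsampling, 4 channels):

```python
# Elements per image before and after the VAE encoder
image_elems = 512 * 512 * 3   # 512x512 RGB input
latent_elems = 64 * 64 * 4    # 64x64 latent with 4 channels
compression = image_elems / latent_elems
print(compression)  # 48.0 -> the diffusion process runs on 48x fewer elements
```

Running the denoising iterations in this much smaller space is what makes training and sampling tractable.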
#StableDiffusion’s output can be controlled at inference time via text prompts, but it is unclear to what extent SD has incorporated medical imaging concepts. Simple text prompts show how hard it can be to get realistic-looking medical images out-of-the-box without domain-specific training.