We train an MLP using contrastive learning to map fMRI signals to CLIP image embeddings.
The generated embeddings can be used for retrieval, & the exact original image can be retrieved among highly similar candidates, showing that the embeddings retain fine-grained information.
Scaling up the retrieval to a large database like LAION-5B allows MindEye to output realistic images from brain activity without using any generative model.
But we can do classic reconstruction too, with SOTA results!
For this purpose, we found it necessary to train a diffusion prior to further "align" the generated CLIP-fMRI embeddings with standard CLIP embeddings.
Once we obtain aligned CLIP image embeddings, we can pass it into any pretrained diffusion model that accepts CLIP image embeddings to perform reconstruction!
We find Versatile Diffusion gives best performance. Better image generation models in the future may give better recons!
Low-level features are also appropriately reconstructed by mapping the fMRI signals to Stable Diffusion VAE latents and using that as a starting point for img2img.
Using this dual pipeline approach, MindEye obtains SOTA results on both high-level and low-level metrics (table of results in preprint)!
Here is a comparison to previous methods in the literature:
I started this project about a year ago, and it originally started out in @laion_ai.
We were lucky that @humanscotti joined and took the lead on this project, he's done a great job moving this project forward!
@KGreshake showed how prompt injections can be incorporated in webpages or other content that may be retrieved by LLM systems to result in nefarious behavior.
Here, text is embedded in a webpage to direct BingChat to perform a scam.
Here is another example where an injection can be spread via email.
I think LLM applications are super exciting but we certainly should be cautious of any security concerns like this.
I also tried some medical images too! Here I started with some histopathology. I passed in an H&E image of prostate cancer and asked GPT-4 to describe it. It knew it was an H&E image of glandular tissue but was unable to identify it as low grade prostate cancer.
The goal is to build AI assistants that follow certain "constitutional principles" to make models less harmful (generating offensive outputs, reinforcing social biases,etc.)
We can use AI feedback & supervision to follow these principles & limit the human feedback needed. (2/13)
So, I've heard people say anyone could have built ChatGPT. I think this is disingenuous.
ChaGPT isn't just GPT-3 w/ a chat interface on top of it.
The closest base model on the OpenAI API is probably text-davinci-003, but it was only released a day before ChatGPT! (1/9)
Maybe someone could have created a model like text-davinci-003?
Well, ChatGPT/text-davinci-003 are trained with lots and lots of human feedback, which is why it does so well. That's not easy for anyone to obtain! (2/9)
OpenAI is clearly a leader in utilizing human feedback for improved models. They invented RLHF, one of the leading approaches, which powers ChatGPT.
On a related note, claiming OpenAI just scaled up existing work is ignoring OpenAI's expertise in utilizing human feedback. (3/9)