Tanishq Mathew Abraham, Ph.D. Profile picture
PhD at 19 | Founder and CEO at @MedARC_AI | Research Director at @StabilityAI | @kaggle Notebooks GM | Biomed. engineer @ 14 | TEDx talk➡https://t.co/xPxwKTpz0D
7 subscribers
May 13 11 tweets 3 min read
The livestream demo is not the only cool part about GPT-4o

Remember, GPT-4o is an end-to-end trained multimodal model!

No one is reading the GPT-4o blog post which highlights so many other cool features

SEE MORE FEATURES GPT-4o HAS ↓ First of all, GPT-4o is a much better language model. It's SOTA on a variety of LLM benchmarks:
May 8 12 tweets 4 min read
AlphaFold3 is out!

This a diffusion model pipeline that goes beyond what AlphaFold2 did: predicting the structures of protein-molecule complexes containing DNA, RNA, ions, etc.

Blog post:
Paper:

A quick thread about the method↓blog.google/technology/ai/…
nature.com/articles/s4158… AlphaFold2 was impactful but had one major limitation: it could only predict structures of proteins by itself.

In reality, proteins have various modifications, bind to other molecules, form complexes w/ DNA, RNA, etc.

Structure of these complexes can't be predicted by AF2
Apr 30 15 tweets 6 min read
Google announces Med-Gemini, a family of Gemini models fine-tuned for medical tasks! 🔬

Achieves SOTA on 10 of the 14 benchmarks, spanning text, multimodal & long-context applications.

Surpasses GPT-4 on all benchmarks!

This paper is super exciting, let's dive in ↓Image The team developed a variety of model variants. First let's talk about the models they developed for language tasks.

The finetuning dataset is quite similar to Med-PaLM2, except with one major difference:

self-training with search

(2/14)Image
Jan 23 13 tweets 4 min read
Happy to share a new paper I worked on!:

"Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers"

abs:
website:

A quick thread about the paper! ↓ (1/11) arxiv.org/abs/2401.11605
crowsonkb.github.io/hourglass-diff…
Image Before I continue, I want to mention this work was led by @RiversHaveWings, @StefanABaumann, @Birchlabs. @DanielZKaplan, @EnricoShippole were also valuable contributors. (2/11)
Dec 26, 2023 7 tweets 3 min read
Are you wondering how the new Mamba language model works?

Mamba is based on state-space models (SSMs), a new competitor to the Transformer architecture.

Here are 5 resources to help you learn about SSMs & Mamba! ↓↓↓ 1. Mamba - a replacement for Transformers? by @SamuelAlbanie
Link →

Provides a short and quick overview of Mamba and the literature leading up to it.
Image
Oct 30, 2023 8 tweets 2 min read
The Biden-Harris administration has issued an Executive Order on AI safety. This is a big one!

Based on the Fact Sheet, here are some of the interesting parts of the EO ↓ There is significant focus on evaluation and standards for AI systems, including @NIST developing red-teaming standards. Image
May 30, 2023 10 tweets 6 min read
I'm really excited to share @MedARC_AI's first paper since our public launch 🥳

🧠👁️ MindEye!

Our state-of-the-art fMRI-to-image approach that retrieves and reconstructs images from brain activity!

Project page: medarc-ai.github.io/mindeye/
arXiv: arxiv.org/abs/2305.18274 Image We train an MLP using contrastive learning to map fMRI signals to CLIP image embeddings.

The generated embeddings can be used for retrieval, & the exact original image can be retrieved among highly similar candidates, showing that the embeddings retain fine-grained information. Image
Mar 28, 2023 5 tweets 3 min read
App-integrated LLMs can be jailbreaked:

@KGreshake showed how prompt injections can be incorporated in webpages or other content that may be retrieved by LLM systems to result in nefarious behavior.

Here, text is embedded in a webpage to direct BingChat to perform a scam. Here is another example where an injection can be spread via email.
Mar 24, 2023 16 tweets 6 min read
How does GPT-4 do in the medical domain?

I got to play around with its multimodal capabilities on some medical images!

Plus a recent Microsoft paper examined its text understanding and got SOTA results on USMLE medical exams!

A quick thread ↓ As I showed earlier, I had the chance last week to play around with GPT-4's multimodal capabilities:
Mar 16, 2023 8 tweets 3 min read
I got to try GPT-4's multimodal capabilities and it's quite impressive! A quick thread of examples...

Let's start out with solving a CAPTCHA, no big deal It can explain memes quite well! Here it is explaining an AI-generated meme I shared recently.

(The AIs will create their own memes and explain it to us humans 😂)
Mar 14, 2023 4 tweets 1 min read
GPT-4 release
Med-PaLM2 announcement
PaLM API release
Claude API release Image Oh I forgot ChatGLM! 😅
Feb 28, 2023 17 tweets 6 min read
Claude, @AnthropicAI's powerful ChatGPT alternative, was trained with "Constitutional AI".

Constitutional AI is particularly interesting since it uses less human feedback than other methods, making it more scalable.

Let's dive into how Constitutional AI works in 13 tweets! Constitutional AI (CAI) is based on:
1. Supervised Fine-Tuning (SFT)
2. Reinforcement Learning from Human Feedback (RLHF).

If you don't know how SFT & RLHF work, you should first check out my thread on the topic 😉 (1/13)
Feb 21, 2023 10 tweets 4 min read
So, I've heard people say anyone could have built ChatGPT. I think this is disingenuous.

ChaGPT isn't just GPT-3 w/ a chat interface on top of it.

The closest base model on the OpenAI API is probably text-davinci-003, but it was only released a day before ChatGPT! (1/9) Image Maybe someone could have created a model like text-davinci-003?

Well, ChatGPT/text-davinci-003 are trained with lots and lots of human feedback, which is why it does so well. That's not easy for anyone to obtain! (2/9)
Dec 28, 2022 12 tweets 5 min read
Are you wondering how large language models like ChatGPT and InstructGPT actually work?

One of the secret ingredients is RLHF - Reinforcement Learning from Human Feedback.

Let's dive into how RLHF works in 8 tweets! Large language models (LLMs) are trained w/ self-supervised learning using next token prediction which actually makes it bad for instruction following. This example from OpenAI's blog exemplifies how GPT-3 succeeds at next-token prediction but fails at instruction-following. 1/8
Nov 16, 2022 20 tweets 7 min read
I will attempt to explain the basic idea of how diffusion models work!

... in only 15 tweets! 😲

Let's get started ↓ Diffusion models are *generative* models, which simply means given some example datapoints (your training dataset), generate more like it.

For example, given cute dog images, generate more cute dog images! (1/15)
Sep 29, 2022 7 tweets 2 min read
ICLR 2023 (a top ML/AI conference) submissions have been released, and do you know what that means?

Time for mind-blowing papers! 🤯↓ 1. DreamFusion by @GoogleAI

Text-to-3D generation starting from a pretrained text-to-image diffusion model and not needing any 3D training data:
Sep 21, 2022 10 tweets 5 min read
Today, @OpenAI announced Whisper, an automatic speech recognition model. Plus, they released it open-source!

Blog post → openai.com/blog/whisper/
Research paper → cdn.openai.com/papers/whisper…
Open-source code and models → github.com/openai/whisper

Quick thread about it (1/10) ↓ Continuing their trend of scaling to web-scale dataset, the group collected a dataset of 680k hours of audio+text transcriptions. It's a very diverse dataset, including multiple languages, speakers, recording setups, environments, etc. (2/10)
Aug 31, 2022 11 tweets 4 min read
So @StableDiffusion has various options and controls and one of the main ones is the sampler used for generation. Let's talk a little bit about these samplers since this has some interesting and unexpected effects on generated image quality (below image from subreddit)🧵 First, a brief summary about how Stable Diffusion works. Stable Diffusion is a diffusion model, which is a neural network trained to iteratively denoise an image from pure noise. (2/11)
Aug 26, 2022 8 tweets 4 min read
Research in AI is surprisingly more accessible to people with different backgrounds compared to other fields.

Anyone (w/ relevant experience) can contribute to impactful research.

Here are 5 research orgs you can join to contribute to real, open research in deep learning ↓ 1. #EleutherAI

EleutherAI may be the most famous AI open research collective. Lots of great work has been released by EleutherAI, such as the Pile dataset, GPT-J, GPT-NeoX-20B, and VQGAN-CLIP.

Link → discord.gg/zBGx3azzUn
Jun 29, 2022 9 tweets 6 min read
Applying deep learning to pathology is quite challenging due to the sheer size of the slide images (gigapixels!).

A common approach is to divide images into smaller patches, for which deep learning features can be extracted & aggregated to provide a slide-level diagnosis (1/9) Unfortunately, dividing into small patches limits the context to cellular features, missing out on the various levels of relevant features, like larger-scale tissue organization. (2/9)
Jun 13, 2022 16 tweets 15 min read
You may have seen surreal and absurd AI-generated images like these ones...

These are all generated with an AI tool known as DALL·E mini

Let's talk about the history of #dallemini, and also *how* it works! ↓↓↓🧵 Image First, let's clarify the different AI tools which many get confused about:

- DALL·E was an @OpenAI-developed AI project from Jan 2021

- DALL·E mini is a community-created project inspired by DALL·E
- DALL·E 2 is another @OpenAI-developed tool released in April (2/16)