Tanishq Mathew Abraham, Ph.D.'s Threads

Mar 27 • 5 tweets • 1 min read

"A Manga Guide to DeepSeek-V3 Technical Report"

from now on this is how I will post all papers 🤣

Feb 26 • 5 tweets • 2 min read

Diffusion language models are SO FAST!!

A new startup, Inception Labs, has released Mercury Coder, "the first commercial-scale diffusion large language model"

It's 5-10x faster than current gen LLMs, providing high-quality responses at low costs.

And you can try it now!

The performance is similar to small frontier models while achieving a throughput of ~1000 tokens/sec... on H100s! Reaching this level of throughput for autoregressive LLMs typically requires specialized chips.

Feb 17 • 5 tweets • 2 min read

Have you heard of Cleo?

Cleo was an account on Math Stack Exchange that was infamous for dropping the answer to the most difficult integrals with no explanation...

often mere minutes after the question was asked!!

For years, no one knew who Cleo was, UNTIL NOW!

People noticed that the same few people were interacting with Cleo (asking the questions Cleo answered, commenting, etc.), a couple of them only active at the same time as Cleo as well.

People were wondering maybe someone is controlling all these accounts as alts

May 13, 2024 • 11 tweets • 3 min read

The livestream demo is not the only cool part about GPT-4o

Remember, GPT-4o is an end-to-end trained multimodal model!

No one is reading the GPT-4o blog post which highlights so many other cool features

SEE MORE FEATURES GPT-4o HAS ↓

First of all, GPT-4o is a much better language model. It's SOTA on a variety of LLM benchmarks:

https://twitter.com/iScienceLuvr/status/1790082827016364071

May 8, 2024 • 12 tweets • 4 min read

AlphaFold3 is out!

This is a diffusion model pipeline that goes beyond what AlphaFold2 did: predicting the structures of protein-molecule complexes containing DNA, RNA, ions, etc.

Blog post: blog.google/technology/ai/…
Paper: nature.com/articles/s4158…

A quick thread about the method ↓

AlphaFold2 was impactful but had one major limitation: it could only predict structures of proteins on their own.

In reality, proteins have various modifications, bind to other molecules, form complexes w/ DNA, RNA, etc.

Structure of these complexes can't be predicted by AF2

Apr 30, 2024 • 15 tweets • 6 min read

Google announces Med-Gemini, a family of Gemini models fine-tuned for medical tasks! 🔬

Achieves SOTA on 10 of the 14 benchmarks, spanning text, multimodal & long-context applications.

Surpasses GPT-4 on all benchmarks!

This paper is super exciting, let's dive in ↓

The team developed a variety of model variants. First let's talk about the models they developed for language tasks.

The finetuning dataset is quite similar to Med-PaLM2, except with one major difference:

self-training with search

(2/14)

Jan 23, 2024 • 13 tweets • 4 min read

Happy to share a new paper I worked on!:

"Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers"

abs: arxiv.org/abs/2401.11605
website: crowsonkb.github.io/hourglass-diff…

A quick thread about the paper! ↓ (1/11)

Before I continue, I want to mention this work was led by @RiversHaveWings, @StefanABaumann, @Birchlabs. @DanielZKaplan, @EnricoShippole were also valuable contributors. (2/11)

Dec 26, 2023 • 7 tweets • 3 min read

Are you wondering how the new Mamba language model works?

Mamba is based on state-space models (SSMs), a new competitor to the Transformer architecture.

Here are 5 resources to help you learn about SSMs & Mamba! ↓↓↓

1. Mamba - a replacement for Transformers? by @SamuelAlbanie
Link →

Provides a short and quick overview of Mamba and the literature leading up to it.

Oct 30, 2023 • 8 tweets • 2 min read

The Biden-Harris administration has issued an Executive Order on AI safety. This is a big one!

Based on the Fact Sheet, here are some of the interesting parts of the EO ↓

There is significant focus on evaluation and standards for AI systems, including @NIST developing red-teaming standards.

May 30, 2023 • 10 tweets • 6 min read

I'm really excited to share @MedARC_AI's first paper since our public launch 🥳

🧠👁️ MindEye!

Our state-of-the-art fMRI-to-image approach that retrieves and reconstructs images from brain activity!

Project page: medarc-ai.github.io/mindeye/
arXiv: arxiv.org/abs/2305.18274

We train an MLP using contrastive learning to map fMRI signals to CLIP image embeddings.

The generated embeddings can be used for retrieval, & the exact original image can be retrieved among highly similar candidates, showing that the embeddings retain fine-grained information.
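The retrieval idea above can be sketched in a few lines. This is only a toy illustration, not the MindEye code: the 3-dimensional vectors stand in for real CLIP embeddings, the helper names (`cosine`, `retrieve`, `info_nce`) are hypothetical, and the InfoNCE-style loss is just one common contrastive objective, assumed here because the thread does not spell out the exact loss.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(pred, candidates):
    """Return the index of the candidate embedding most similar to `pred`."""
    sims = [cosine(pred, c) for c in candidates]
    return max(range(len(sims)), key=sims.__getitem__)

def info_nce(preds, targets, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed, generic choice).

    preds[i] should be more similar to targets[i] than to any other
    target in the batch; the loss is lower when that holds.
    """
    total = 0.0
    for i, p in enumerate(preds):
        logits = [cosine(p, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(preds)

# Toy "image embeddings"; candidates 0 and 2 are deliberately very similar.
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
# Toy "embedding predicted from fMRI", closest to candidate 2.
predicted = [0.88, 0.12, 0.0]
best = retrieve(predicted, candidates)
```

Picking the right image among near-duplicates, as `retrieve` does here between candidates 0 and 2, is the sense in which the embeddings "retain fine-grained information".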

Mar 28, 2023 • 5 tweets • 3 min read

App-integrated LLMs can be jailbroken:

@KGreshake showed how prompt injections can be incorporated in webpages or other content that may be retrieved by LLM systems to result in nefarious behavior.

Here, text is embedded in a webpage to direct BingChat to perform a scam.

Here is another example where an injection can be spread via email.

Mar 24, 2023 • 16 tweets • 6 min read

How does GPT-4 do in the medical domain?

I got to play around with its multimodal capabilities on some medical images!

Plus a recent Microsoft paper examined its text understanding and got SOTA results on USMLE medical exams!

A quick thread ↓

As I showed earlier, I had the chance last week to play around with GPT-4's multimodal capabilities:

https://twitter.com/iScienceLuvr/status/1636479850214232064

Mar 16, 2023 • 8 tweets • 3 min read

I got to try GPT-4's multimodal capabilities and it's quite impressive! A quick thread of examples...

Let's start out with solving a CAPTCHA, no big deal

It can explain memes quite well! Here it is explaining an AI-generated meme I shared recently.

(The AIs will create their own memes and explain it to us humans 😂)

Mar 14, 2023 • 4 tweets • 1 min read

GPT-4 release
Med-PaLM2 announcement
PaLM API release
Claude API release

Oh I forgot ChatGLM! 😅

Feb 28, 2023 • 17 tweets • 6 min read

Claude, @AnthropicAI's powerful ChatGPT alternative, was trained with "Constitutional AI".

Constitutional AI is particularly interesting since it uses less human feedback than other methods, making it more scalable.

Let's dive into how Constitutional AI works in 13 tweets!

Constitutional AI (CAI) is based on:
1. Supervised Fine-Tuning (SFT)
2. Reinforcement Learning from Human Feedback (RLHF).

If you don't know how SFT & RLHF work, you should first check out my thread on the topic 😉 (1/13)

https://twitter.com/iScienceLuvr/status/1608070009921900546

Feb 21, 2023 • 10 tweets • 4 min read

So, I've heard people say anyone could have built ChatGPT. I think this is disingenuous.

ChatGPT isn't just GPT-3 w/ a chat interface on top of it.

The closest base model on the OpenAI API is probably text-davinci-003, but it was only released a day before ChatGPT! (1/9)

Maybe someone could have created a model like text-davinci-003?

Well, ChatGPT/text-davinci-003 are trained with lots and lots of human feedback, which is why they do so well. That's not easy for anyone to obtain! (2/9)

Dec 28, 2022 • 12 tweets • 5 min read

Are you wondering how large language models like ChatGPT and InstructGPT actually work?

One of the secret ingredients is RLHF - Reinforcement Learning from Human Feedback.

Let's dive into how RLHF works in 8 tweets!

Large language models (LLMs) are trained w/ self-supervised learning using next-token prediction, which actually makes them bad at instruction following. This example from OpenAI's blog shows how GPT-3 succeeds at next-token prediction but fails at instruction-following. 1/8
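To make "next-token prediction" concrete, here is a deliberately tiny sketch. It is a bigram counter, not a neural network, and the function names (`train_bigram`, `predict_next`) and the toy corpus are made up for illustration. It only shows the objective: learn from raw text what tends to come next, with no notion of following an instruction.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation of `token`, or None if unseen."""
    following = counts.get(token)
    if not following:
        return None
    return following.most_common(1)[0][0]

# Toy corpus: the "labels" are just the text itself shifted by one token,
# which is what makes the training self-supervised.
corpus = "the dog chased the cat and the dog barked".split()
model = train_bigram(corpus)
next_token = predict_next(model, "the")
```

The model continues text with whatever was most likely in its training data, which is why a pure next-token predictor can respond to an instruction with a plausible continuation instead of an answer.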

Nov 16, 2022 • 20 tweets • 7 min read

I will attempt to explain the basic idea of how diffusion models work!

... in only 15 tweets! 😲

Let's get started ↓

Diffusion models are *generative* models, which simply means given some example datapoints (your training dataset), generate more like it.

For example, given cute dog images, generate more cute dog images! (1/15)

Sep 29, 2022 • 7 tweets • 2 min read

ICLR 2023 (a top ML/AI conference) submissions have been released, and do you know what that means?

Time for mind-blowing papers! 🤯↓

1. DreamFusion by @GoogleAI

Text-to-3D generation starting from a pretrained text-to-image diffusion model and not needing any 3D training data:

https://twitter.com/poolio/status/1575576632068214785

Sep 21, 2022 • 10 tweets • 5 min read

Today, @OpenAI announced Whisper, an automatic speech recognition model. Plus, they released it open-source!

Blog post → openai.com/blog/whisper/
Research paper → cdn.openai.com/papers/whisper…
Open-source code and models → github.com/openai/whisper

Quick thread about it (1/10) ↓

Continuing their trend of scaling to web-scale datasets, the group collected a dataset of 680k hours of audio+text transcriptions. It's a very diverse dataset, including multiple languages, speakers, recording setups, environments, etc. (2/10)

Aug 31, 2022 • 11 tweets • 4 min read

So @StableDiffusion has various options and controls and one of the main ones is the sampler used for generation. Let's talk a little bit about these samplers since this has some interesting and unexpected effects on generated image quality (below image from subreddit)🧵

First, a brief summary about how Stable Diffusion works. Stable Diffusion is a diffusion model, which is a neural network trained to iteratively denoise an image from pure noise. (2/11)
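The "iteratively denoise from pure noise" loop can be sketched with a toy 1-D example. Everything here is an assumption for illustration: `toy_denoiser` is a hypothetical stand-in for the trained network (it always predicts the same clean value), and the update rule is a simple deterministic step, not any of Stable Diffusion's actual samplers. Roughly speaking, a sampler is the choice of how each of these steps is taken.

```python
import random

def toy_denoiser(x, target=2.0):
    """Stand-in for the trained network: predicts the clean sample.

    A real model would look at the noisy input `x` (and the timestep);
    this toy version ignores it and always returns `target`.
    """
    return target

def sample(steps=50, seed=0):
    """Start from pure Gaussian noise and denoise it step by step."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # pure noise
    for t in range(steps, 0, -1):
        x0_hat = toy_denoiser(x)
        # Move a fraction 1/t of the remaining way toward the prediction,
        # so the final step (t == 1) lands on the predicted clean sample.
        x = x + (x0_hat - x) / t
    return x

result = sample()
```

With a real network the prediction changes as `x` gets cleaner, so step size, step count, and any noise added between steps all affect the final image.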
