Andrew Gao
Feb 4, 2023 · 6 tweets
see what the future looks like with AI!

generate sci-fi-style images with an auto-complete interface that helps you write prompts faster.

try now: portalpix.app

made using Leap from @leap_api @fdotinc

#aiart #ai #ml #generativeai #stablediffusion #buildspace #fdotinc
@leap_api @fdotinc just start typing and it'll suggest tags for you to add (just hit enter)
dubai 🏝️
japanese street market
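
a minimal sketch of how a tag autocomplete like this could work (hypothetical code, not portalpix's actual implementation; the tag list and function names are made up):

```python
# hypothetical prefix-based tag suggester -- not the actual portalpix code
TAGS = [
    "cyberpunk city", "dubai skyline", "japanese street market",
    "neon rain", "retro-futurism", "space elevator",
]

def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` tags starting with whatever the user has typed."""
    p = prefix.lower().strip()
    return [t for t in TAGS if t.startswith(p)][:limit]

print(suggest("ja"))  # ['japanese street market'] -- hitting enter would accept it
```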


More from @itsandrewgao

May 13
To everyone disappointed by @openai today... don't be. The livestream was for a general consumer audience.

The cool stuff is "hidden" on their site.

I am really excited. (Text to 3D??)
🧵🧵
@OpenAI 1/ Light-years ahead of anyone at rendering text in AI-generated images. Gorgeous
@OpenAI 2/ so confident in their text-in-image abilities that they can create fonts with #GPT4-o
May 8
🔔Sepp Hochreiter, the guy who co-invented the LSTM, just dropped a new LLM architecture!

Its major component is a new parallelizable LSTM.
⚠️one of the major weaknesses of prior LSTMs was their sequential nature (steps can't all be computed at once)

Everything we know about the xLSTM: 👇👇🧵
1/
Three major weaknesses of LSTMs that make Transformers better:
"Inability to revise storage decisions"
"Limited storage capacities"
"Lack of parallelizability due to memory mixing".

SEE THE GIF, if you don't get it. LSTMs are sequential which basically means you have to go through the green boxes (simplified) one after the other. You need the results from the prior box before you can move on.

Transformers don't do this. They parallelize operations across tokens, which is a really really big deal.

So how did Sepp and team solve this?
Keep reading: 👇👇
GIF credit: Michael Phi
towardsdatascience.com/illustrated-gu…
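
to make the parallelizability point concrete, here's a toy sketch (mine, not from the thread or the paper): the recurrent loop has a data dependency between steps, while the attention computation handles all tokens in one batched matrix product.

```python
import numpy as np

T, d = 8, 16                       # sequence length, hidden size
x = np.random.randn(T, d)

# recurrent style: step t needs h from step t-1, so the T steps run one by one
W, U = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
for t in range(T):
    h = np.tanh(x[t] @ W + h @ U)  # must wait for the previous h

# attention style: every position is computed in one shot, no step-to-step wait
scores = x @ x.T / np.sqrt(d)                    # (T, T) token-to-token scores
mask = np.tril(np.ones((T, T), dtype=bool))      # causal mask: no peeking ahead
scores = np.where(mask, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ x                                # all T positions at once
```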
2/
Here is the overview from the paper:

Main contributions:
matrix memory for LSTM (NO memory mixing)
exponential gating

This might mean nothing to you so I'll break it down with the help of Claude! 🧵👇 arxiv.org/pdf/2405.04517
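here's a minimal sketch of the matrix-memory idea as I read the paper (my toy code, not the official implementation): a matrix memory updated with exponentially-gated rank-1 writes, and no memory mixing across cells. the paper adds stabilization tricks that I'm omitting, so treat this as illustrative only:

```python
import numpy as np

def mlstm_step(C, n, q, k, v, f_logit, i_logit):
    """One simplified mLSTM step: matrix memory C (d x d), normalizer n (d,)."""
    f = np.exp(f_logit)               # exponential forget gate (unstabilized here)
    i = np.exp(i_logit)               # exponential input gate
    C = f * C + i * np.outer(v, k)    # rank-1 write: store value v under key k
    n = f * n + i * k                 # normalizer accumulates keys
    h = C @ q / max(abs(n @ q), 1.0)  # read with query q, then normalize
    return C, n, h

d = 8
C, n = np.zeros((d, d)), np.zeros(d)
for _ in range(5):
    q, k, v = (np.random.randn(d) for _ in range(3))
    C, n, h = mlstm_step(C, n, q, k, v, f_logit=-0.1, i_logit=0.0)
# with no memory mixing, these updates unroll into a form that can be parallelized
```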
May 7
gpt2-chatbot RETURNS! it's now TWO similarly performing models.

i've been testing them.

everything i can tell you 👇🧵 #gpt2
1/ first of all, @sama posted this cryptic tweet a few days ago.
that tweet contains the name of one of the two new GPT2 models.

can I confirm that it's from OpenAI? no. however, model creators need to work with @lmsysorg to add a model, and it seems unlikely the LMSYS team would let someone pretend to be OpenAI

how good are the mystery models? 👇👇👇🧵👀
2/ it seems like the models behave similarly to the OG #gpt2-chatbot. They seem to be fine-tuned for agentic reasoning and planning.

Take a look at the screenshots.

Now for coding abilities (what I really care about): 🧵👇
Apr 29
uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the SIX best math students in the USA get to compete

prompt + its thoughts 🧵
damn i accidentally closed the tab.
Read 8 tweets
Apr 29
🧵megathread of speculations on "gpt2-chatbot": tuned for agentic capabilities?

some of my thoughts, some from reddit, some from other tweeters

my early impression is 👇
1/

there's a limit of 8 messages per day, so i didn't get to try it much, but it feels around GPT-4 level. i don't know yet if i'd say better... (could be a placebo effect; it's too easy to delude yourself)

its voice sounds similar to gpt-4's, but not identical

as for agentic abilities...
2/ look at the screenshots i attached; it seems to be better than GPT-4 at planning out what needs to be done.

for instance, it comes up with potential sites to look at, and potential search queries. GPT-4 gives a much more vague answer (go to top tweet)

but imo
👇
Mar 17
here's your DEEP DIVE into @grok's architecture!
I just went through the model.py for this 314B open-source behemoth with *no strings attached*.

👇🧵
@grok 1. Basics:
314B total parameters, mixture of 8 experts (2 active)
86B active parameters

It's using Rotary Embeddings #rope instead of fixed positional embeddings

📜👇👇
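
a rough sketch of what "mixture of 8 experts (2 active)" means (my toy code, not Grok's model.py): a router scores all 8 experts per token, but only the top-2 actually run, which is how 314B total params shrink to ~86B active per token.

```python
import numpy as np

def moe_forward(x, experts, router_W, k=2):
    """Route one token x through the top-k of len(experts) expert networks."""
    logits = x @ router_W                 # (num_experts,) router scores
    top = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                  # softmax over just the chosen experts
    # only these k experts' weights are touched for this token
    return sum(g * experts[e](x) for g, e in zip(gates, top))

d, num_experts = 16, 8
experts = [(lambda W: (lambda x: np.tanh(x @ W)))(np.random.randn(d, d))
           for _ in range(num_experts)]
router_W = np.random.randn(d, num_experts)
y = moe_forward(np.random.randn(d), experts, router_W)
```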
@grok 2.
tokenizer vocab size: 131,072 (2^17, similar to GPT-4's)
embedding size: 6,144 (48*128)

64 transformer layers (sheesh)
Each is a decoder layer: a multi-head attention block and a dense block
Key-value size: 128

👇👇👇 not done yet
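
pulling the numbers above into one place (a sketch: the field names are mine, not model.py's, and the 48 query heads are inferred from the 48*128 = 6,144 factorization):

```python
from dataclasses import dataclass

@dataclass
class GrokConfigSketch:
    # values as listed in this thread; names are illustrative, not from model.py
    vocab_size: int = 131_072   # 2**17
    d_model: int = 6_144        # 48 * 128
    n_layers: int = 64
    head_dim: int = 128
    n_query_heads: int = 48     # inferred: 48 * 128 = d_model
    key_value_size: int = 128
    num_experts: int = 8
    active_experts: int = 2     # -> ~86B of the 314B params used per token

cfg = GrokConfigSketch()
assert cfg.vocab_size == 2**17
assert cfg.n_query_heads * cfg.head_dim == cfg.d_model
```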
