Shayne Longpre
May 24 · 10 tweets · 10 min read
This semester my @CCCatMIT co-instructors and I taught #MIT's first post-#ChatGPT Generative AI course, covering:

➡️Uses and new abilities
➡️LM Evaluation
➡️AI-mediated communication
➡️Societal challenges

📜 Syllabus + reading list 📚: ai4comm.media.mit.edu

1/
It was a 🎢wild journey to teach in the midst of GPT-4 + Bard launches, moratorium letters, and raging online controversies every d*mn day.

We're excited to release our (and our students') learnings, slides, and the talks from our guest speakers.

Stay tuned!

2/
Over the next few days we'll post talks/talk summaries from:

➡️ @RishiBommasani guest lecture on Holistic Evaluation of Language Models

📜: crfm.stanford.edu/helm/latest/

3/
➡️ @_jasonwei on LLM Emergent Abilities as well as a general intro to LLMs

📜: ai.googleblog.com/2022/11/charac…

4/
➡️ @bakkermichiel on "Fine-tuning language models to find agreement among humans with diverse preferences"

📜: arxiv.org/pdf/2211.15006…

5/
➡️ @MinaLee__ on "Designing and Evaluating Language Models for Human Interaction"

📜: arxiv.org/abs/2212.09746 and arxiv.org/abs/2201.06796

6/
➡️ @informor on "My AI must have been broken": Understanding our Future of AI-Mediated Communication

📜: arxiv.org/abs/2206.07271 and dl.acm.org/doi/10.1145/32…

7/
➡️ @johnjhorton on "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?"

📜: arxiv.org/abs/2301.07543

8/
As well as a panel on a variety of topics (organized by @Schropes) with several speakers: @_ziv_e @mattgroh @bcsaldias @trudypainter and our own instructor @hjian42!

9/
This course was designed and taught with my awesome fellow student co-instructors @Schropes @jad_kabbara @hjian42 @suyashfulay @dougb

🧵/


More from @ShayneRedford

May 22
#NewPaperAlert When and where does pretraining (PT) data matter?

We conduct the largest published PT data study, varying:
1⃣ Corpus age
2⃣ Quality/toxicity filters
3⃣ Domain composition

We have several recs for model creators…
📜: bit.ly/3WxsxyY

1/ 🧵
First, PT data selection is mired in mysticism.

1⃣ Documentation Debt: #PALM2 & #GPT4 don't document their data
2⃣ PT is expensive ➡️ experiments are sparse
3⃣ So public data choices are largely guided by ⚡️intuition, rumors, and partial info⚡️

2/
PT is the foundation of data-centric and modern LMs. This research was expensive but important to shed light on open questions in training data design.

Here are our main findings:

3/
Read 17 tweets
Mar 28
What dates📅 can @OpenAI, @AnthropicAI, @CohereAI models reliably answer questions for?🔭

I binary-search through "future" Wiki events to find out. Results ❌🟰❌documentation:

#GPT4 ➡️~Dec 19 ('21)
#ChatGPT ➡️~Oct 24
Claude v1.2➡️~Oct 10
Cohere XL Nightly➡️~Apr 24 ('22)

1/🧵
GPT4 says it is trained up to Sept 2021.

I found it correctly answers unknowable events in Oct, Nov, and even Dec 11th & 19th.

In late Dec it begins to abstain.

2/
Interestingly, GPT 3.5 "Default" answers correctly only until ~Oct 24, 2021, but GPT 3.5 "Legacy" answers correctly until ~Oct 31, 2021 then begins hallucinating false answers or abstaining in Nov.

Perhaps this is due to finetuning rather than pretraining data?

3/
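
The binary search over dated events described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the function name `last_known_event_date`, the `(date, question)` event format, and the `answers_correctly` callback (which would wrap a model call plus an answer check) are all hypothetical. It also assumes the model's knowledge is monotone in time (it knows every event up to some cutoff and none after), which the hallucination and abstention behavior noted above suggests only holds approximately.

```python
from datetime import date
from typing import Callable, Optional, Sequence, Tuple


def last_known_event_date(
    events: Sequence[Tuple[date, str]],
    answers_correctly: Callable[[str], bool],
) -> Optional[date]:
    """Binary-search for the latest event the model answers correctly.

    `events` must be sorted by date, oldest first. `answers_correctly` is a
    hypothetical callback that asks the model a question and checks the answer.
    Returns None if the model fails even the earliest event.
    """
    lo, hi = 0, len(events) - 1
    last_known = None
    while lo <= hi:
        mid = (lo + hi) // 2
        event_date, question = events[mid]
        if answers_correctly(question):
            # The model knows this event, so the cutoff is here or later.
            last_known = event_date
            lo = mid + 1
        else:
            # The model does not know this event, so the cutoff is earlier.
            hi = mid - 1
    return last_known
```

Because real model answers are noisy near the cutoff, it would be sensible to probe several events around the returned date by hand rather than trust a single search.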
Read 7 tweets
Feb 27
🔭 A 🧵 on @OpenAI LLM "Alignment" (e.g. #ChatGPT)

Q: How does this differ from publicly available "Instruction Tuning" (IT)?

A: Proprietary Alignment is actually 3 separate components:

1⃣ Instruction tuning
2⃣ ➕ Open-ended generation/creative prompts
3⃣ ➕ Human feedback

1/
Component 1⃣:

Instruction Tuning, in its simplest form, teaches the model to follow/answer instructions, instead of generating plausible continuations.

E.g. see @GoogleAI's Flan Collection: arxiv.org/abs/2301.13688

2/
Instruction Tuning public collections are made of 95%+:
➡️ academic,
➡️ short-answer,
➡️ traditional,
NLP tasks. This is a limitation.

3/
Read 17 tweets
Feb 1
✨New Paper✨What’s the best completely public competitor to #ChatGPT?

Flan-T5 beats all public models we tested:
Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B

We release the @GoogleAI 🌟Flan Collection🌟data + methods for Instruction Tuning!

1/
The 🌟Flan Collection🌟 (1st used in Flan-PaLM bit.ly/3Zu7bU2):

➕ Merges Flan 2021, P3, NIv2, CoT instruction-datasets into an 1800+ dataset collection
➕ Data augmentations and mixing strategies
➕ 100s of new templates

2/
This yields the best performing instruction tuning collection that has been compiled and released into one repo.

See our survey Figure of the prior works we built on to produce this compilation.

3/
Read 11 tweets
Oct 6, 2022
📢 A 🧵 on the Trends in NLP Datasets.

What’s changed since SQuAD was all the rage in 2016? A: A LOT. 🔭

1. Generic ➡️ Niche Tasks
2. Task-specific Training+Eval ➡️ Eval Only
3. Dataset ➡️ Benchmark ➡️ Massive Collections
4. Datasets ➡️ Diagnostics

1/
What started as a trickle became an explosion of NLP datasets over the last few years.

Sebastian Ruder used to track all NLP datasets on his website: nlpprogress.com. It’s no longer possible to keep up-to-date.

2/
🌟 Trend 1 🌟 Generic datasets are replaced with more niche datasets.

⏳ Before: datasets released for general tasks.

⌛️ Now: We see tasks targeting hyper-specific abilities.

Exs:

3/
Read 13 tweets
Jun 14, 2022
📢 A 🧵on the future of NLP model inputs.

What are the options and where are we going? 🔭

1. Task-specific finetuning (FT)
2. Zero-shot prompting
3. Few-shot prompting
4. Chain of thought (CoT)
5. Parameter-efficient finetuning (PEFT)
6. Dialog

[1/]
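
To make the difference between the prompting options in the list concrete, here is a small sketch of how zero-shot, few-shot, and chain-of-thought prompts might be assembled as plain strings. The `Input:` / `Reasoning:` / `Output:` template and all function names are assumptions chosen for illustration; they are not taken from the thread or from any particular library.

```python
from typing import List, Tuple


def zero_shot_prompt(instruction: str, query: str) -> str:
    # Zero-shot: only the task instruction and the new input, no demonstrations.
    return f"{instruction}\n\nInput: {query}\nOutput:"


def few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    # Few-shot: a handful of (input, output) demonstrations precede the query.
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{instruction}\n\n{demos}\n\nInput: {query}\nOutput:"


def chain_of_thought_prompt(
    instruction: str, examples: List[Tuple[str, str, str]], query: str
) -> str:
    # Chain of thought: each demonstration also shows intermediate reasoning,
    # and the prompt ends by asking the model to reason before answering.
    demos = "\n\n".join(
        f"Input: {x}\nReasoning: {r}\nOutput: {y}" for x, r, y in examples
    )
    return f"{instruction}\n\n{demos}\n\nInput: {query}\nReasoning:"
```

Task-specific finetuning and parameter-efficient finetuning change model weights rather than the prompt, so they are not shown here.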
🌟Task-specific finetuning 🌟

This is the traditional way to prepare NLP models for deployment. It usually obtains the best performance for a specific task, but:

(a) it requires many training examples
(b) it (often) specializes a model for ONE task and ONE data input format ONLY

[2/]
Because large language models (LLMs):

(a) can be very expensive to train, and
(b) have emergent capabilities to interpret a NEW task from only an instruction

researchers are experimenting with new strategies to get model predictions…

[3/]
Read 16 tweets
