Shayne Longpre
PhD @MIT @medialab. @Google Brain. Ex: @apple ML, @stanfordnlp CS. 🇨🇦 Interests: AI/ML/NLP, Social Science, Online Governance.
Feb 1 11 tweets 7 min read
✨New Paper✨ What’s the best completely public competitor to #ChatGPT?

Flan-T5 beats all public models we tested:
Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B

We release the @GoogleAI 🌟Flan Collection🌟 data + methods for Instruction Tuning!

1/ The 🌟Flan Collection🌟 (1st used in Flan-PaLM):

➕ Merges the Flan 2021, P3, NIv2, and CoT instruction datasets into a single 1800+ dataset collection
➕ Data augmentations and mixing strategies
➕ Hundreds of new templates
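To make "templates" and "mixing strategies" concrete, here is a toy sketch of how instruction-tuning data can be rendered and mixed. The task names, template wording, and the choice of examples-proportional mixing with a cap (a strategy popularized by T5) are illustrative assumptions, not the Flan Collection's actual code, templates, or mixture weights.

```python
import random
from typing import Dict, List

# Hypothetical instruction templates (illustrative, not from the Flan Collection):
# several phrasings per task, so the model sees the same task described differently.
TEMPLATES: Dict[str, List[str]] = {
    "nli": [
        "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
        "{premise}\nBased on the paragraph above, can we conclude that \"{hypothesis}\"?",
    ],
    "sentiment": [
        "Review: {text}\nIs this review positive or negative?",
        "What is the sentiment of the following review?\n{text}",
    ],
}


def render(task: str, fields: Dict[str, str], rng: random.Random) -> str:
    """Format one raw example with a randomly chosen template for its task."""
    template = rng.choice(TEMPLATES[task])
    return template.format(**fields)


def mixing_rates(sizes: Dict[str, int], cap: int) -> Dict[str, float]:
    """Examples-proportional mixing with a cap (assumed strategy): each task is
    sampled in proportion to min(size, cap), so huge datasets cannot dominate."""
    capped = {task: min(n, cap) for task, n in sizes.items()}
    total = sum(capped.values())
    return {task: n / total for task, n in capped.items()}


def sample_tasks(rates: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k task names according to the mixing rates."""
    tasks = list(rates)
    weights = [rates[t] for t in tasks]
    return rng.choices(tasks, weights=weights, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes: the 400k-example task is capped at 50k before normalizing.
    rates = mixing_rates({"nli": 400_000, "sentiment": 25_000}, cap=50_000)
    print(rates)  # nli gets 50k/75k of the mixture, sentiment 25k/75k
    print(render("sentiment", {"text": "Loved it."}, rng))
    print(sample_tasks(rates, 5, rng))
```

The cap is the design choice that matters here: without it, the 400k-example task would take roughly 94% of the mixture; with it, the smaller task still gets a third of the samples.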

Oct 6, 2022 13 tweets 7 min read
📢 A 🧵 on the Trends in NLP Datasets.

What’s changed since SQuAD was all the rage in 2016? A: A LOT. 🔭

1. Generic ➡️ Niche Tasks
2. Task-specific Training+Eval ➡️ Eval Only
3. Dataset ➡️ Benchmark ➡️ Massive Collections
4. Datasets ➡️ Diagnostics

1/ What started as a trickle became an explosion of NLP datasets over the last few years.

Sebastian Ruder used to track all NLP datasets on his website, but it’s no longer possible to keep up.

Jun 14, 2022 16 tweets 9 min read
📢 A 🧵on the future of NLP model inputs.

What are the options and where are we going? 🔭

1. Task-specific finetuning (FT)
2. Zero-shot prompting
3. Few-shot prompting
4. Chain of thought (CoT)
5. Parameter-efficient finetuning (PEFT)
6. Dialog
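Options 2 to 4 differ only in how the input text is built (options 1 and 5 update model weights instead), so they can be sketched as plain string construction. The Q:/A: layout, the function names, and the toy exemplar below are hypothetical choices for illustration, not a standard API or the exact format used in any particular paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """A solved exemplar; rationale is the step-by-step reasoning used for CoT."""
    question: str
    answer: str
    rationale: Optional[str] = None


def zero_shot(question: str) -> str:
    """Zero-shot: an instruction plus the new input, with no solved examples."""
    return f"Answer the following question.\nQ: {question}\nA:"


def few_shot(question: str, demos: List[Example]) -> str:
    """Few-shot: prepend solved input/output pairs before the new input."""
    parts = [f"Q: {d.question}\nA: {d.answer}" for d in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def chain_of_thought(question: str, demos: List[Example]) -> str:
    """CoT: few-shot prompting where each exemplar shows its reasoning first."""
    parts = []
    for d in demos:
        if d.rationale is None:
            raise ValueError("CoT exemplars need a rationale")
        parts.append(f"Q: {d.question}\nA: {d.rationale} The answer is {d.answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = Example("What is 2 + 3?", "5", "2 plus 3 equals 5.")
    print(zero_shot("What is 4 + 4?"))
    print(few_shot("What is 4 + 4?", [demo]))
    print(chain_of_thought("What is 4 + 4?", [demo]))
```

The model and its weights are identical in all three cases; only the prompt changes, which is why these options need few or no labeled training examples compared with task-specific finetuning.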

[1/] 🌟Task-specific finetuning🌟

The traditional way to prepare NLP models for deployment. It usually achieves the best performance on a specific task, but:

(a) it requires many training examples
(b) it (often) specializes a model for ONE task and ONE data input format ONLY

May 28, 2022 6 tweets 6 min read
Sharing my *rough* slides from a @CCCatMIT February reading group.

Covers "NLP Training Trends for Large Language Models" (LLMs) and a survey of four interesting new papers: FLAN, T0, ExT5, MetaICL!

[1/6] The 1st paper we discuss: multi-task fine-tuning in FLAN by @_jasonwei, @MaartenBosma, et al.

TLDR: Multi-task instruction tuning a 137B model on dozens of tasks vastly improves zero/few-shot learning

📜: [2/6]