Tatsunori Hashimoto
Assistant Prof at Stanford CS, member of @stanfordnlp and statsml groups; Formerly at Microsoft / postdoc at Stanford CS / Stats.
May 23, 2023
We are releasing AlpacaFarm, a simulator that lets everyone run and study the full RLHF pipeline at a fraction of the usual time (<24h) and cost (<$200), using LLM-simulated annotators. Starting from Alpaca, we show RLHF gives large winrate gains (10+%) vs davinci003 (crfm.stanford.edu/2023/05/22/alp…).

We find the RLHF simulator to be very accurate.
The simulated annotators are close to humans in agreement rate (65% vs 66%) at 1/45th the cost, and rankings of methods trained in simulation agree with rankings of methods trained on real human feedback.
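The two numbers above come from simple head-to-head statistics. A minimal sketch (hypothetical helper names, not the AlpacaFarm API) of how annotator agreement and pairwise win rate are computed from preference labels:

```python
# Hypothetical sketch (not the AlpacaFarm API): comparing a simulated
# annotator to human judgments on pairwise preference data.

def agreement_rate(sim_prefs, human_prefs):
    """Fraction of pairwise comparisons where the simulated and human
    annotators pick the same winner ('a' or 'b')."""
    assert len(sim_prefs) == len(human_prefs)
    matches = sum(s == h for s, h in zip(sim_prefs, human_prefs))
    return matches / len(sim_prefs)

def win_rate(prefs, model="a"):
    """Fraction of head-to-head comparisons won by `model`."""
    return sum(p == model for p in prefs) / len(prefs)

# Toy data: each entry is the preferred output in one comparison.
human = ["a", "a", "b", "a", "b", "a", "b", "a", "a", "b"]
sim   = ["a", "a", "b", "b", "b", "a", "b", "a", "a", "a"]

print(f"agreement: {agreement_rate(sim, human):.0%}")        # 80%
print(f"win rate of model a: {win_rate(human):.0%}")         # 60%
```

In AlpacaFarm the preferences come from an LLM prompted to act as the annotator, which is what makes the 1/45th cost reduction possible.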
Mar 13, 2023
Instruction-following models are now ubiquitous, but API-only access limits research.
Today, we’re releasing info on Alpaca (solely for research use), a small but capable 7B model based on LLaMA that often behaves like OpenAI’s text-davinci-003.

Demo: crfm.stanford.edu/alpaca/

Alpaca is an instruction-tuned version of LLaMA 7B; our 52k demonstrations were generated with the self-instruct method of Wang et al., using text-davinci-003.

Combining a small tuning dataset with a small model lets us train Alpaca quickly (3 hrs on 8xA100).

Data: github.com/tatsu-lab/stan…
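As a rough illustration of the self-instruct idea (all names below are hypothetical, not the released tatsu-lab code): seed demonstrations are packed into a few-shot prompt that asks a strong model such as text-davinci-003 to propose new instruction-following tasks, which then become training data.

```python
# Hypothetical self-instruct prompt construction (illustrative only,
# not the released Alpaca code): seed tasks become a few-shot prompt
# requesting new, diverse instructions from a strong model.

SEED_TASKS = [
    {"instruction": "Give three tips for staying healthy.",
     "output": "1. Eat a balanced diet. 2. Exercise. 3. Sleep well."},
    {"instruction": "Translate 'good morning' into French.",
     "output": "Bonjour."},
]

def build_selfinstruct_prompt(seeds, n_new=5):
    """Format seed demonstrations into a prompt asking for new tasks."""
    lines = [f"Come up with {n_new} new, diverse instruction-following tasks.",
             "Here are some examples:", ""]
    for i, task in enumerate(seeds, 1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"   Output: {task['output']}")
        lines.append("")
    lines.append(f"{len(seeds) + 1}. Instruction:")
    return "\n".join(lines)

prompt = build_selfinstruct_prompt(SEED_TASKS)
print(prompt)
```

The model's completions are parsed back into (instruction, output) pairs and filtered before fine-tuning; repeating this loop is what scales the seed set up to 52k demonstrations.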
Nov 2, 2021
Interested in differential privacy (DP) or private NLP?

Our preprint has something for both interests: arxiv.org/abs/2110.05679

We found that privacy-preserving NLP can be painless (code below), and DP-SGD works surprisingly well on extremely large models.

On the private NLP side, we show provable privacy is easy:

Fine-tuning language models with DP-SGD nearly matches nonprivate performance on a wide range of tasks, spanning classification, table2text generation, and dialog generation.
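The core DP-SGD mechanism is compact: clip each per-example gradient to norm C, sum, and add Gaussian noise of scale sigma*C before the update. A minimal NumPy sketch (illustrative only, not our released code):

```python
# Minimal DP-SGD step in NumPy (illustrative, not the released code):
# per-example gradients are clipped to clip_norm, summed, and Gaussian
# noise of scale sigma * clip_norm is added before averaging.
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0,
                sigma=0.5, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip_norm, size=w.shape)
    return w - lr * (total + noise) / len(per_example_grads)

# Toy example: two per-example gradients for a 3-dim weight vector.
w = np.zeros(3)
grads = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 0.4, 0.0])]
print(dp_sgd_step(w, grads))
```

The memory cost of materializing per-example gradients is what usually makes DP-SGD painful at scale; the preprint's point is that this can be made efficient even for very large language models.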

Want to try it? (contd)