Philipp Schmid
Jul 24 · 7 tweets · 2 min read
Is Llama 2 special or just a better iteration of Llama 1? 🤔 Over the weekend, I had time to read the paper Meta released. 📖

Below are some of my findings, which you might have missed📝

🧵 1/6
🧠 A 34B version may come later after more testing
⚖️ The 7B model used a 285x token to parameter ratio, with loss still decreasing.
💰 Training the 7B would cost ~$1M in AWS compute ($5 per A100-hour, on-demand)
🛫 Llama Chat was started before Llama 2 finished training

🧵2/6
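The ratio and cost figures above are easy to sanity-check; a minimal sketch, assuming the paper's reported 2.0T training tokens and 184,320 A100 GPU-hours for the 7B model, and the $5/hour on-demand rate from the tweet:

```python
# Back-of-the-envelope check of the 285x ratio and ~$1M cost claims.
# Token count and GPU-hours are taken from the Llama 2 paper; the
# $5/A100-hour rate is an on-demand estimate, not an exact AWS quote.
params = 7e9            # 7B parameters
tokens = 2.0e12         # 2.0T training tokens
gpu_hours = 184_320     # A100 GPU-hours reported for the 7B model
price_per_gpu_hour = 5.0

ratio = tokens / params                 # tokens per parameter
cost = gpu_hours * price_per_gpu_hour   # rough training cost in USD

print(f"tokens/param: {ratio:.0f}")     # tokens/param: 286
print(f"cost: ${cost / 1e6:.2f}M")      # cost: $0.92M
```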
◼️ User prompts were masked/zeroed in SFT & RLHF training
👑 Reward Model (RM) accuracy is one of the most important proxies for Chat model performance
🚀 Collecting data in batches helped improve the overall model, since the RM and LLM were iteratively re-trained.

🧵3/6
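Prompt masking during SFT is commonly implemented by setting the prompt positions in the label sequence to the loss's ignore index, so the model is only trained on the response tokens. A minimal sketch (the token IDs and `IGNORE_INDEX = -100` convention are illustrative; -100 is the default ignore index in typical cross-entropy implementations):

```python
# Sketch of prompt masking for SFT: prompt tokens are still fed in as
# inputs, but their labels are set to IGNORE_INDEX so the loss is
# computed only on the assistant response.
IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids):
    # The model sees the full sequence: prompt followed by response.
    input_ids = list(prompt_ids) + list(response_ids)
    # Mask out the prompt so no loss gradient comes from user tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

input_ids, labels = build_labels([101, 7592, 102], [2023, 2003, 103])
print(labels)  # [-100, -100, -100, 2023, 2003, 103]
```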
🔢 Used Rejection Sampling (RS) to distill knowledge from 70B for a better SFT dataset
🤔 Only used RS for the first 3 versions, then extended to RS + PPO
🆕 Proposed GAtt, inspired by Context Distillation, to augment fine-tuning data for better multi-turn conversations

🧵4/6
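The rejection sampling loop above can be sketched in a few lines: draw K candidate responses per prompt, score each with the reward model, and keep only the best one as new fine-tuning data. `generate` and `reward_model` here are stand-in placeholders, not the actual Meta implementation:

```python
import random

def generate(prompt, k, rng):
    # Placeholder for sampling k responses from the LLM; a real system
    # would decode with temperature > 0 to get diverse candidates.
    return [f"{prompt} -> candidate {i} ({rng.random():.3f})" for i in range(k)]

def reward_model(response):
    # Placeholder scorer; a real RM is a learned model over (prompt, response).
    return float(response.rsplit("(", 1)[1].rstrip(")"))

def rejection_sample(prompt, k=4, seed=0):
    # Keep only the highest-reward candidate for the next SFT round.
    rng = random.Random(seed)
    candidates = generate(prompt, k, rng)
    return max(candidates, key=reward_model)

best = rejection_sample("Explain RLHF")
print(best)
```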
💡 RS + RM can boost performance by 10% compared to SFT
🛠 Chat model learned to use tools.

Check out the full paper here: arxiv.org/abs/2307.09288

🧵5/6
Meta says, “…reinforcement learning proved highly effective, particularly given its cost and time effectiveness. Our findings underscore that the crucial determinant of RLHF’s success lies in the synergy it fosters between humans and LLMs throughout the annotation process.”

More from @_philschmid

Jun 19
OpenLLaMA 13B was released and is competitive with its original counterpart from Meta AI. 🚀🎉 Two months ago, the OpenLM research initiative started to create a permissively licensed open-source reproduction of Meta AI’s LLaMA! 🛫

👉 huggingface.co/openlm-researc…
🧵 1/4
Last week the team released the 13B weights under Apache 2.0 with evaluations on the lm-evaluation-harness by EleutherAI🔓
OpenLLaMA matches @Meta LLaMA with an avg score of 0.57, making it a perfect replacement for all your commercial use cases🥊

huggingface.co/openlm-researc…
🧵 2/4
OpenLLaMA is developed by @younggeng and @haoliuhl from Berkeley AI Research.
Thank you for this massive contribution to the open-source and science community!👏🏻🤗

🧵3/4
Jun 9
Finally had the time to read the “The False Promise of Imitating Proprietary LLMs” paper in detail. 📚✨ Below are some of my key takeaways: 📝

🔍 Objective:
- The paper aimed to evaluate the effectiveness of models trained on GPT outputs.

🧵 1/4
💻 Implementation:
- Collected datasets imitating ChatGPT, either for specific tasks or broadly imitating its behavior (0.3M–150M tokens)
- Fine-tuned LLMs (GPT-2 and LLaMA)
- Evaluated with humans and GPT-4 (blind pairwise comparisons against ChatGPT) and on canonical NLP benchmarks
🧵 2/4
💡 Learnings:
- Imitation models learn style, not knowledge
- Improving base LLMs has the highest impact
- Imitation is feasible for distilling a specific behavior for a certain task or use case, but not for broadly matching ChatGPT's capabilities
🧵 3/4
Jun 8
Introducing StarChat Beta 🤖 Your new coding buddy 🙌 Attention all coders and developers 💻

You can write in plain English, and it will understand your queries, offer explanations, and provide step-by-step guidance to solve coding problems 🤯

👉 huggingface.co/spaces/Hugging…
🧵1/4
StarChat can help you:
🙋🏻‍♂️ Answer coding questions in over 80 languages, including Python, Java, C++ and more!
🧠 Explain concepts and help debug your code
📊 Generate sample code for data visualizations and plots in Python
💬 Iterate together to solve your coding errors

🧵2/4
We fine-tuned StarChat Beta on the new StarCoderPlus (15B) ⭐️, a further-trained version of StarCoder on 600B tokens from the English web dataset RefinedWeb (the Falcon dataset 🦅) 🔥

StarChat and StarCoder are open and can be used for commercial use cases 🤑

🧵3/4