Is Llama 2 special or just a better iteration of Llama 1? 🤔 Over the weekend, I had time to read the paper Meta released. 📖
Below are some of my findings, which you might have missed. 📝
🧵 1/6
🧠 A 34B version may come later after more testing
⚖️ The 7B model used a ~285:1 token-to-parameter ratio, with loss still decreasing.
💰 Training the 7B would cost ~$1M in AWS compute ($5 per A100-hour on-demand)
🛫 Llama Chat was started before Llama 2 finished training
🧵2/6
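A quick back-of-the-envelope check on the ratio and cost figures above (the GPU-hour count is the one reported in the Llama 2 paper; the $5/hour A100 rate is the on-demand assumption):

```python
# Back-of-the-envelope check of the Llama 2 7B training figures.
params = 7e9
tokens = 2e12          # Llama 2 pre-training corpus (~2T tokens)
ratio = tokens / params
print(f"tokens per parameter: {ratio:.0f}")   # ~286x

gpu_hours = 184_320    # A100 GPU-hours reported for the 7B model
usd_per_hour = 5.0     # assumed AWS on-demand A100 rate
cost = gpu_hours * usd_per_hour
print(f"estimated cost: ${cost / 1e6:.2f}M")  # ~$0.92M
```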
◼️ User prompts were masked/zeroed in SFT & RLHF training
👑 Reward Model (RM) accuracy is one of the most important proxies for Chat model performance
🚀 Collecting data in batches helped improve the overall model, since the RM and LLM were iteratively re-trained.
🧵3/6
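The prompt-masking bullet above boils down to a token-level loss mask: user tokens get weight 0, assistant tokens weight 1, so gradients only flow through the model's own responses. A minimal sketch (function and data layout are hypothetical, not the paper's code):

```python
def build_loss_mask(turns):
    """Zero out loss on user tokens; train only on assistant tokens.

    `turns` is a list of (role, token_ids) pairs; the returned mask
    aligns 1:1 with the concatenated token sequence.
    """
    mask = []
    for role, token_ids in turns:
        weight = 0 if role == "user" else 1
        mask.extend([weight] * len(token_ids))
    return mask

dialog = [("user", [101, 102, 103]), ("assistant", [201, 202])]
print(build_loss_mask(dialog))  # [0, 0, 0, 1, 1]
```

In practice the same effect is usually achieved by setting masked labels to an ignore index before computing cross-entropy.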
🔢 Used Rejection Sampling (RS) to distill knowledge from 70B for a better SFT dataset
🤔 Only used RS for the first 3 versions, then extended to RS + PPO
🆕 Proposed GAtt, inspired by Context Distillation, to augment fine-tuning data for better multi-turn conversations
🧵4/6
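The rejection-sampling step above is essentially best-of-k selection under the reward model: sample several candidates, keep the highest-scoring one, and use the winners as new SFT data. A toy sketch (`generate` and `reward_model` are hypothetical stand-ins):

```python
def rejection_sample(prompt, generate, reward_model, k=4):
    # Draw k candidate responses and keep the one the reward model
    # scores highest; the winners become new SFT training examples.
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=lambda r: reward_model(prompt, r))

# Toy demo: the "reward" is just response length.
responses = iter(["ok", "a longer answer", "hi", "mid answer"])
best = rejection_sample(
    "Explain RLHF",
    generate=lambda p: next(responses),
    reward_model=lambda p, r: len(r),
)
print(best)  # "a longer answer"
```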
💡 RS + RM can boost performance by 10% compared to SFT
🛠 Chat model learned to use tools.
Meta says, “…reinforcement learning proved highly effective, particularly given its cost and time effectiveness. Our findings underscore that the crucial determinant of RLHF’s success lies in the synergy it fosters between humans and LLMs throughout the annotation process.”
OpenLLaMA 13B was released and is competitive with its original counterpart from Meta AI. 🚀🎉 Two months ago, the OpenLM research initiative started to create a permissively licensed open-source reproduction of Meta AI’s LLaMA! 🛫
Last week the team released the 13B weights under Apache 2.0 with evaluations on the lm-evaluation-harness by EleutherAI🔓
OpenLLaMA matches @Meta LLaMA with an avg score of 0.57, making it a drop-in replacement for your commercial use cases🥊
OpenLLaMA is developed by @younggeng and @haoliuhl from Berkeley AI Research.
Thank you for this massive contribution to the open-source and science community!👏🏻🤗
Finally had the time to read the "The False Promise of Imitating Proprietary LLMs" paper in detail. 📚✨ Below are some of my key takeaways: 📝
🔍 Objective:
- The paper aimed to evaluate the effectiveness of models trained on GPT outputs.
🧵 1/4
💻Implementation
- Collected datasets imitating ChatGPT, either for specific tasks or broadly imitating its behavior (0.3M–150M tokens)
- Fine-tuned base LLMs (GPT-2 and LLaMA) on them
- Evaluated with humans and GPT-4 (blind pairwise comparisons against ChatGPT) and on canonical NLP benchmarks
🧵 2/4
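The blind pairwise evaluation in the setup above reduces to a win-rate computation over judgments. A sketch (counting ties as half a win is one common convention, not necessarily the paper's):

```python
def win_rate(judgments):
    # judgments: list of "win" / "tie" / "loss" outcomes vs. ChatGPT
    # from blinded human or GPT-4 pairwise comparisons.
    score = sum(1.0 if j == "win" else 0.5 if j == "tie" else 0.0
                for j in judgments)
    return score / len(judgments)

print(win_rate(["win", "tie", "loss", "win"]))  # 0.625
```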
💡 Learnings:
- Imitation models learn style, not knowledge
- Improving base LLMs has the highest impact
- Imitating is feasible for distilling a specific behavior for a certain task or use case, as opposed to broadly matching ChatGPT's capabilities
🧵 3/4
StarChat can help you:
🙋🏻‍♂️ Answer coding questions in over 80 languages, including Python, Java, C++, and more!
🧠 Explain concepts and help debug your code
📊 Generate sample code for data visualizations and plots in Python
💬 Iterate together to solve your coding errors
🧵2/4
We fine-tuned StarChat Beta on the new StarCoderPlus (15B) ⭐️, a further-trained version of StarCoder on 600B tokens from the English web dataset RefinedWeb (the Falcon dataset 🦅) 🔥
StarChat and StarCoder are open and can be used for commercial use cases 🤑