Rohan Paul
Nov 24
A nice collection of resources for LLMs.

More from @rohanpaul_ai

Nov 16
Consolidated insights on LLM fine-tuning - a long read across 114 pages.

"Ultimate Guide to Fine-Tuning LLMs"

Worth a read over the weekend.

A few areas it covers 👇

📊 Fine-tuning Pipeline

→ Outlines a seven-stage process for fine-tuning LLMs, from data preparation to deployment and maintenance.

🧠 Advanced Fine-tuning Methods

→ Covers techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for aligning LLMs with human preferences.
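
A minimal sketch of the DPO objective, written directly from the published formula (the function name, tensor inputs, and beta value are illustrative assumptions, not code from the guide):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is the summed log-probability of the chosen or rejected
    response under the policy being tuned or the frozen reference model.
    """
    # Log-ratios of policy vs. reference for preferred and dispreferred answers
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO widens the margin between the two ratios, scaled by beta
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```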

🛠️ Parameter-Efficient Fine-Tuning (PEFT) Techniques

→ Discusses methods like LoRA, QLoRA, and adapters that enable efficient fine-tuning by updating only a subset of model parameters.
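
For a concrete feel of what PEFT looks like in practice, here is a minimal LoRA sketch using the Hugging Face `peft` library; the base model and hyperparameters are placeholders, not recommendations from the guide:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only a small fraction of weights train
```

Only the injected low-rank matrices are updated during training, which is why LoRA-style methods cut memory and compute so sharply.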

🔬 Evaluation metrics and benchmarks for assessing fine-tuned LLMs

→ Includes perplexity, accuracy, and task-specific measures. Benchmarks like GLUE, SuperGLUE, TruthfulQA, and MMLU assess various aspects of LLM performance. Safety evaluations using frameworks like DecodingTrust are also crucial for ensuring responsible AI deployment.
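
As an example of the simplest of these metrics, a perplexity check on a causal LM takes only a few lines with Hugging Face `transformers` (the model and text below are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Fine-tuned models should assign high probability to in-domain text."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return mean token cross-entropy
    out = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(out.loss)  # perplexity = exp(average negative log-likelihood)
print(f"Perplexity: {perplexity.item():.2f}")
```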

💻 Explores various deployment approaches and optimization techniques to enhance LLM performance and efficiency in real-world applications.

🌐 Examines the extension of fine-tuning techniques to multimodal models and domain-specific applications in fields like medicine and finance.

Note: the content's value stands on its own merits, even though the authors may have leveraged AI assistance in parts of the paper's creation.

🧵 1/n
🧵 2/n

A chronological timeline showcasing the evolution of LLMs from 1990 to 2023.
🧵 3/n

Mind map depicting various dimensions of Large Language Models (LLMs), covering aspects from pre-training and fine-tuning methodologies to efficiency, evaluation, inference, and application domains.
Nov 8
MapReduce meets LLMs: Divide-and-conquer approach lets regular LLMs process 100x longer documents than their context limit

Using MapReduce principles, small-context LLMs now handle million-token documents efficiently.

Original Problem 🔍:

LLMs struggle to process extremely long texts exceeding their context window, limiting their application in tasks requiring comprehensive document understanding.

-----

Solution in this Paper 🛠️:

• LLM × MapReduce: A training-free framework for long-sequence processing
• Structured information protocol: Addresses inter-chunk dependency
• In-context confidence calibration: Resolves inter-chunk conflicts
• Three-stage process: Map, collapse, and reduce stages for efficient processing

-----

Key Insights from this Paper 💡:

• Divide-and-conquer approach enables short-context LLMs to handle long texts
• Structured information and confidence calibration improve cross-chunk processing
• Framework is compatible with different LLMs, demonstrating generalization capability
• Efficient design outperforms standard decoding in speed

-----

Results 📊:

• Outperforms closed-source and open-source LLMs on InfiniteBench
• Average score: 68.66 (vs. 57.34 for GPT-4)
• Enables Llama3-70B-Instruct (8K context) to process 1280K tokens
• Faster inference: 2 GPUs for 128K tokens (vs. 4 GPUs for standard decoding)
🧩 The key components of the LLM × MapReduce framework

The LLM × MapReduce framework consists of three main stages:

1. Map stage: The long input text is divided into chunks, and an LLM extracts necessary information from each chunk.

2. Collapse stage: If the mapped results still exceed the model's context window, they are compressed while maintaining the same structure as the mapped results.

3. Reduce stage: The final response is generated based on the collapsed results.
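
A rough sketch of those three stages in Python, assuming a generic `call_llm` helper and illustrative chunk sizes and prompts (not the paper's actual code):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any short-context LLM here")

def chunk(text: str, max_chars: int = 8000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def llm_mapreduce(document: str, question: str) -> str:
    # 1. Map: extract question-relevant information from each chunk
    mapped = [call_llm(f"Question: {question}\nChunk:\n{c}\n"
                       "Extract only the information needed to answer.")
              for c in chunk(document)]

    # 2. Collapse: while the mapped results still exceed the context window,
    #    compress groups of them, keeping the same structure
    while sum(len(m) for m in mapped) > 8000 and len(mapped) > 1:
        mapped = [call_llm("Merge these extractions, keeping their structure:\n"
                           + "\n---\n".join(mapped[i:i + 4]))
                  for i in range(0, len(mapped), 4)]

    # 3. Reduce: generate the final answer from the collapsed results
    return call_llm(f"Question: {question}\nEvidence:\n"
                    + "\n---\n".join(mapped) + "\nAnswer:")
```
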
🔑 The paper introduces two key innovations to address the challenges of inter-chunk dependency and inter-chunk conflict:

1. Structured information protocol: This protocol defines the information passed from the map stage to the reduce stage, ensuring the model has critical inputs needed to infer the correct answer when aggregating different chunks.

2. In-context confidence calibration mechanism: This allows the model to assign reliable confidence scores to the output of each chunk, aiding in effectively resolving inter-chunk conflicts.
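
One way the calibration idea could look in code, sketched as a JSON-returning map step and a confidence-weighted reduce step (the prompt wording and JSON shape are assumptions, not the paper's exact protocol):

```python
import json

def map_with_confidence(call_llm, question: str, chunk_text: str) -> dict:
    raw = call_llm(
        f"Question: {question}\nChunk:\n{chunk_text}\n"
        'Reply as JSON: {"answer": "...", "confidence": 1-5}, where confidence '
        "reflects how strongly this chunk alone supports the answer."
    )
    return json.loads(raw)

def resolve_conflicts(mapped_results: list[dict]) -> str:
    # When chunks disagree, prefer the answer backed by the highest
    # calibrated confidence rather than a simple majority vote.
    best = max(mapped_results, key=lambda r: r["confidence"])
    return best["answer"]
```
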
Nov 6
"Understanding LLMs from Scratch Using Middle School Math"

Neural networks learn to predict text by converting words to numbers and finding patterns through attention mechanisms.

So the network turns words into numbers, then uses attention to decide what's important for predicting the next word.

Nice long blog (about a 40-minute read), check the link in the comment.
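
A toy numeric illustration of that idea, with made-up embeddings (not taken from the blog):

```python
import numpy as np

# "Words to numbers": a tiny 2-dimensional embedding for a 3-word context
embeddings = np.array([
    [0.2, 0.7],   # "the"
    [0.9, 0.1],   # "cat"
    [0.4, 0.8],   # "sat"
])

query = embeddings[-1]                            # the word we predict from
scores = embeddings @ query                       # similarity of each word to it
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
context = weights @ embeddings                    # weighted mix fed to the predictor

print(weights)  # a higher weight means that word matters more for the next word
```
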
Here’s a neural network that does the classification.
Nov 4
For learning machine learning with actual projects, check out this repo.

920 open-source projects with a total of 4.7M stars, grouped into 34 categories.
Nov 3
A comprehensive educational repository from @AnthropicAI containing 5 structured courses: API fundamentals, prompt engineering, real-world applications, evaluations, and tool integration with Claude APIs.
Anthropic API fundamentals
Prompt Engineering Interactive Tutorial
Nov 1
Not all brain cells are equal - same goes for LLM attention heads! 💡

Why store everything when you can just remember the important stuff?

Smart KV cache compression that knows which attention heads matter most.

Hence, HeadKV intelligently compresses LLM memory by identifying and prioritizing crucial attention heads.

🎯 Original Problem:

KV caching in LLMs faces significant memory overhead with increasing input length. Current compression methods operate at layer-level, missing the opportunity to optimize at individual attention head level.

-----

🔧 Solution in this Paper:

• HeadKV: Compresses KV cache at individual head level instead of layer level
• Allocates cache budgets based on head importance using Needle-in-a-Haystack tests
• HeadKV-R2: Enhanced version that evaluates both retrieval and reasoning abilities
• Uses dynamic budget allocation across heads based on importance scores
• Retains most relevant KV cache entries within each head using attention-based selection

-----

💡 Key Insights:

• Not all attention heads are equally important for text generation
• Head-level compression outperforms layer-level approaches
• Combining retrieval and reasoning abilities for importance scoring is crucial
• Dynamic budget allocation across heads is more effective than fixed allocation
• Just 1.5% of KV cache can retain 97% of full performance

-----

📊 Results:

• Achieves 97% of full KV cache performance while retaining only 1.5% of cache
• Outperforms baselines on LongBench and LooGLE benchmarks
• Superior performance in low-resource settings (KV size = 64 & 128)
• Maintains computational efficiency comparable to existing approaches
• Effective preservation of both retrieval and reasoning capabilities
🔍 The method operates in two key steps: First, it estimates head importance scores using Needle-in-a-Haystack tests that evaluate both retrieval and reasoning abilities.

Second, it allocates KV cache budgets to individual heads based on their importance scores, with more important heads receiving larger cache allocations.
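
A rough sketch of that budget-allocation step, distributing a global KV-cache budget across heads in proportion to their importance scores (the function, floor value, and scores are illustrative, not the paper's implementation):

```python
import numpy as np

def allocate_head_budgets(importance: np.ndarray, total_budget: int,
                          min_per_head: int = 4) -> np.ndarray:
    # Guarantee every head a small floor, then split the rest by importance
    n_heads = importance.shape[0]
    remaining = total_budget - min_per_head * n_heads
    shares = importance / importance.sum()
    return min_per_head + np.floor(shares * remaining).astype(int)

scores = np.array([0.9, 0.1, 0.4, 0.6])          # hypothetical head importance
print(allocate_head_budgets(scores, total_budget=256))
```
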
💡 The main innovations are:

(1) Operating at individual head level rather than layer level for KV cache compression,

(2) Using a novel importance score estimation that considers both retrieval and reasoning abilities, and

(3) Implementing dynamic budget allocation across heads based on their importance distributions.
