Rohan Paul
May 28, 2022 · 6 tweets · 16 min read
Kullback-Leibler (KL) Divergence - A Thread

It is a measure of how one probability distribution diverges from another expected probability distribution.

#DataScience #Statistics #DeepLearning #ComputerVision #100DaysOfMLCode #Python #programming #ArtificialIntelligence #Data
KL Divergence has its origins in information theory. The primary goal of information theory is to quantify how much information is in data. The most important metric in information theory is called entropy.

#DataScience #Statistics #DeepLearning #ComputerVision #100DaysOfMLCode
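As a minimal sketch of the two quantities named above, assuming discrete distributions given as lists of probabilities over the same outcomes (the function names `entropy` and `kl_divergence` are illustrative and not from the thread):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; outcomes with probability 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions on the same outcomes.

    Returns infinity if q assigns probability 0 to an outcome that p allows.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]   # fair coin
q = [0.9, 0.1]   # biased coin

print(entropy(p))            # 1.0 (one bit for a fair coin)
print(kl_divergence(p, q))   # extra bits paid when coding p with a code built for q
print(kl_divergence(q, p))   # a different value: KL divergence is not symmetric
```

Because `kl_divergence(p, q)` and `kl_divergence(q, p)` differ, KL divergence is not a distance in the metric sense, which is why it is described as a measure of how one distribution diverges from another.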

More from @rohanpaul_ai

Dec 12
Synthetic data and iterative self-improvement is all you need.

No humans needed in the evaluation loop.

This paper introduces a self-improving evaluator that learns to assess LLM outputs without human feedback, using synthetic data and iterative self-training to match top human-supervised models.

-----

Original Problem 🤔:

Building strong LLM evaluators typically requires extensive human preference data, which is costly and becomes outdated as models improve. Current approaches rely heavily on human annotations, limiting scalability and adaptability.

-----

Solution in this Paper 🔧:

→ The method starts with unlabeled instructions and uses a seed LLM to generate contrasting response pairs, where one is intentionally inferior.

→ It then uses an LLM-as-Judge approach to generate reasoning traces and final judgments for these synthetic pairs.

→ The system filters correct judgments and uses them to train an improved evaluator model.

→ This process repeats iteratively, with each iteration using the improved model to generate better synthetic training data.

-----

Key Insights from this Paper 💡:

→ Human preference data isn't necessary for training strong LLM evaluators

→ Synthetic data generation with iterative self-improvement can match human-supervised approaches

→ Different data sources (safety, math, coding) improve performance in their respective domains

-----

Results 📊:

→ Improved RewardBench accuracy from 75.4 to 88.3 (88.7 with majority voting)

→ Outperformed GPT-4 (84.3) and matched top reward models trained with human data

→ Achieved 79.5% agreement with human judgments on MT-Bench using majority voting
The diagram shows how an AI system learns to evaluate responses without human help, using an iterative training process:

1. Input Stage 🎯
- It starts with a prompt (x)
- Creates a similar but slightly different version of that prompt (x')

2. Response Generation 🔄
- The system uses an LLM to create two responses:
- A "good" response to the original prompt
- A "bad" response by answering the modified prompt

3. Judgment Phase 📊
- An AI judge (Mi) evaluates these responses
- It samples multiple judgments about which response is better
- The system selects only the correct verdicts

4. Training Loop ⚙️
- These judgments are collected as training data
- The system uses this data to train an improved version of itself (Mi+1)
- This new, better model becomes the judge for the next round

Think of it like a student who:
1. Creates their own practice problems
2. Solves them in both good and not-so-good ways
3. Learns to tell the difference between good and bad solutions
4. Uses this knowledge to get even better at judging solutions

The key innovation is that this entire process runs automatically, without needing humans to say which answers are good or bad. The system teaches itself to become a better evaluator through practice and iteration.
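The four stages above can be sketched as a toy loop. This is only an illustration under loud assumptions: `perturb`, `generate`, `sample_judgment`, and `train` are hypothetical stand-ins for LLM calls and fine-tuning, and the judge is simulated by a single made-up "accuracy" number rather than a real model.

```python
import random

def perturb(prompt):
    """Hypothetical stand-in for an LLM call that writes a similar but different prompt x'."""
    return prompt + " (modified)"

def generate(prompt):
    """Hypothetical stand-in for an LLM call that answers a prompt."""
    return f"answer to: {prompt}"

def sample_judgment(accuracy, better, rng):
    """Toy judge M_i: returns the correct verdict with probability `accuracy`.

    A real judge would read the prompt and both responses and write a reasoning
    trace; here the known label is used only to simulate a judge of a given quality.
    """
    wrong = "B" if better == "A" else "A"
    verdict = better if rng.random() < accuracy else wrong
    return {"reasoning": "(reasoning trace would go here)", "verdict": verdict}

def train(accuracy, kept, n_prompts):
    """Toy 'fine-tuning' step: more kept examples nudge judge quality up (assumption)."""
    return min(0.99, accuracy + 0.05 * len(kept) / n_prompts)

def self_taught_loop(prompts, iterations=3, samples=4, seed=0):
    rng = random.Random(seed)
    accuracy = 0.6                      # quality of the seed judge M_0 (made-up number)
    history = [accuracy]
    for _ in range(iterations):
        kept = []
        for x in prompts:
            good = generate(x)              # response to the original prompt x
            bad = generate(perturb(x))      # response to x', worse by construction
            better = rng.choice("AB")       # randomize which slot holds the good response
            pair = (good, bad) if better == "A" else (bad, good)
            judgments = [sample_judgment(accuracy, better, rng) for _ in range(samples)]
            correct = [j for j in judgments if j["verdict"] == better]
            if correct:                     # keep only pairs with a correct verdict
                kept.append({"prompt": x, "pair": pair, "judgment": correct[0]})
        accuracy = train(accuracy, kept, len(prompts))   # M_i -> M_{i+1}
        history.append(accuracy)
    return history

history = self_taught_loop(["prompt 1", "prompt 2", "prompt 3"])
print(history)
```

The point of the sketch is the data flow: the label for each pair is known by construction, so filtering judgments needs no human annotation.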
Paper Title: "Self-Taught Evaluators"

A podcast on this paper was generated with Google's Illuminate.
Dec 10
Beautiful open-source tool: ScrapeGraphAI, with 16.2K GitHub stars 🌟

Turns natural language commands into production-ready web scrapers using LLM-powered graph pipelines.

This library stands out by integrating Large Language Models (LLMs) and modular graph-based pipelines to automate the scraping of data from various sources (e.g., websites and local files).

Why ScrapeGraphAI ❓

Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages. ScrapeGraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.

→ ScrapeGraphAI builds web scraping pipelines using LLMs and directed graph logic.

→ It extracts information from websites and local documents (XML, HTML, JSON, Markdown) through simple natural language prompts.

→ Offers multiple specialized pipelines: single-page scraping, multi-page extraction, script generation, and audio output generation.

→ Supports OpenAI, Groq, Azure, Gemini APIs and local Ollama models. Features parallel LLM calls, multi-language support, and integrates with browsers through Playwright. Built for production use with comprehensive testing and CI/CD.
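To make the "directed graph pipeline" idea concrete, here is a toy sketch using only the Python standard library. It is not the ScrapeGraphAI API: the node names, the `run_graph` helper, and the regex that stands in for the LLM step are all assumptions made for illustration, and the "page" is an in-memory string so nothing is fetched from the network.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the visible text chunks of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def fetch_node(state):
    # Toy fetch: the source is already in-memory HTML (a real node would load a URL or file).
    state["html"] = state["source"]
    return state

def parse_node(state):
    parser = _TextExtractor()
    parser.feed(state["html"])
    parser.close()
    state["text"] = " ".join(parser.chunks)
    return state

def answer_node(state):
    # Stand-in for the LLM call: a regex pulls out prices instead of a model reading the prompt.
    state["answer"] = re.findall(r"\$\d+(?:\.\d{2})?", state["text"])
    return state

def run_graph(nodes, edges, entry, state):
    """Walk a linear directed graph: run each node, then follow its outgoing edge."""
    current = entry
    while current is not None:
        state = nodes[current](state)
        current = edges.get(current)
    return state

nodes = {"fetch": fetch_node, "parse": parse_node, "answer": answer_node}
edges = {"fetch": "parse", "parse": "answer"}   # fetch -> parse -> answer

result = run_graph(
    nodes, edges, "fetch",
    {"prompt": "List the prices", "source": "<ul><li>Tea $4.50</li><li>Coffee $3</li></ul>"},
)
print(result["answer"])   # ['$4.50', '$3']
```

Keeping each step as a separate node is what lets a library swap pipelines (single page, multi-page, script generation) by rewiring the graph rather than rewriting the scraper.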
💻 Usage

There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

The most common one is the `SmartScraperGraph`, which extracts information from a single page given a user prompt and a source URL.
Try it directly on the web using Google Colab:

colab.research.google.com/drive/1sEZBonB…
Dec 3
HunyuanVideo: an open-source alternative to Runway Gen-3, Luma 1.6, and a few other top-performing Chinese video generation models has just arrived. 🤯

🎯 A 13B-parameter open-source video generation model from Tencent that matches commercial quality 👏

→ HunyuanVideo represents a major advancement in open-source video generation, released by Tencent in December 2024 with public code and model weights

→ The model matches or exceeds closed-source solutions while being fully accessible to researchers and developers

→ Running on H800/H20 GPUs, it requires 45-60GB memory depending on resolution settings

🔬 Architecture

→ The foundation is a Causal 3D VAE that intelligently compresses videos with specific ratios - 4x for time dimension, 8x for spatial dimensions, and 16x for channels

→ Unlike traditional approaches using CLIP/T5, HunyuanVideo employs a decoder-only Multimodal LLM as its text encoder, enabling better image-text alignment and complex reasoning

→ The architecture follows a novel dual-stream to single-stream progression - first processing video and text independently, then merging them for enhanced multimodal fusion

→ A sophisticated prompt rewriting system offers two modes: Normal for better understanding user intent, and Master for enhancing visual quality aspects
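The compression ratios quoted for the Causal 3D VAE can be turned into a quick shape calculation. This is a sketch under two assumptions that the thread does not spell out: that the causal design keeps the first frame on its own, so `frames` input frames map to `(frames - 1) / 4 + 1` latent frames, and that "16x for channels" means the latent has 16 channels. The helper name `latent_shape` is made up for this example.

```python
def latent_shape(frames, height, width, ct=4, cs=8, latent_channels=16):
    """Latent tensor shape (frames, channels, height, width) for a video.

    ct: temporal compression ratio, cs: spatial compression ratio.
    Assumes a causal VAE where the first frame is encoded separately.
    """
    if (frames - 1) % ct or height % cs or width % cs:
        raise ValueError("input size must be compatible with the compression ratios")
    return ((frames - 1) // ct + 1, latent_channels, height // cs, width // cs)

# Example: a 129-frame 720p clip (the frame count is chosen for illustration).
shape = latent_shape(129, 720, 1280)
print(shape)   # (33, 16, 90, 160)

# Ratio of RGB input values to latent values for the same clip.
input_values = 129 * 3 * 720 * 1280
latent_values = shape[0] * shape[1] * shape[2] * shape[3]
print(input_values / latent_values)
```

The diffusion transformer then operates on this much smaller latent tensor instead of raw pixels, which is what makes 720p generation tractable.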

🛠️ Implementation Details

→ Supports various aspect ratios including 9:16, 16:9, 4:3, 3:4, and 1:1 with resolutions up to 720p

→ Uses flow matching for training with a configurable shift factor of 9.0 and embedded classifier-free guidance

→ Provides CPU offloading capabilities to manage memory efficiently during high-resolution generation

📊 Performance Metrics

→ Professional evaluation across 1,533 prompts shows superior results: 68.5% text alignment, 64.5% motion quality, 96.4% visual quality

Dec 1
Emotional RAG: AI can now recall memories based on emotions, just like humans do.

Original Problem 🤔:

Role-playing agents powered by LLMs struggle to maintain consistent personality traits and generate human-like responses due to limited emotional context in memory retrieval.

-----

Solution in this Paper 💡:

• Introduces Emotional RAG framework for role-playing agents

• Encodes both semantic and emotional vectors for queries and memory

• Implements two retrieval strategies:
- Combination: Fuses semantic and emotional similarity scores
- Sequential: Retrieves based on one factor, then reranks using the other

• Designs emotion-aware prompt templates for LLMs
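The two retrieval strategies can be sketched with toy vectors. The details here are assumptions: the thread does not say how the scores are fused, so the combination strategy below uses a simple weighted sum of cosine similarities, and the 2-dimensional "semantic" and "emotional" vectors are hand-made rather than produced by an encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combination_retrieve(query, memory, k=1, weight=0.5):
    """Combination strategy: fuse semantic and emotional similarity into one score."""
    def score(item):
        return (weight * cosine(query["sem"], item["sem"])
                + (1 - weight) * cosine(query["emo"], item["emo"]))
    return sorted(memory, key=score, reverse=True)[:k]

def sequential_retrieve(query, memory, k=1, pool=2, first="sem", second="emo"):
    """Sequential strategy: shortlist by one factor, then rerank by the other."""
    shortlist = sorted(memory, key=lambda m: cosine(query[first], m[first]), reverse=True)[:pool]
    return sorted(shortlist, key=lambda m: cosine(query[second], m[second]), reverse=True)[:k]

# Toy memory: "sem" axes are (sports, picnic); "emo" axes are (joy, sadness).
memory = [
    {"text": "won the match",  "sem": [1.0, 0.0], "emo": [1.0, 0.0]},
    {"text": "lost the match", "sem": [1.0, 0.1], "emo": [0.0, 1.0]},
    {"text": "sunny picnic",   "sem": [0.0, 1.0], "emo": [1.0, 0.0]},
]
query = {"sem": [1.0, 0.0], "emo": [0.0, 1.0]}   # asks about the match while sad

print(combination_retrieve(query, memory)[0]["text"])   # lost the match
print(sequential_retrieve(query, memory)[0]["text"])    # lost the match
```

A purely semantic retriever would rank "won the match" first here; adding the emotional vector is what surfaces the mood-congruent memory.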

-----

Key Insights from this Paper:

→ Incorporating emotional states in memory retrieval enhances personality consistency
→ Mood-Dependent Memory theory from psychology applies to AI agents
→ Different retrieval strategies work best for different personality evaluation metrics
→ Emotional congruence improves the human-likeness of generated responses

-----

Results 📊:

• Outperforms traditional RAG methods across multiple datasets

• Significant improvements in full personality evaluations (MBTI, BFI)

• Better performance on open-source models (ChatGLM-6B, Qwen-72B) compared to GPT-3.5

• Achieves higher accuracy in overall personality trait predictions
🔍 The Emotional RAG framework consists of four main components:

→ Query encoding: Encodes both semantic and emotional aspects of user queries

→ Memory encoding: Stores and encodes conversation history with semantic and emotional vectors

→ Emotional retrieval: Retrieves relevant memory based on both semantic and emotional similarity

→ Response generation: Uses retrieved memory along with character profile to generate responses
Nov 25
Type a sentence, get any sound - from talking cats to singing saxophones. Brilliant release by NVIDIA

✨ NVIDIA just unveiled Fugatto, a groundbreaking 2.5B parameter audio AI model that can generate and transform any combination of music, voices, and sounds using text prompts and audio inputs

Fugatto could ultimately allow developers and creators to bring sounds to life simply by inputting text prompts.

→ The model demonstrates unique capabilities like creating hybrid sounds (trumpet barking), changing accents/emotions in voices, and allowing fine-grained control over sound transitions - trained on millions of audio samples using 32 NVIDIA H100 GPUs

👨‍🔧 Architecture

Built as a foundational generative transformer model leveraging NVIDIA's previous work in speech modeling and audio understanding. The training process involved creating a specialized blended dataset containing millions of audio samples

→ ComposableART's Innovation in Audio Control

Introduces a novel technique allowing combination of instructions that were only seen separately during training. Users can blend different audio attributes and control their intensity

→ Temporal Interpolation Capabilities

Enables generation of evolving soundscapes with precise control over transitions. Can create dynamic audio sequences like rainstorms fading into birdsong at dawn

→ Processes both text and audio inputs flexibly, enabling tasks like removing instruments from songs or modifying specific audio characteristics while preserving others

→ Shows capabilities beyond its training data, creating entirely new sound combinations through interaction between different trained abilities

🔍 Real-world Applications

→ Allows rapid prototyping of musical ideas, style experimentation, and real-time sound creation during studio sessions

→ Enables dynamic audio asset generation matching gameplay situations, reducing pre-recorded audio requirements

→ Can modify voice characteristics for language learning applications, allowing content delivery in familiar voices

@NVIDIAAIDev
→ Creates a massive dataset (20M+ rows, ~330 years of audio) by combining multiple open source datasets and using LLMs to generate rich descriptions and instructions
→ Optimal Transport Conditional Flow Matching

Trains using OT-CFM objective with a T5-based transformer architecture and adaptive layer normalization
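For readers unfamiliar with the objective, here is a minimal sketch of one common formulation of conditional flow matching with an optimal-transport (straight-line) path between a noise sample `x0` and a data sample `x1`. Whether Fugatto uses exactly this parameterization is not stated in the thread, so treat the formula, the `sigma_min` parameter, and the function names as assumptions.

```python
def ot_cfm_pair(x0, x1, t, sigma_min=0.0):
    """Point on the straight-line path at time t, and the velocity the model should predict.

    x_t    = (1 - (1 - sigma_min) * t) * x0 + t * x1
    target = x1 - (1 - sigma_min) * x0
    """
    keep = 1.0 - (1.0 - sigma_min) * t
    x_t = [keep * a + t * b for a, b in zip(x0, x1)]
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, target

def cfm_loss(predict_velocity, x0, x1, t, sigma_min=0.0):
    """Mean squared error between the predicted and target velocity at (x_t, t)."""
    x_t, target = ot_cfm_pair(x0, x1, t, sigma_min)
    predicted = predict_velocity(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)

x0 = [0.0, 2.0]    # noise sample (made-up numbers)
x1 = [1.0, -1.0]   # data sample (made-up numbers)

x_t, target = ot_cfm_pair(x0, x1, t=0.5)
print(x_t, target)   # [0.5, 0.5] [1.0, -3.0]

# An oracle that already knows the target velocity gets zero loss;
# a model that always predicts zero velocity does not.
print(cfm_loss(lambda x, t: [1.0, -3.0], x0, x1, t=0.5))
print(cfm_loss(lambda x, t: [0.0, 0.0], x0, x1, t=0.5))
```

In training, `predict_velocity` would be the conditioned transformer, and the loss would be averaged over random `t`, noise samples, and data samples.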
Nov 16
Consolidated insights on LLM fine-tuning - a long read across 114 pages.

"Ultimate Guide to Fine-Tuning LLMs"

Worth a read during the weekend.

A few areas it covers 👇

📊 Fine-tuning Pipeline

→ Outlines a seven-stage process for fine-tuning LLMs, from data preparation to deployment and maintenance.

🧠 Advanced Fine-tuning Methods

→ Covers techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for aligning LLMs with human preferences.
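As a small illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. The numbers and argument names are made up for the example; the guide's own notation may differ.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(beta * margin)).

    margin = how much more the policy favors the chosen response over the
    rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: margin is 0 and the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))

# Policy has shifted probability toward the chosen response: lower loss.
print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

Unlike PPO, this needs no separate reward model or sampling loop: the preference pair and two forward passes per model are enough to compute the gradient.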

🛠️ Parameter-Efficient Fine-Tuning (PEFT) Techniques

→ Discusses methods like LoRA, QLoRA, and adapters that enable efficient fine-tuning by updating only a subset of model parameters.
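To see why LoRA is parameter-efficient, here is a minimal sketch of the low-rank update and the resulting parameter counts. The layer size, rank, and helper names are chosen for illustration and do not come from the guide.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning vs. a rank-`rank` LoRA adapter."""
    full = d_out * d_in              # every entry of W
    lora = rank * (d_in + d_out)     # A is (rank x d_in), B is (d_out x rank)
    return full, lora

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """h = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, lora / full)   # 16777216 65536 0.00390625

# Tiny example: B starts at zero, so the adapted layer initially equals the base layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]        # rank 1: shape (1 x 2)
B = [[0.0], [0.0]]       # shape (2 x 1), zero-initialized
print(lora_forward(W, A, B, [1.0, 1.0], alpha=2, rank=1))   # [3.0, 7.0]
```

For a 4096 x 4096 layer at rank 8, the adapter trains under half a percent of the weights that full fine-tuning would update.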

🔬 Evaluation metrics and benchmarks for assessing fine-tuned LLMs

→ Includes perplexity, accuracy, and task-specific measures. Benchmarks like GLUE, SuperGLUE, TruthfulQA, and MMLU assess various aspects of LLM performance. Safety evaluations using frameworks like DecodingTrust are also crucial for ensuring responsible AI deployment.
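Perplexity, the first metric listed, is the exponential of the average negative log-likelihood per token. A minimal sketch, assuming natural-log token probabilities are already available from the model (the function name and inputs are illustrative):

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-probability the model assigned to each token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that gives every token probability 1/4 is as uncertain as a
# uniform choice among 4 options, so its perplexity is about 4.
uniform_quarter = [math.log(0.25)] * 10
print(perplexity(uniform_quarter))

# More confident (and correct) predictions give lower perplexity.
confident = [math.log(0.9)] * 10
print(perplexity(confident))
```

Lower is better, and values are only comparable between models that share a tokenizer and evaluation text.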

💻 Explores various deployment approaches and optimization techniques to enhance LLM performance and efficiency in real-world applications.

🌐 Examines the extension of fine-tuning techniques to multimodal models and domain-specific applications in fields like medicine and finance.

Note: the content's value stands on its own merit, even though the authors may have used AI assistance for parts of the paper.

🧵 1/n
🧵 2/n

A chronological timeline showcasing the evolution of LLMs from 1990 to 2023.
🧵 3/n

Mind map depicting various dimensions of Large Language Models (LLMs), covering aspects from pre-training and fine-tuning methodologies to efficiency, evaluation, inference, and application domains.