Avi Chawla Profile picture
Jan 19 9 tweets 3 min read Read on X
Let's build a multi-agent internet research assistant with OpenAI Swarm & Llama 3.2 (100% local):
Before we begin, here's what we're building!

The app takes a user query, searches the web for it, and turns it into a well-crafted article.

Tool stack:
- @ollama for running LLMs locally.
- @OpenAI Swarm for multi-agent orchestration.
- @Streamlit for the UI.
The architecture diagram below illustrates the key components (agents/tools) & how they interact with each other!

Let's implement it now!
Agent 1: Web search and tool use

The web-search agent takes a user query and then uses the DuckDuckGo search tool to fetch results from the internet. Image
Agent 2: Research Analyst

The role of this agent is to analyze and curate the raw search results and make them ready to use for the content writer agent. Image
Agent 3: Technical Writer

The role of a technical writer is to use the curated results and turn them into a polished, publication-ready article. Image
Create a workflow

Now that we have all our agents and tools ready, it's time to put them together and create a workflow.

Here's how we do it: Image
The Chat interface

Finally we create a Streamlit UI to provide a chat interface for our application.

Done! Image
That's a wrap!

If you enjoyed this tutorial:

Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Avi Chawla

Avi Chawla Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @_avichawla

Dec 10
You're in an AI Engineer interview at OpenAI.

The interviewer asks:

"Our GPT model generates 100 tokens in 42 seconds.

How do you make it 5x faster?"

You: "I'll allocate more GPUs for faster generation."

Interview over.

Here's what you missed:
The real bottleneck isn't compute, it's redundant computation.

Without KV caching, your model recalculates keys and values for each token, repeating work.

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in to understand how it works!
To understand KV caching, we must know how LLMs output tokens.

- Transformer produces hidden states for all tokens.
- Hidden states are projected to the vocab space.
- Logits of the last token are used to generate the next token.
- Repeat for subsequent tokens.

Check this👇
Read 10 tweets
Dec 7
You're in a Research Scientist interview at OpenAI.

The interviewer asks:

"How would you expand the context length of an LLM from 2K to 128K tokens?"

You: "I will fine-tune the model on longer docs with 128K context."

Interview over.

Here's what you missed:
Extending the context window isn't just about larger matrices.

In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!

So, how do we manage it?

continue...👇 Image
1) Sparse Attention

It limits the attention computation to a subset of tokens by:

- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.

But this has a trade-off between computational complexity and performance. Image
Read 12 tweets
Nov 25
Context engineering, clearly explained (with visuals):

(an illustrated guide below) Image
So, what is context engineering?

It’s the art and science of delivering the right information, in the right format, at the right time, to your LLM.

Here's a quote by Andrej Karpathy on context engineering...👇 Image
To understand context engineering, it's essential to first understand the meaning of context.

Agents today have evolved into much more than just chatbots.

The graphic below summarizes the 6 types of contexts an agent needs to function properly.

Check this out 👇 Image
Read 10 tweets
Oct 24
Let's build a reasoning LLM using GRPO, from scratch (100% local):
Today, we're going to learn how to turn any model into a reasoning powerhouse.

We'll do so without any labeled data or human intervention, using Reinforcement Finetuning (GRPO)!

Tech stack:

- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO

Let's go! 🚀
What is GRPO?

Group Relative Policy Optimization is a reinforcement learning method that fine-tunes LLMs for math and reasoning tasks using deterministic reward functions, eliminating the need for labeled data.

Here's a brief overview of GRPO before we jump into code: Image
Read 12 tweets
Oct 5
JSON prompting for LLMs, clearly explained:
Today, let's understand what exactly JSON prompting is and how it can drastically improve your AI outputs!

The visual below gives a head-to-head comparison with traditional prompting.

Let's dive in!
What is Json anyway?

JSON stands for JavaScript Object Notation.

Don’t let the name scare you; it’s just a way to organize info with clear labels.

You can think of it like a pizza order ticket with clear labels so the kitchen gets it right: Image
Read 11 tweets
Sep 19
I've been coding in Python for 9 years now.

If I were to start over today, here's a complete roadmap:
1️⃣ Python bootcamp by @freeCodeCamp

4 hours Python bootcamp with over 46M views!! It covers:

- Installing Python
- Setting up an IDE
- Basic Syntax
- Variables & Datatypes
- Looping in Python
- Exception handling
- Modules & pip
- Mini hands-on projects

Check this out👇 Image
2️⃣ AI Python by @DeepLearningAI

- Learn Python basics and AI for data tasks
- Solve real problems in business, marketing, and journalism
- Use AI tools for debugging and simulation

Check this out👇 Image
Read 8 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(