A simple technique makes RAG ~32x memory efficient!
- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant
Let's understand how to use it in RAG systems (with code):
Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.
Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq
Let's build it!
Here's the workflow:
- Ingest documents and generate binary embeddings.
- Create a binary vector index and store embeddings in the vector DB.
- Retrieve top-k similar documents to the user's query.
- The LLM generates a response grounded in the retrieved context.
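Here's a minimal NumPy sketch of the core idea, using toy data; in the actual stack, the packed vectors would live in Milvus behind a binary index with the Hamming metric:

```python
import numpy as np

# Toy stand-in for a corpus: 10k float32 embeddings of dim 1024.
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 1024)).astype(np.float32)

# Binary quantization: keep only the sign of each dimension, then pack
# 8 bits per byte. float32 = 32 bits/dim vs. 1 bit/dim -> ~32x smaller.
doc_bits = np.packbits(docs > 0, axis=1)  # shape (10_000, 128), dtype uint8

def hamming_search(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k nearest docs by Hamming distance."""
    q_bits = np.packbits(query_vec > 0)
    # XOR the packed bytes, then count the differing bits per document.
    dists = np.unpackbits(doc_bits ^ q_bits, axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

query = rng.standard_normal(1024).astype(np.float32)
print(hamming_search(query, k=5))
```

Hamming distance is just XOR + bit counting, which is why binary search stays fast even at the 36M-vector scale.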
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PinSage) in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just as image, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let's create a dummy social-network graph dataset with accounts and followers (which are themselves accounts).
We create the two DataFrames shown below: an accounts DataFrame and a followers DataFrame.
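Here's a sketch of what that could look like (column names are illustrative), along with one simple engineered feature, follower count, computed as node in-degree:

```python
import pandas as pd
import networkx as nx

# Accounts table: one row per account.
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4],
    "name": ["alice", "bob", "carol", "dave"],
})

# Followers table: one row per "follower -> followee" edge.
followers = pd.DataFrame({
    "follower_id": [2, 3, 3, 4, 1],
    "followee_id": [1, 1, 2, 2, 4],
})

# Build a directed graph from the edge list and engineer a feature:
# an account's follower count is its in-degree in the graph.
G = nx.from_pandas_edgelist(
    followers, source="follower_id", target="followee_id",
    create_using=nx.DiGraph,
)
in_deg = dict(G.in_degree())
accounts["num_followers"] = (
    accounts["account_id"].map(in_deg).fillna(0).astype(int)
)
print(accounts)
```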
KV caching in LLMs, clearly explained (with visuals):
KV caching is a technique used to speed up LLM inference.
Before understanding the internal details, look at the inference speed difference in the video:
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in!
To understand KV caching, we must know how LLMs output tokens.
- The transformer produces hidden states for all input tokens.
- Hidden states are projected to the vocabulary space.
- The logits of the last token are used to generate the next token.
- Repeat for each subsequent token.
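Here's a toy single-head attention loop that shows exactly what gets cached (a minimal sketch in PyTorch, not a real LLM):

```python
import torch

d = 64  # hidden size of our toy model
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []  # the "KV cache"

def decode_step(x_new: torch.Tensor) -> torch.Tensor:
    """x_new: (1, d) hidden state of ONLY the newest token."""
    q = x_new @ Wq
    # K and V for the new token are computed once and appended;
    # K/V of all past tokens are reused, never recomputed.
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = torch.cat(k_cache)  # (seq_len, d)
    V = torch.cat(v_cache)  # (seq_len, d)
    attn = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return attn @ V          # attention output for the new token only

for _ in range(5):           # simulate 5 decoding steps
    out = decode_step(torch.randn(1, d))
print(out.shape)             # torch.Size([1, 64])
```

Without the cache, every step would recompute K and V for the entire sequence so far, which is where the 9s vs. 42s gap comes from.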
5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency, ranging from simple responders to fully autonomous agents.
Let's dive in to learn more!
1️⃣ Basic responder
- A human guides the entire flow.
- The LLM is just a generic responder that receives an input and produces an output. It has little control over the program flow.
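A minimal sketch of this level, assuming the OpenAI Python SDK as the client (the model name is illustrative; any chat-completion API works the same way):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The human drives the loop; the LLM only maps input text to output text.
while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": user_input}],
    )
    # No tools, no memory, no decisions: the model just responds.
    print("LLM:", resp.choices[0].message.content)
```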
Qwen-3 Coder is Alibaba’s most powerful open-source coding LLM.
Today, let's build a pipeline to compare it to Sonnet 4 using:
- @LiteLLM for orchestration.
- @deepeval to build the eval pipeline (open-source).
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.
Let's dive in!
Here's the workflow:
- Ingest a GitHub repo and provide it as context to the LLMs.
- Generate code from both models using context + query.
- Compare the generated code using DeepEval.
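Here's a minimal sketch of steps 2 and 3, assuming OpenRouter model slugs (verify the exact names on OpenRouter) and DeepEval's GEval metric; the repo context string is a hypothetical placeholder for step 1:

```python
import litellm
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Illustrative OpenRouter slugs; check the exact model names.
MODELS = {
    "qwen3-coder": "openrouter/qwen/qwen3-coder",
    "sonnet-4": "openrouter/anthropic/claude-sonnet-4",
}

def generate(model: str, repo_context: str, query: str) -> str:
    resp = litellm.completion(
        model=model,
        messages=[
            {"role": "system", "content": f"Repository context:\n{repo_context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

# GEval scores outputs against plain-language criteria
# (it uses an LLM judge under the hood).
correctness = GEval(
    name="Code correctness",
    criteria="Is the generated code correct, idiomatic, and responsive to the query?",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

query = "Add a retry decorator with exponential backoff."
for name, slug in MODELS.items():
    code = generate(slug, repo_context="<ingested repo files>", query=query)
    case = LLMTestCase(input=query, actual_output=code)
    correctness.measure(case)
    print(name, correctness.score)
```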