Avi Chawla
Jan 25 · 7 tweets
Let's build a mini-ChatGPT that's powered by DeepSeek-R1 (100% local):
Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would chat with ChatGPT.

We use:
- @DeepSeek_AI R1 as the LLM
- @Ollama to locally serve R1
- @chainlit_io for the UI

Let's build it!
We begin with the import statements and define the start_chat method.

It is invoked as soon as a new chat session starts.
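A minimal sketch of what this can look like, assuming the model has been pulled locally with Ollama (e.g. `ollama pull deepseek-r1`); the session key, greeting text, and model tag are assumptions:

```python
# Sketch of the setup: imports plus the chat-start hook.
# Assumes `pip install chainlit ollama` and a locally pulled DeepSeek-R1 model.
import chainlit as cl
import ollama

@cl.on_chat_start
async def start_chat():
    # Create an empty interaction history for this chat session.
    cl.user_session.set("interaction", [])

    # Greet the user as soon as the session opens (the text is an assumption).
    await cl.Message(
        content="Hi! I'm a mini-ChatGPT running 100% locally on DeepSeek-R1."
    ).send()
```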
Next, we define another method which will be invoked to generate a response from the LLM (sketched after the steps below):

• The user inputs a prompt.
• We add it to the interaction history.
• We generate a response from the LLM.
• We store the LLM response in the interaction history.
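Continuing the same file, here's a sketch of that method, reusing the session key from the previous snippet (the function name and the model tag "deepseek-r1" are assumptions):

```python
def generate_response(prompt: str) -> str:
    # Add the user's prompt to the interaction history.
    interaction = cl.user_session.get("interaction")
    interaction.append({"role": "user", "content": prompt})

    # Generate a response from the locally served DeepSeek-R1 model.
    response = ollama.chat(model="deepseek-r1", messages=interaction)
    reply = response["message"]["content"]

    # Store the LLM response in the interaction history.
    interaction.append({"role": "assistant", "content": reply})
    return reply
```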
Finally, we define the main method and run the app as follows:
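One way the main method can look, reusing the helper above; Chainlit calls it on every incoming user message:

```python
@cl.on_message
async def main(message: cl.Message):
    # Generate a reply for the incoming prompt and send it back to the UI.
    reply = generate_response(message.content)
    await cl.Message(content=reply).send()
```

Assuming the file is saved as app.py (the file name is an assumption), the app is launched with `chainlit run app.py -w`, which opens the chat UI in the browser.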
Done!

This launches our 100% locally running mini-ChatGPT that is powered by DeepSeek-R1.
That's a wrap!

If you enjoyed this tutorial:

Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.

More from @_avichawla

Sep 11
Let's build a context engineering workflow, step by step:
Today, we'll build a multi-agent research assistant using context engineering principles.

Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration

Let's go!
First, what is context engineering (CE)?

LLMs aren't mind readers. They can only work with what you give them.

Prompt engineering primarily focuses on finding “magic words” in the hope of getting a better response.

CE involves creating dynamic systems that offer:
- The right info
- The right tools
- In the right format

This ensures the LLM can effectively complete the task.
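As a toy, self-contained illustration of the idea (placeholder data and names, not the actual stack listed above): the system gathers the right info and the right tools, and packs them into the right format before the LLM is called.

```python
# Toy sketch of context engineering: assemble info, tools, and format
# before calling the LLM. All data here is a placeholder assumption.
def build_context(query: str, docs: list[str], memory: str, tools: list[str]) -> str:
    # The "right format": a structured prompt the LLM can act on reliably.
    return (
        "## Relevant documents\n" + "\n".join(docs) + "\n\n"
        "## Memory\n" + memory + "\n\n"
        "## Available tools\n" + "\n".join(tools) + "\n\n"
        "## Task\n" + query
    )

context = build_context(
    query="Summarise recent work on context engineering.",
    docs=["[snippet retrieved from the vector DB]"],          # the right info
    memory="[facts recalled from the agent's memory layer]",
    tools=["web_search", "read_document"],                    # the right tools
)
print(context)
```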
Sep 8
I have been fine-tuning LLMs for over two years now!

Here are the top 5 LLM fine-tuning techniques, explained visually:
Traditional fine-tuning is impractical for LLMs, which have billions of parameters and can be hundreds of GB in size.

Since this kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is extensively used.

Today, we'll cover the top 5 PEFT techniques, step by step.
Some background!

LLM weights are matrices of numbers adjusted during finetuning.

Most PEFT techniques involve finding a lower-rank adaptation of these matrices, which is a smaller-dimensional matrix that can still represent the information stored in the original.
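A rough numerical sketch of the idea (the sizes and the LoRA-style decomposition are illustrative assumptions, not code from the thread): instead of updating every entry of a d×d weight matrix, train two thin matrices of rank r and use their product as the update.

```python
# Toy sketch of low-rank adaptation (LoRA-style). Sizes are illustrative.
import numpy as np

d, r = 1024, 8                        # hidden size and chosen rank (r << d)
W = np.random.randn(d, d)             # frozen pretrained weight matrix

# Trainable low-rank factors: their product (d x d) approximates the update.
A = np.random.randn(r, d) * 0.01      # (r x d)
B = np.zeros((d, r))                  # (d x r), initialised to zero

W_adapted = W + B @ A                 # adapted weights used at inference

print("params in a full update:", d * d)             # 1,048,576
print("params in the low-rank update:", 2 * d * r)   # 16,384
```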
Sep 6
Let's generate our own LLM fine-tuning dataset (100% local):
Before we begin, here's what we're doing today!

We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?

Finally, we'll create our own instruction fine-tuning dataset.

Let's dive in!
Once an LLM has been pre-trained, it simply continues the text, as if it were completing one long passage from a book or an article.

For instance, prompt a base model with a question and it will often generate more related text (or more questions) rather than a direct answer.
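Instruction fine-tuning addresses this by training on instruction–response pairs. For reference, a typical record in such a dataset looks roughly like this (Alpaca-style; the content below is an illustrative assumption, not data generated in the thread):

```python
# A typical instruction fine-tuning record (Alpaca-style). The content is
# an illustrative assumption, not data from the thread.
example = {
    "instruction": "Summarise the following paragraph in one sentence.",
    "input": "Large language models are pre-trained on vast text corpora ...",
    "output": "LLMs learn general language patterns from huge amounts of text.",
}

# An instruction fine-tuning dataset is simply a large collection of such
# records, usually stored as JSON Lines.
dataset = [example]
```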
Sep 4
7 LLM generation parameters, clearly explained (with visuals):
Every generation from an LLM is shaped by parameters under the hood.

Knowing how to tune them is important for producing sharper, more controlled outputs.

The visual shows 7 parameters that matter most.

Let's understand them one by one!
1️⃣ Max tokens

This is a hard cap on how many tokens the model can generate in one response.

- Too low → truncated outputs
- Too high → wasted compute

Check this 👇
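A quick sketch of the cap in practice (using Ollama's Python client; `num_predict` is Ollama's max-tokens option, and the model tag is an assumption):

```python
# Sketch: capping generation length with Ollama's Python client.
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain transformers in detail."}],
    options={"num_predict": 50},   # hard cap: generate at most ~50 tokens
)
# With such a low cap, the reply will very likely be cut off mid-sentence.
print(response["message"]["content"])
```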
Aug 28
Temperature in LLMs, clearly explained (with code):
Let's prompt OpenAI GPT-3.5 with a low temperature value twice.

It produces identical responses from the LLM.

Now, let's prompt it with a high temperature value.

This time, it produces gibberish output.

What is going on here? Let's dive in!
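A sketch of the experiment (using the official openai Python client; the prompt and model ID are assumptions, and an OPENAI_API_KEY is expected in the environment):

```python
# Sketch of the temperature experiment with the openai client (>= 1.0).
from openai import OpenAI

client = OpenAI()

def ask(temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a one-line tagline for a coffee shop."}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Low temperature: sampling is near-deterministic, so repeated calls
# return (almost) identical responses.
print(ask(temperature=0.0))
print(ask(temperature=0.0))

# Very high temperature: the model samples from a nearly flat distribution,
# which often degenerates into gibberish.
print(ask(temperature=2.0))
```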
Aug 27
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can integrate tools from 100k+ public APIs.

Here's a step-by-step breakdown (100% local):
To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
For context...

Postman's MCP Generator lets us build an MCP server with tools from its public API Network (with 100k+ APIs).

Steps:

- Select all the APIs for your MCP server.
- Export the code for the MCP server.
- Integrate it with any MCP client.
