Let's build a mini-ChatGPT that's powered by DeepSeek-R1 (100% local):
Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would chat with ChatGPT.
We use:
- @DeepSeek_AI R1 as the LLM
- @Ollama to locally serve R1
- @chainlit_io for the UI
Let's build it!
We begin with the import statements and define the start_chat method.
It is invoked as soon as a new chat session starts.
Next, we define another method which will be invoked to generate a response from the LLM:
• The user inputs a prompt.
• We add it to the interaction history.
• We generate a response from the LLM.
• We store the LLM response in the interaction history.
Finally, we define the main method and run the app as follows:
Done!
This launches our 100% locally running mini-ChatGPT that is powered by DeepSeek-R1.
That's a wrap!
If you enjoyed this tutorial:
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
So, how do we manage it?
continue...👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this has a trade-off between computational complexity and performance.
Let's build a reasoning LLM using GRPO, from scratch (100% local):
Today, we're going to learn how to turn any model into a reasoning powerhouse.
We'll do so without any labeled data or human intervention, using Reinforcement Finetuning (GRPO)!
Tech stack:
- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO
Let's go! 🚀
What is GRPO?
Group Relative Policy Optimization is a reinforcement learning method that fine-tunes LLMs for math and reasoning tasks using deterministic reward functions, eliminating the need for labeled data.
Here's a brief overview of GRPO before we jump into code:
Let's build a multi-agent brand monitoring system (100% local):
Today, we're building a brand monitoring app to gain local-to-global insights from campaign and product feedback.
Tech stack:
- Bright Data to scrape data at scale
- @beam_cloud for deployment
- @crewAIInc for orchestration
- @ollama to serve LLM locally
Let's go!
Here's the workflow:
- Enter brand, region, and language
- Use Bright Data to gather social media mentions
- Analyze and generate insights with platform-specific agents
- Combine insights for final report