Let's build a mini-ChatGPT that's powered by DeepSeek-R1 (100% local):
Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would with ChatGPT.
We use:
- @DeepSeek_AI R1 as the LLM
- @Ollama to locally serve R1
- @chainlit_io for the UI
Let's build it!
We begin with the import statements and define the start_chat method.
It is invoked as soon as a new chat session starts.
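A minimal sketch of this step (it assumes the chainlit and ollama Python packages are installed, and that you've pulled the model with `ollama pull deepseek-r1`):

```python
import chainlit as cl
import ollama


@cl.on_chat_start
async def start_chat():
    # Start every new chat session with an empty interaction history.
    cl.user_session.set("interaction", [])
```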
Next, we define another method, which is invoked to generate a response from the LLM (see the sketch after these steps):
• The user inputs a prompt.
• We add it to the interaction history.
• We generate a response from the LLM.
• We store the LLM response in the interaction history.
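A sketch of that method (the `deepseek-r1` model tag is an assumption; use whatever tag you pulled):

```python
@cl.on_message
async def generate_response(message: cl.Message):
    # Add the user's prompt to the interaction history.
    interaction = cl.user_session.get("interaction")
    interaction.append({"role": "user", "content": message.content})

    # Generate a response from the locally served LLM.
    response = ollama.chat(model="deepseek-r1", messages=interaction)

    # Store the LLM response in the interaction history, then send it to the UI.
    reply = response["message"]["content"]
    interaction.append({"role": "assistant", "content": reply})
    await cl.Message(content=reply).send()
```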
Finally, we define the main method and run the app as follows:
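One way to wire up the entry point (a sketch; `run_chainlit` is Chainlit's programmatic launcher, equivalent to running `chainlit run app.py` from the terminal):

```python
if __name__ == "__main__":
    from chainlit.cli import run_chainlit

    run_chainlit(__file__)
```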
Done!
This launches our 100% locally running mini-ChatGPT that is powered by DeepSeek-R1.
That's a wrap!
If you enjoyed this tutorial:
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.
Enterprises build RAG over 100s of data sources, not one!
- Microsoft ships it in M365 products.
- Google ships it in Vertex AI Search.
- AWS ships it in Amazon Q Business.
Let's build an MCP-powered RAG over 200+ sources (100% local):
Enterprise data is scattered across many sources.
Today, we'll build a unified MCP server that can query 200+ sources from one interface.
Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally
Let's begin!
Here's the workflow:
- User submits a query.
- Agent connects to the MindsDB MCP server to find tools.
- The agent selects the appropriate tool based on the user's query and invokes it.
- Finally, it returns a contextually relevant response (see the sketch below).
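A minimal sketch of the client side with mcp-use (hedged: the MindsDB MCP server URL and the `gpt-oss` model tag are assumptions; point them at your own setup):

```python
import asyncio

from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

# Placeholder URL: wherever your local MindsDB MCP server is running.
config = {"mcpServers": {"mindsdb": {"url": "http://localhost:47337/sse"}}}


async def main():
    client = MCPClient.from_dict(config)
    llm = ChatOllama(model="gpt-oss")  # GPT-oss served locally by Ollama

    # The agent discovers the server's tools, picks the right one for
    # the query, invokes it, and returns the response.
    agent = MCPAgent(llm=llm, client=client, max_steps=30)
    result = await agent.run("Which data source holds our support tickets?")
    print(result)


asyncio.run(main())
```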
A simple technique makes RAG ~32x more memory-efficient!
- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant
Let's understand how to use it in RAG systems (with code):
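Before the full build, here's a toy illustration of the core idea: binary quantization keeps only the sign of each embedding dimension, so every float32 value (32 bits) collapses to a single bit. That's where the ~32x saving comes from.

```python
import numpy as np

# 1,000 dummy float32 embeddings with 1,024 dimensions each.
emb = np.random.randn(1000, 1024).astype(np.float32)

# Binary quantization: keep only the sign of each dimension (1 bit per dim).
binary = np.packbits((emb > 0).astype(np.uint8), axis=1)

print(emb.nbytes / binary.nbytes)  # -> 32.0
```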
Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.
Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq
Let's build it!
Here's the workflow:
- Ingest documents and generate binary embeddings.
- Create a binary vector index and store embeddings in the vector DB.
- Retrieve the top-k documents most similar to the user's query.
- The LLM generates a response based on the retrieved context (see the Milvus sketch below).
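A hedged sketch of the vector-DB side with pymilvus (collection, field, and parameter names are illustrative, and it assumes a Milvus instance at the default port):

```python
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Schema with a binary vector field (dim counts bits, so it must be a multiple of 8).
schema = client.create_schema()
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.BINARY_VECTOR, dim=1024)

# Binary index searched with Hamming distance (cheap XOR + popcount).
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="BIN_IVF_FLAT",
    metric_type="HAMMING",
    params={"nlist": 128},
)

client.create_collection("docs", schema=schema, index_params=index_params)
```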
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PinSage) in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just like image, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let's create a dummy social-network graph dataset with accounts and followers (followers are themselves accounts).
We create two DataFrames: an accounts DataFrame and a followers DataFrame.
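A quick way to recreate them (column names and values are assumptions for illustration):

```python
import pandas as pd

# Accounts in the social network.
accounts = pd.DataFrame(
    {
        "account_id": [1, 2, 3, 4],
        "name": ["alice", "bob", "carol", "dave"],
    }
)

# Each row is a directed edge: follower_id follows account_id.
followers = pd.DataFrame(
    {
        "follower_id": [2, 3, 4, 1, 3],
        "account_id": [1, 1, 2, 3, 4],
    }
)
```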