Temperature in LLMs, clearly explained (with code):
Let's prompt OpenAI GPT-3.5 with a low temperature value twice.
Both runs produce identical responses from the LLM.
Now, let's prompt it with a high temperature value.
This time, it produces gibberish output. Check the output below👇
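Here's a minimal sketch of both experiments, assuming the current openai Python SDK; the prompt, model name, and exact temperature values are illustrative:

```python
# Minimal sketch of the two experiments (assumes the openai>=1.0 Python SDK
# and an OPENAI_API_KEY in the environment). Prompt and values are illustrative.
from openai import OpenAI

client = OpenAI()

def generate(temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a one-line tagline for a coffee shop."}],
        temperature=temperature,
        max_tokens=30,
    )
    return response.choices[0].message.content

# Low temperature: the two runs produce (almost always) identical text.
print(generate(0.0))
print(generate(0.0))

# High temperature: the runs diverge and can degrade into gibberish.
print(generate(1.9))
print(generate(1.9))
```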
What is going on here? Let's dive in!
Text-generating LLMs are like classification models whose output layer spans the entire vocabulary.
However, instead of selecting the best token, they "sample" the prediction.
So even if “Token 1” has the highest softmax score, it may not be chosen due to sampling👇
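To see the difference, here's a tiny sketch with a made-up probability distribution over a 4-token vocabulary, comparing greedy decoding (argmax) with sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up softmax scores over a toy 4-token vocabulary.
tokens = ["Token 1", "Token 2", "Token 3", "Token 4"]
probs = np.array([0.50, 0.25, 0.15, 0.10])

# Greedy decoding: always pick the highest-probability token.
greedy = tokens[int(np.argmax(probs))]

# Sampling: draw a token according to the probabilities,
# so "Token 1" wins only ~50% of the time.
sampled = rng.choice(tokens, p=probs)

print("greedy :", greedy)   # always "Token 1"
print("sampled:", sampled)  # any of the four tokens, weighted by probs
```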
The impact of sampling is controlled using the Temperature parameter.
Temperature introduces the following tweak in the softmax function 👇
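The tweak is simple: divide every logit z_i by T before applying softmax, i.e. p_i = exp(z_i / T) / Σ_j exp(z_j / T). A minimal NumPy version of this temperature-scaled softmax:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```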
If the temperature is low, the probabilities look like a hard “max” instead of a “soft-max”: almost all of the probability mass sits on the top token.
This means the sampling process will almost certainly choose the token with the highest probability. This makes the generation process (nearly) greedy.
Check this👇
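For example, with made-up logits [2.0, 1.0, 0.5, 0.1] and T = 0.1, nearly all of the probability lands on the first token:

```python
import numpy as np

z = np.array([2.0, 1.0, 0.5, 0.1]) / 0.1   # made-up logits, T = 0.1
p = np.exp(z - z.max())
p /= p.sum()
print(p.round(4))   # ≈ [1, 0, 0, 0] -> sampling is effectively greedy
```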
If the temperature is high, the probabilities start to look like a uniform distribution:
This means the sampling process may select almost any token. This makes the generation process heavily stochastic, as we saw in the gibberish output earlier.
Check this👇
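Same made-up logits, but now with T = 10, and the probabilities flatten out:

```python
import numpy as np

z = np.array([2.0, 1.0, 0.5, 0.1]) / 10.0  # same made-up logits, T = 10
p = np.exp(z - z.max())
p /= p.sum()
print(p.round(3))   # ≈ [0.28, 0.25, 0.24, 0.23] -> close to uniform
```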
Some best practices for using temperature (T):
- Set a low T value to generate predictable responses.
- Set a high T value to generate more random and creative responses.
- An extremely high T value rarely has any real utility, as shown below👇
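For instance, at T = 100 (same made-up logits) the distribution is essentially uniform, so the model's learned preferences are erased and the output turns into word salad:

```python
import numpy as np

z = np.array([2.0, 1.0, 0.5, 0.1]) / 100.0  # same made-up logits, T = 100
p = np.exp(z - z.max())
p /= p.sum()
print(p.round(3))   # ≈ [0.253, 0.250, 0.249, 0.248] -> every token is equally likely
```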
That's a wrap!
If you enjoyed this tutorial:
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
• • •
Let's build a context engineering workflow, step by step:
Today, we'll build a multi-agent research assistant using context engineering principles.
Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration
Let's go!
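Before diving in, here's a minimal sketch of what the orchestration layer can look like, assuming CrewAI's Agent/Task/Crew interface; the roles, goals, and task descriptions are placeholders, and the retrieval, memory, and web-search pieces from the stack above would be wired in as agent tools:

```python
# Minimal orchestration sketch (assumes CrewAI's Agent/Task/Crew API).
# Roles, goals, and task descriptions are placeholders; the Tensorlake, Zep,
# Firecrawl, and Milvus pieces would be plugged in as tools on these agents.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Gather and summarize relevant sources for the user's query",
    backstory="An analyst who searches the web and internal documents.",
)

writer = Agent(
    role="Technical Writer",
    goal="Turn the research notes into a clear, cited report",
    backstory="A writer who synthesizes findings into a final answer.",
)

research_task = Task(
    description="Research the topic: {topic}",
    expected_output="Bullet-point notes with sources",
    agent=researcher,
)

writing_task = Task(
    description="Write a short report from the research notes",
    expected_output="A structured report with citations",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
print(crew.kickoff(inputs={"topic": "context engineering"}))
```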
First, what is context engineering (CE)?
LLMs aren't mind readers. They can only work with what you give them.
Prompt engineering primarily focuses on finding “magic words” in the hope of getting a better response.
CE involves creating dynamic systems that offer:
- The right info
- The right tools
- In the right format
This ensures the LLM can effectively complete the task.
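As a simplified illustration, a context-engineered prompt is assembled dynamically from several sources rather than hard-coded. The three helper functions below are hypothetical placeholders for the vector DB, memory, and web-search pieces of the stack:

```python
# Simplified illustration of context engineering: the prompt is assembled
# dynamically. All three helpers are hypothetical placeholders.
def retrieve_docs(query: str) -> list[str]:       # placeholder for the vector DB
    return ["<relevant doc chunk 1>", "<relevant doc chunk 2>"]

def fetch_user_memory(user_id: str) -> str:       # placeholder for the memory layer
    return "<facts remembered about this user>"

def web_search(query: str) -> list[str]:          # placeholder for web search
    return ["<fresh web result>"]

def build_context(user_id: str, query: str) -> str:
    docs = retrieve_docs(query)                   # the right info
    memory = fetch_user_memory(user_id)           # the right info (user-specific)
    results = web_search(query)                   # output of the right tools
    # ...delivered in the right format: clearly labeled sections.
    return (
        f"## User memory\n{memory}\n\n"
        + "## Retrieved documents\n" + "\n".join(docs) + "\n\n"
        + "## Web results\n" + "\n".join(results) + "\n\n"
        + f"## Task\n{query}"
    )

print(build_context("user-42", "Summarize recent work on context engineering"))
```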
• • •
I have been fine-tuning LLMs for over two years now!
Here are the top 5 LLM fine-tuning techniques, explained visually:
Traditional fine-tuning is impractical for LLMs: they have billions of parameters and weigh hundreds of GBs.
Since that kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is used extensively instead.
Today, we’ll cover the top 5 PEFT techniques, step by step.
Some background!
LLM weights are matrices of numbers that get adjusted during fine-tuning.
Most PEFT techniques find a low-rank adaptation of these matrices, i.e., much smaller matrices that can still represent the information stored in the original.
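Here's a tiny numerical sketch of the idea (LoRA-style, with made-up sizes): instead of updating a full d×d weight matrix, you learn two thin matrices whose product has rank r, far smaller than d:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 1024, 8                       # made-up sizes; real models use much larger d
W = rng.standard_normal((d, d))      # frozen pretrained weight matrix

# LoRA-style low-rank adaptation: learn only B (d x r) and A (r x d).
B = np.zeros((d, r))                 # zero init -> the adaptation starts as a no-op
A = rng.standard_normal((r, d)) * 0.01

W_adapted = W + B @ A                # effective weight used at inference time

full_params = W.size                 # what full fine-tuning would update
peft_params = B.size + A.size        # what PEFT actually trains
print(f"trainable params: {peft_params:,} vs {full_params:,} "
      f"({peft_params / full_params:.1%} of the original)")
```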