@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working
He/him
Jan 28 • 22 tweets • 6 min read
Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost: $6,000. All download and part links below:
Motherboard: Gigabyte MZ73-LM0 or MZ73-LM1. We want 2 EPYC sockets to get a massive 24 channels of DDR5 RAM to max out that memory size and bandwidth.
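A quick sanity check on why those 24 channels matter — a minimal sketch of the peak-bandwidth arithmetic, assuming DDR5-4800 DIMMs in every slot (the exact DIMM speed is my assumption, not from the thread):

```python
# Theoretical peak memory bandwidth for a dual-socket EPYC board.
# Each DDR5 channel moves 8 bytes per transfer.
channels = 24            # 12 channels per EPYC socket x 2 sockets
transfers_per_s = 4.8e9  # DDR5-4800
bytes_per_transfer = 8

bandwidth_gb_s = channels * transfers_per_s * bytes_per_transfer / 1e9
print(f"~{bandwidth_gb_s:.0f} GB/s theoretical peak")  # ~922 GB/s
# Real-world throughput will be lower, especially across NUMA nodes,
# but this is why token generation stays usable even at Q8.
```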
An elegant idea I got from a @GoogleDeepMind paper years back: when doing continuous-valued regression with a neural net, don't use a single output neuron to estimate the value. Instead, have a layer of neurons output the means/SDs/weights of a mixture of gaussians. 🧵
This gives you a much richer output, and a much cleaner loss: You simply add and normalize the gaussians, and compute the loss from the probability assigned to the label value. Cross-entropy for regression tasks!
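A minimal sketch of that idea in PyTorch — a mixture-density output head with its negative log-likelihood loss. The class/function names and the component count are my own choices, not from the paper:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """Output layer predicting mean, std-dev and weight for K gaussians."""
    def __init__(self, hidden_dim: int, n_components: int = 5):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, n_components)
        self.log_sigma = nn.Linear(hidden_dim, n_components)  # log keeps sigma positive
        self.logit_pi = nn.Linear(hidden_dim, n_components)   # softmaxed into weights

    def forward(self, h):
        return self.mu(h), self.log_sigma(h), self.logit_pi(h)

def mdn_loss(mu, log_sigma, logit_pi, y):
    """Negative log-likelihood of the labels under the predicted mixture."""
    y = y.unsqueeze(-1)                       # (batch, 1) broadcast against (batch, K)
    log_pi = F.log_softmax(logit_pi, dim=-1)  # normalize the mixture weights
    # log-density of y under each gaussian component
    log_norm = -0.5 * ((y - mu) / log_sigma.exp()) ** 2 \
               - log_sigma - 0.5 * math.log(2 * math.pi)
    # log of the weighted sum of component densities, averaged over the batch
    return -torch.logsumexp(log_pi + log_norm, dim=-1).mean()

# Usage: attach to any backbone that produces (batch, hidden_dim) features.
head = MDNHead(hidden_dim=64)
h, y = torch.randn(32, 64), torch.randn(32)
loss = mdn_loss(*head(h), y)
loss.backward()
```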
Aug 12, 2024 • 11 tweets • 4 min read
Big announcement today @huggingface: We now have a unified API for tool use across models from @MistralAI, @AIatMeta, @cohere, @NousResearch and more!
That means that you can reuse the same simplified, portable code to add tool capabilities to all of those models! 🧵
Tool use with LLMs is one of those things that's simple in theory but surprisingly complex in practice. When the model calls a tool, how do you know? How do you add the call and its result to the chat? Did you know some models expect tool definitions as JSON schema, while others expect Python function headers?
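A minimal sketch of the unified API: you pass plain Python functions to `apply_chat_template`, and the chat template converts them into whatever format that model expects. The checkpoint name and the toy tool here are just examples:

```python
from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    """
    return 22.0  # stub; the model only sees the signature and docstring

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
messages = [{"role": "user", "content": "What's the temperature in Paris right now?"}]

# The template renders the tool definition in this model's native format,
# so the same code works across checkpoints.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```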
Jun 21, 2024 • 19 tweets • 5 min read
Good morning. At some point this summer, perhaps quite soon, @AIatMeta will be releasing a LLaMA-3 model with 400B parameters. It will likely be the strongest open-source LLM ever released by a wide margin.
This is a thread about how to run it locally. 🧵
First up, the basics: you can quantize models down to ~6 bits per parameter before performance degrades. We don't want performance to degrade, so 6 bits it is. That means the model will be (6/8) × 400B ≈ 300GB.
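The arithmetic behind that number, for a few bit widths (weights only — this ignores KV cache and runtime overhead):

```python
# Size of a 400B-parameter model at various quantization widths.
params = 400e9
for bits in (16, 8, 6, 4):
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{bits}-bit: {gb:,.0f} GB")
# 16-bit: 800 GB / 8-bit: 400 GB / 6-bit: 300 GB / 4-bit: 200 GB
```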
Apr 9, 2024 • 9 tweets • 3 min read
Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
First up, a note about hardware: Text generation is limited by memory bandwidth. This will run on any machine with 64GB or more, but if you want speed I recommend DDR5, ideally on an 8- or even 12-channel platform like Xeon/Epyc/Threadripper Pro/Apple silicon.
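To see why bandwidth dominates: every generated token has to stream the full set of active weights from RAM, so tokens/s is bounded by bandwidth divided by model size. A rough sketch, assuming a ~60 GB quantized model file (my ballpark for a quantized Command-R+, not a measured figure):

```python
# Upper bound on generation speed: tokens/s <= bandwidth / model size.
# Bandwidths are theoretical peaks: channels * 8 bytes * GT/s.
model_gb = 60  # assumed size of the quantized weights on disk/in RAM

configs = {
    "dual-channel DDR4-3200": 2 * 8 * 3.2,
    "8-channel DDR5-4800": 8 * 8 * 4.8,
    "12-channel DDR5-4800": 12 * 8 * 4.8,
}
for name, gb_s in configs.items():
    print(f"{name}: ~{gb_s:.0f} GB/s -> at most ~{gb_s / model_gb:.1f} tokens/s")
```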
Jun 29, 2022 • 6 tweets • 2 min read
We're exploring end-to-end NLP TensorFlow models in 🤗Transformers! We've got a quick gist here if you want to get started, or you can read on for more. 🧵 gist.github.com/Rocketknight1/…
Firstly, what's going on here? Briefly, we've integrated TensorFlow Text with 🤗Transformers, so that you can easily get a TF tokenizer that matches your model checkpoint. This works for any checkpoint, even one you trained! (Only BERT-based for now, but that will change)
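A minimal sketch of what that looks like, assuming a BERT checkpoint (per the caveat above) — tokenization happens inside the TF graph, so the whole pipeline can be compiled and exported together:

```python
import tensorflow as tf
from transformers import TFBertTokenizer, TFAutoModel

# TFBertTokenizer wraps TensorFlow Text ops that match the checkpoint's vocab.
tokenizer = TFBertTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModel.from_pretrained("bert-base-uncased")

@tf.function
def end_to_end(texts):
    tokens = tokenizer(texts)  # runs in-graph: no Python-side preprocessing
    return model(tokens).last_hidden_state

print(end_to_end(tf.constant(["Hello TF Text!"])))
```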
Jun 10, 2022 • 8 tweets • 2 min read
There's a fully functional protein design space on HuggingFace now, which would have felt like outrageous science fiction even 18 months ago. I'm going to try to explain the incredible potential here. 🧵
huggingface.co/spaces/simondu…
Proteins are long chains of simple chemicals called amino acids that fold up into complex 3D shapes. Different amino acids affect the structure in different ways - some stick to each other, some repel, some force bends into the chain.