🧑‍💻 Data Scientist | Psychologist
📖 Author of "Hands-On LLMs" (https://t.co/BcSDNMOnWq)
🧙‍♂️ Open Sourcerer (BERTopic, PolyFuzz, KeyBERT)
💡 Demystifying LLMs
Feb 11 • 5 tweets • 2 min read
Did you know we continue to develop new content for the "Hands-On Large Language Models" book?
There's now even a free course available with @DeepLearningAI!
@JayAlammar and I are incredibly proud to bring you this highly animated (and free 😉) course:
Feb 3 • 4 tweets • 1 min read
A Visual Guide to Reasoning LLMs 💭
With over 40 custom visuals, explore DeepSeek-R1, the train-time compute paradigm shift, test-time compute techniques, verifiers, STaR, and much more!
Link below
From exploring verifiers for distilling reasoning:
May 31, 2023 • 7 tweets • 4 min read
Multimodal, multi-aspect, Hugging Face Hub, safetensors, and more in BERTopic v0.15 🔥
Working together with @huggingface on this was a blast!
Use models for part-of-speech tagging, text generation, zero-shot classification, and more!
An overview thread👇🧵
Use OpenAI's or Cohere's GPT models to suggest topic labels. For each topic, only a single API call is needed, which significantly reduces costs by focusing on representative documents and keywords. You can even perform prompt engineering by customizing the prompts.
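A minimal sketch of the idea behind that single call per topic: combine the topic's keywords and a few representative documents into one prompt before sending it to the model. The function and placeholder names here are illustrative, not BERTopic's actual API.

```python
def build_topic_prompt(keywords, representative_docs, template=None):
    """Build one prompt per topic from its keywords and representative
    documents, so a single API call suffices to label the topic."""
    # A customizable template enables the prompt engineering mentioned above.
    template = template or (
        "I have a topic described by the keywords: [KEYWORDS]\n"
        "Sample documents from this topic:\n[DOCUMENTS]\n"
        "Suggest a short, descriptive label for this topic."
    )
    docs = "\n".join(f"- {doc}" for doc in representative_docs)
    return (
        template
        .replace("[KEYWORDS]", ", ".join(keywords))
        .replace("[DOCUMENTS]", docs)
    )

prompt = build_topic_prompt(
    keywords=["cluster", "embedding", "topic"],
    representative_docs=["A doc about clustering.", "A doc about embeddings."],
)
print(prompt)
```

Because every topic is compressed into one prompt like this, costs scale with the number of topics rather than the number of documents.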
Dec 28, 2022 • 7 tweets • 3 min read
Final Preview: Outlier Reduction!
In the upcoming release of BERTopic, it will be possible to perform outlier reduction! Easily explore several strategies for outlier reduction after training your topic model. A flexible and modular approach!
A preview thread👇🧵
Strategy #1
The first strategy to reduce outliers is by making use of the soft-clustering capabilities of HDBSCAN. We find the best matching topic for each outlier document by looking at the topic-document probabilities generated by HDBSCAN.
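The strategy above can be sketched in a few lines of NumPy: each outlier document (topic -1) is reassigned to the topic with the highest topic-document probability, optionally only when that probability clears a threshold. The function signature is illustrative, not BERTopic's actual API.

```python
import numpy as np

def reduce_outliers(topics, probabilities, threshold=0.0):
    """Reassign outlier documents (topic -1) to their best-matching topic
    based on soft-clustering probabilities (e.g. from HDBSCAN)."""
    topics = np.asarray(topics)
    best_topic = probabilities.argmax(axis=1)   # best-matching topic per document
    best_prob = probabilities.max(axis=1)       # its probability
    # Only reassign outliers whose best match is confident enough.
    reassign = (topics == -1) & (best_prob >= threshold)
    new_topics = topics.copy()
    new_topics[reassign] = best_topic[reassign]
    return new_topics

topics = [0, -1, 1, -1]
probabilities = np.array([
    [0.90, 0.10],
    [0.20, 0.80],   # outlier with a clear best match: topic 1
    [0.30, 0.70],
    [0.05, 0.04],   # outlier too uncertain to reassign
])
print(reduce_outliers(topics, probabilities, threshold=0.1).tolist())  # → [0, 1, 1, -1]
```

The threshold keeps genuinely ambiguous documents as outliers instead of forcing every document into a topic.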