Maarten Grootendorst
🧑‍💻 Data Scientist | Psychologist 📖 Author of "Hands-On LLMs" (https://t.co/BcSDNMOnWq) 🧙‍♂️ Open Sourcerer (BERTopic, PolyFuzz, KeyBERT) 💡 Demystifying LLMs
Feb 11 5 tweets 2 min read
Did you know we continue to develop new content for the "Hands-On Large Language Models" book?

There's now even a free course available with @DeepLearningAI!

@JayAlammar and I are incredibly proud to bring you this highly animated (and free 😉) course:
Feb 3 4 tweets 1 min read
A Visual Guide to Reasoning LLMs 💭

With over 40 custom visuals, explore DeepSeek-R1, the train-time compute paradigm shift, test-time compute techniques, verifiers, STaR, and much more!

Link below

From exploring verifiers for distilling reasoning:
May 31, 2023 7 tweets 4 min read
Multimodal, multi-aspect, Hugging Face Hub, safetensors, and more in BERTopic v0.15 🔥

Working together with @huggingface on this was a blast!

🤗Blog: huggingface.co/blog/bertopic
🤗Hub Example: huggingface.co/MaartenGr/BERT…
Changelog: maartengr.github.io/BERTopic/chang…

An update thread🧵

Apply textual topic modeling to images with the new update (🖼️+ 🖹 or 🖼️ only)!

Introducing a multimodal CLIP backend that embeds both text and images.

Even when you have only images, you can caption the most representative images of each topic and extract textual representations!
Feb 14, 2023 8 tweets 4 min read
The v0.14 release of BERTopic is here 🥳 Fine-tune your topic keywords and labels with models from @OpenAI, @huggingface, @CohereAI, @spacy_io, and @LangChainAI.

Use models for part-of-speech tagging, text generation, zero-shot classification, and more!

An overview thread👇🧵

Use generative models from OpenAI or Cohere to suggest topic labels. For each topic, only a single API call is needed, which significantly reduces costs because the prompt contains only representative documents and keywords. You can even perform prompt engineering by customizing the prompts.
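
To make the "one call per topic" idea concrete, here is a minimal sketch, not BERTopic's actual implementation. The prompt template, the `[KEYWORDS]` and `[DOCUMENTS]` placeholders, the functions `build_label_prompt` and `label_topics`, and the `generate` callable are all hypothetical names chosen for illustration. `generate` stands in for whatever text-generation client you use; no real OpenAI or Cohere API is called here.

```python
from typing import Callable, Dict, List

# Hypothetical default prompt; customizing this string is the
# "prompt engineering" step mentioned above.
DEFAULT_PROMPT = (
    "I have a topic described by the following keywords: [KEYWORDS]\n"
    "Representative documents for this topic:\n[DOCUMENTS]\n"
    "Based on the above, give a short label for this topic."
)


def build_label_prompt(
    keywords: List[str],
    documents: List[str],
    template: str = DEFAULT_PROMPT,
    max_docs: int = 4,
) -> str:
    """Fill the template with a topic's keywords and a few representative documents."""
    doc_block = "\n".join(f"- {doc}" for doc in documents[:max_docs])
    prompt = template.replace("[KEYWORDS]", ", ".join(keywords))
    return prompt.replace("[DOCUMENTS]", doc_block)


def label_topics(
    topics: Dict[int, Dict[str, List[str]]],
    generate: Callable[[str], str],
    template: str = DEFAULT_PROMPT,
) -> Dict[int, str]:
    """Request one label per topic, so the number of calls equals the number of topics."""
    labels = {}
    for topic_id, info in topics.items():
        prompt = build_label_prompt(info["keywords"], info["documents"], template)
        labels[topic_id] = generate(prompt).strip()  # exactly one call per topic
    return labels
```

Because only a handful of representative documents and keywords go into each prompt, the cost grows with the number of topics rather than with the number of documents in the corpus.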
Dec 28, 2022 7 tweets 3 min read
Final Preview: Outlier Reduction!

In the upcoming release of BERTopic, it will be possible to perform outlier reduction! Easily explore several strategies for outlier reduction after training your topic model. A flexible and modular approach!

A preview thread👇🧵

Strategy #1
The first strategy reduces outliers by using the soft-clustering capabilities of HDBSCAN. We find the best matching topic for each outlier document by looking at the topic-document probabilities generated by HDBSCAN.
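
As a rough sketch of the idea behind Strategy #1 (not BERTopic's actual implementation), each document labeled as an outlier is reassigned to the topic with the highest value in its row of the topic-document probability matrix. The function name `reassign_outliers` and the `threshold` parameter are hypothetical; the sketch also assumes that column `k` of the matrix corresponds to topic `k` and that outliers carry the label `-1`, following HDBSCAN's convention for noise points.

```python
from typing import List, Sequence

OUTLIER = -1  # HDBSCAN marks noise points with -1


def reassign_outliers(
    topics: Sequence[int],
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.0,
) -> List[int]:
    """Move each outlier to its most probable topic.

    probabilities[i][k] is assumed to be the probability that document i
    belongs to topic k. An outlier is reassigned only if its best
    probability is strictly greater than `threshold`.
    """
    if len(topics) != len(probabilities):
        raise ValueError("topics and probabilities must have the same length")

    new_topics = []
    for topic, doc_probs in zip(topics, probabilities):
        if topic != OUTLIER or len(doc_probs) == 0:
            new_topics.append(topic)  # non-outliers are left untouched
            continue
        best_topic = max(range(len(doc_probs)), key=lambda k: doc_probs[k])
        if doc_probs[best_topic] > threshold:
            new_topics.append(best_topic)
        else:
            new_topics.append(OUTLIER)  # too uncertain, stays an outlier
    return new_topics


topics = [0, -1, 1, -1]
probs = [[0.9, 0.1], [0.2, 0.7], [0.1, 0.8], [0.05, 0.02]]
print(reassign_outliers(topics, probs, threshold=0.1))  # [0, 1, 1, -1]
```

Raising the threshold keeps low-confidence documents as outliers, while the default of 0.0 reassigns every outlier that has any nonzero topic probability.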