Maarten Grootendorst

@MaartenGr

Feb 14, 2023 • 8 tweets • 4 min read

The v0.14 release of BERTopic is here 🥳 Fine-tune your topic keywords and labels with models from @OpenAI, @huggingface, @CohereAI, @spacy_io, and @LangChainAI.

Use models for part-of-speech tagging, text generation, zero-shot classification, and more!

An overview thread👇🧵

Use OpenAI's or Cohere's GPT models to suggest topic labels. For each topic, only a single API call is needed, which significantly reduces costs because only the representative documents and keywords are sent. You can even perform prompt engineering by customizing the prompts.
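
To make the "one call per topic" idea concrete, here is a minimal sketch of how a single prompt per topic could be assembled from its keywords and representative documents. The helper `build_prompt`, the template text, and the `[KEYWORDS]` / `[DOCUMENTS]` placeholder tags are assumptions made for illustration, not BERTopic's actual implementation, and no API is called.

```python
# Hypothetical helper for illustration only; it is not part of BERTopic.
# The [KEYWORDS] and [DOCUMENTS] tags are placeholders assumed by this sketch.
DEFAULT_TEMPLATE = (
    "I have a topic described by the following keywords: [KEYWORDS]\n"
    "Representative documents for this topic:\n[DOCUMENTS]\n"
    "Give a short label for this topic."
)


def build_prompt(keywords, documents, template=DEFAULT_TEMPLATE, max_docs=4):
    """Fill the template for one topic and return a single prompt string."""
    doc_block = "\n".join(f"- {doc}" for doc in documents[:max_docs])
    return (
        template
        .replace("[KEYWORDS]", ", ".join(keywords))
        .replace("[DOCUMENTS]", doc_block)
    )


# Toy topics: topic id -> (keywords, representative documents).
topics = {
    0: (["nasa", "orbit", "launch"],
        ["The shuttle launch was delayed.", "The probe entered orbit."]),
    1: (["goalie", "hockey", "season"],
        ["The goalie made 40 saves."]),
}

# One prompt per topic, so labeling needs one request per topic.
prompts = {tid: build_prompt(kws, docs) for tid, (kws, docs) in topics.items()}
print(prompts[0])
```

Swapping in a different `template` string is where the prompt engineering would happen in this sketch.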

Use a KeyBERT-inspired model to further fine-tune the topic keywords. It uses c-TF-IDF to generate candidate keywords and to select representative documents, and then extracts the improved topic keywords from those. It borrows many ideas from KeyBERT but optimizes them for topic generation.
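
As a rough sketch of the idea, candidate keywords can be re-ranked by how similar their embeddings are to the averaged embedding of the representative documents. Everything below is an assumption for illustration: the function names are hypothetical, the 2-d vectors are hand-written stand-ins for real embeddings, and the actual BERTopic model may differ in its details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rerank_keywords(candidates, keyword_vectors, doc_vectors, top_n=3):
    """Rank candidate keywords by similarity to the averaged document vector."""
    topic_vector = mean_vector(doc_vectors)
    scored = [(kw, cosine(keyword_vectors[kw], topic_vector)) for kw in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [kw for kw, _ in scored[:top_n]]


# Toy data: in practice the candidates would come from c-TF-IDF and the
# vectors from an embedding model.
candidates = ["space", "the", "rocket", "said"]
keyword_vectors = {
    "space": [0.9, 0.1],
    "the": [0.1, 0.9],
    "rocket": [0.8, 0.3],
    "said": [0.2, 0.8],
}
doc_vectors = [[1.0, 0.0], [0.9, 0.2]]

print(rerank_keywords(candidates, keyword_vectors, doc_vectors, top_n=2))
# -> ['space', 'rocket']
```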

Apply POS tagging with spaCy to improve the topic keywords. We leverage c-TF-IDF to select a subset of representative keywords and documents, and perform POS tagging only on that subset. Customize the POS patterns that you are interested in to optimize the extracted keywords.
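
A small sketch of what filtering keywords by POS pattern could look like. To keep it self-contained, the tokens below are tagged by hand instead of by spaCy, and `extract_phrases`, `filter_keywords`, and the two patterns are hypothetical names chosen for this example.

```python
# Hand-tagged tokens stand in for the output of a POS tagger such as spaCy.
TAGGED_DOCS = [
    [("neural", "ADJ"), ("network", "NOUN"), ("learns", "VERB"), ("quickly", "ADV")],
    [("the", "DET"), ("network", "NOUN"), ("is", "AUX"), ("deep", "ADJ")],
]

# Assumed patterns: an adjective followed by a noun, or a single noun.
PATTERNS = [("ADJ", "NOUN"), ("NOUN",)]


def extract_phrases(tagged_doc, patterns):
    """Return every phrase whose tag sequence matches one of the patterns."""
    phrases = []
    for start in range(len(tagged_doc)):
        for pattern in patterns:
            window = tagged_doc[start:start + len(pattern)]
            if len(window) == len(pattern) and all(
                tag == wanted for (_, tag), wanted in zip(window, pattern)
            ):
                phrases.append(" ".join(word for word, _ in window))
    return phrases


def filter_keywords(candidates, tagged_docs, patterns):
    """Keep only the candidate keywords that match a pattern in the documents."""
    allowed = {p for doc in tagged_docs for p in extract_phrases(doc, patterns)}
    return [kw for kw in candidates if kw in allowed]


candidates = ["neural network", "network", "learns", "the", "deep"]
print(filter_keywords(candidates, TAGGED_DOCS, PATTERNS))
# -> ['neural network', 'network']
```

Changing `PATTERNS` is the customization step: adding `("VERB",)`, for example, would also let "learns" through.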

Use publicly available text-generation models with @huggingface! We pass documents and keywords that are representative of a topic to these models and ask them to generate topic labels. Customizing your prompts can have a huge influence on the output.

Diversify the topic keywords with MaximalMarginalRelevance. Although it was already implemented in BERTopic, I felt like it deserved to have its own representation model. It is a quick way to improve the generated keywords!
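
For readers unfamiliar with maximal marginal relevance, here is a minimal pure-Python sketch of the general technique: repeatedly pick the keyword that is relevant to the topic but not too similar to the keywords already chosen. The toy vectors and the `diversity` weighting are assumptions of this sketch, and BERTopic's own implementation may differ in its details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(topic_vector, keyword_vectors, top_n=3, diversity=0.5):
    """Select keywords that balance relevance to the topic against redundancy."""
    remaining = list(keyword_vectors)
    # Start with the keyword that is closest to the topic.
    first = max(remaining, key=lambda kw: cosine(keyword_vectors[kw], topic_vector))
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < top_n:
        def score(kw):
            relevance = cosine(keyword_vectors[kw], topic_vector)
            redundancy = max(
                cosine(keyword_vectors[kw], keyword_vectors[s]) for s in selected
            )
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-d embeddings: "car" and "cars" are near-duplicates.
topic_vector = [1.0, 0.0]
keyword_vectors = {"car": [1.0, 0.1], "cars": [1.0, 0.12], "engine": [0.7, 0.7]}

print(mmr(topic_vector, keyword_vectors, top_n=2, diversity=0.0))  # -> ['car', 'cars']
print(mmr(topic_vector, keyword_vectors, top_n=2, diversity=0.7))  # -> ['car', 'engine']
```

With `diversity=0.0` the selection is purely by relevance, so the near-duplicate "cars" is kept; raising `diversity` trades it for the less similar "engine".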

We can even chain different models to sequentially fine-tune the topic keywords and/or labels. Here, we first use a KeyBERT-inspired model to create our topics and then diversify the output with MMR. Chain as many models as you want!
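
The chaining idea itself is just sequential function application: each step receives the keywords produced by the previous one. The sketch below uses two deliberately simple stand-in steps (they are not KeyBERT or MMR, and all names are hypothetical) to show that the order of the chain matters.

```python
from functools import reduce


def keep_top(n):
    """Build a step that keeps the first n keywords of every topic."""
    def step(topics):
        return {tid: kws[:n] for tid, kws in topics.items()}
    return step


def drop_plural_duplicates(topics):
    """Stand-in diversification step: drop 'xs' when 'x' was already kept."""
    result = {}
    for tid, kws in topics.items():
        kept = []
        for kw in kws:
            if kw.endswith("s") and kw[:-1] in kept:
                continue
            kept.append(kw)
        result[tid] = kept
    return result


def chain(steps):
    """Compose steps so that each one refines the output of the previous one."""
    def run(topics):
        return reduce(lambda current, step: step(current), steps, topics)
    return run


topics = {0: ["car", "cars", "engine", "road", "wheel"]}

diversify_then_trim = chain([drop_plural_duplicates, keep_top(3)])
trim_then_diversify = chain([keep_top(3), drop_plural_duplicates])

print(diversify_then_trim(topics))  # -> {0: ['car', 'engine', 'road']}
print(trim_then_diversify(topics))  # -> {0: ['car', 'engine']}
```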

And that is not all! We can perform zero-shot classification on the topic labels, apply LangChain for more LLM customization, and have fun with prompt engineering. Learn more about BERTopic and the new models here: maartengr.github.io/BERTopic/chang…
