Thread by @GiordMarco96 on Thread Reader App

One of the most useful #NLP libraries for #SEO in #Python is certainly BERTopic.

In this thread I'll show you its benefits and why it's so powerful and simple to use 🧵

BERTopic is an easy, convenient way to use advanced language models without writing much code.

That's why it's so powerful and reliable.

Although this library wasn't built with SEO in mind, it's clearly super versatile for us.

It's a way to flatten the steep learning curve that these topics have.

We're focusing on the implementation itself rather than the theory.

This doesn't mean you can skip studying the models! You should develop a high-level understanding of how they work and what their parameters do.

You're very unlikely to get good results without tuning your models.

If you are like me and want to focus on NLP and Data Science, this is the right way to go. Transformers and other recent models capture the semantic meaning of words far better than older ones.

This is not possible with a traditional clustering technique.

Some terms you need to know are:

Embeddings - think of them as words represented in the language of math, i.e. as vectors

Topic modelling - identifying topics in a set of documents

Transformers - Deep Learning models based on attention
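
To make the "words as vectors" idea concrete, here is a minimal sketch. The three vectors are invented by hand for illustration (real embeddings come from a model and have hundreds of dimensions); the point is only that similar meanings end up pointing in similar directions, which cosine similarity measures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-made toy "embeddings" with only 3 dimensions.
toy_embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "jogging sneakers": [0.8, 0.2, 0.1],
    "mortgage rates": [0.0, 0.1, 0.9],
}

similar = cosine_similarity(
    toy_embeddings["running shoes"], toy_embeddings["jogging sneakers"]
)
different = cosine_similarity(
    toy_embeddings["running shoes"], toy_embeddings["mortgage rates"]
)
print(f"related pair: {similar:.2f}, unrelated pair: {different:.2f}")
```

The related pair scores close to 1 and the unrelated pair close to 0, even though the two related phrases share no words.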

These are very broad definitions to get you started, so do your own research.

The idea here is to give you the minimum background needed to get started with BERTopic.

You can find all you need at the link below; just follow the instructions.

Yes, you can apply this idea to GSC data too! I'm working on it myself; it just takes time to clean the data properly, which is very hard in some niches.

maartengr.github.io/BERTopic/getti…
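
As a rough idea of how little code the basic workflow needs, here is a minimal sketch built on the library's `BERTopic()` constructor and `fit_transform()` method. The `fit_topics` wrapper and its `min_docs` guard are my own hypothetical additions, not part of the library: the assumption is that a handful of documents is not enough for the default pipeline to find meaningful topics.

```python
def fit_topics(docs, min_docs=100):
    """Fit a default BERTopic model and return (model, topics).

    min_docs is a hypothetical safety check, not a library rule.
    """
    if len(docs) < min_docs:
        raise ValueError(f"need at least {min_docs} documents, got {len(docs)}")

    # Imported here so the check above works even without the library installed.
    from bertopic import BERTopic

    topic_model = BERTopic()  # default settings, no tuning yet
    topics, _probabilities = topic_model.fit_transform(docs)
    return topic_model, topics
```

With a list of strings in `docs` (page texts, titles or queries), `model, topics = fit_topics(docs)` followed by `model.get_topic_info()` gives a table of the topics found. Expect the first run to be slow, since an embedding model has to be downloaded.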

Visualizing topics is a great way to spot similarity among clusters. This is crucial for large websites or when you have no clue what a new domain is about.

Use this info as a hint on what to topically improve and to see your topical authority.

However, keep in mind that processing all that GSC data is computationally expensive even for medium-sized websites, let alone big ones!
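
A minimal sketch of both points: sampling the data to keep the cost manageable, and saving the topic map for inspection. `visualize_topics()` is the library's method and returns a Plotly figure; the two helper functions, the `max_docs` default and the file name are hypothetical choices of mine.

```python
import random


def sample_docs(docs, max_docs=20000, seed=42):
    """Return at most max_docs documents, sampled reproducibly.

    max_docs is an arbitrary example value; pick one that fits your hardware.
    """
    docs = list(docs)
    if len(docs) <= max_docs:
        return docs
    return random.Random(seed).sample(docs, max_docs)


def save_topic_map(topic_model, path="topics.html"):
    """Write the intertopic distance map of a fitted BERTopic model to HTML."""
    fig = topic_model.visualize_topics()  # Plotly figure
    fig.write_html(path)
    return path
```

Sampling trades completeness for speed, so rare topics may disappear; treat the result as a first look rather than a full picture of the site.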

There are plenty of topic modelling techniques, and you need a basic understanding of transformers.

The number of options is overwhelming at first; just go through the docs and apply what you can. It will take time, but it's totally worth it.

Alternatively, you can check this Medium article by the library's author.
It walks through a "manual" implementation of some of the features.

towardsdatascience.com/topic-modeling…

You can just use the library and keep the article in the tweet above for practice.

If you're not happy with the models available, you can always load your own.
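
A minimal sketch of swapping the embedding model, assuming you pass a sentence-transformers model name through BERTopic's `embedding_model` argument. The two model names are real sentence-transformers models; the short labels and the helper functions are hypothetical.

```python
# Assumption: the labels on the left are made up for this example.
EMBEDDING_CHOICES = {
    "english": "all-MiniLM-L6-v2",
    "multilingual": "paraphrase-multilingual-MiniLM-L12-v2",
}


def pick_embedding_model(label):
    """Map a short label to a sentence-transformers model name."""
    if label not in EMBEDDING_CHOICES:
        raise ValueError(f"unknown label {label!r}, use one of {sorted(EMBEDDING_CHOICES)}")
    return EMBEDDING_CHOICES[label]


def build_topic_model(label="english"):
    """Create a BERTopic instance that uses the chosen embedding model."""
    # Imported here so the lookup above works without the library installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=pick_embedding_model(label))
```

Bigger models are usually slower, so it's worth testing a candidate on a sample of your data before committing to it.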

BERTopic is an excellent way to carry out complex tasks without writing tons of code. Give it a try on Google Colab, totally worth it!

For the pros:

- Variety of models + custom
- Short and neat code
- Plenty of options

This library has no particular cons of its own; the drawbacks are mostly related to the algorithms:

- often computationally expensive without a proper setup
- using models without tuning is a waste of time

Again, this is not a problem of the library. Studying models is your task!

BERTopic works well with spaCy too, one of the best and most famous NLP libraries in Python.

Python is a good compromise for creating your own tools to validate ideas and automate boring stuff.

This library improves your workflow by adding semantic clustering and the possibility to evaluate content networks (given proper tuning).

Data cleaning is the most important step, be sure to take out the trash!
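
What "taking out the trash" means depends on your data, but a minimal sketch for search queries could look like this. The rules (lowercasing, collapsing whitespace, dropping very short or blocked queries, removing duplicates) and the `blocked_terms` parameter are my assumptions about a reasonable starting point, not a fixed recipe.

```python
import re


def clean_queries(queries, min_chars=3, blocked_terms=()):
    """Normalize, filter and de-duplicate raw search queries, keeping order."""
    seen = set()
    cleaned = []
    for query in queries:
        # Lowercase, trim, and collapse repeated whitespace.
        text = re.sub(r"\s+", " ", query.strip().lower())
        if len(text) < min_chars:
            continue  # too short to carry meaning
        if any(term in text for term in blocked_terms):
            continue  # e.g. brand or navigational queries
        if text in seen:
            continue  # duplicate after normalization
        seen.add(text)
        cleaned.append(text)
    return cleaned


raw = ["  Running Shoes ", "running   shoes", "a", "acme login", "trail running shoes"]
print(clean_queries(raw, blocked_terms=("acme",)))
# → ['running shoes', 'trail running shoes']
```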

Using this for keyword research is a great idea too, though sometimes you can switch to querycat for association rule learning.

The reason is that you don't always need to rely on semantics; don't overcomplicate simple tasks.
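
To show what a non-semantic approach can look like, here is a minimal sketch that groups keywords by the terms they share. This is not querycat's API and not full association rule mining; it only counts how many keywords contain each term (the "support"), which is the first step of such methods. The function name, the `min_support` default and the stopword list are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_frequent_term(keywords, min_support=2,
                           stopwords=("for", "the", "in", "a", "to")):
    """Group keywords under their best-supported shared term.

    A term is frequent when at least min_support keywords contain it.
    Keywords without any frequent term go to the "other" group.
    """
    tokenized = [
        [t for t in kw.lower().split() if t not in stopwords] for kw in keywords
    ]
    # Count each term once per keyword.
    support = Counter(term for tokens in tokenized for term in set(tokens))

    groups = defaultdict(list)
    for kw, tokens in zip(keywords, tokenized):
        frequent = [t for t in tokens if support[t] >= min_support]
        # Highest support wins; ties are broken alphabetically.
        label = min(frequent, key=lambda t: (-support[t], t)) if frequent else "other"
        groups[label].append(kw)
    return dict(groups)


keywords = [
    "running shoes", "trail running shoes", "running socks",
    "mortgage rates", "best mortgage calculator", "pizza",
]
for label, members in group_by_frequent_term(keywords).items():
    print(label, members)
```

No model is downloaded and nothing is embedded, so this runs instantly even on large keyword lists, at the price of missing synonyms.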

I am not the biggest fan of traditional clustering techniques for keywords.

I usually go with either querycat or transformers. As mentioned before, the latter can be super slow with some models and for some datasets.

Be sure to filter out useless keywords!

You have to experiment a bit with the number of topics and the n-gram range.

Some models may perform better in certain scenarios, so study which is most suitable for your case, and practice.
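
A minimal sketch of the knobs involved: `nr_topics`, `n_gram_range` and `min_topic_size` are arguments of BERTopic's constructor, while the helper functions and the values chosen here are only an example starting point, not recommended settings.

```python
def topic_model_settings(nr_topics="auto", n_gram_range=(1, 2), min_topic_size=10):
    """Collect constructor arguments for BERTopic, with a basic sanity check."""
    low, high = n_gram_range
    if not (1 <= low <= high):
        raise ValueError("n_gram_range must satisfy 1 <= low <= high")
    return {
        "nr_topics": nr_topics,          # an int, "auto", or None
        "n_gram_range": (low, high),     # (1, 2) = single words and word pairs
        "min_topic_size": min_topic_size,
    }


def build_tuned_model(**overrides):
    """Create a BERTopic instance from the settings above."""
    # Imported here so the settings helper works without the library installed.
    from bertopic import BERTopic

    return BERTopic(**topic_model_settings(**overrides))
```

For example, `build_tuned_model(nr_topics=20, n_gram_range=(1, 3))` asks for roughly 20 topics described with phrases of up to three words. Change one setting at a time so you can tell which one affected the result.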

You don't have the burden of writing a lot of code tho

This thread is not supposed to be super technical, although I may write a long article on the topic in the near future.
