Marco Giordano

Sep 20 • 26 tweets • 8 min read

A list of the most useful #Python libraries you can use for #SEO right now. 🐍

This updated thread will tell you the main libraries for #DataScience and #NLP that you should consider.

Use them in your workflow! 🧵

NumPy & Pandas: the foundations of data analysis. Just learn them.

Without these two libraries, you cannot do Data Science at all. Good knowledge of Pandas can get you quite far.
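
To give a feel for what Pandas does day to day, here is a minimal sketch that sums clicks and impressions per page and derives CTR. The column names and all the numbers are invented for illustration; they do not come from a real export.

```python
import pandas as pd

# Hypothetical Search Console-style rows; values are invented for illustration.
rows = [
    {"page": "/blog/a", "query": "python seo", "clicks": 10, "impressions": 200},
    {"page": "/blog/a", "query": "seo scripts", "clicks": 5, "impressions": 100},
    {"page": "/blog/b", "query": "nlp for seo", "clicks": 2, "impressions": 100},
]
df = pd.DataFrame(rows)

# Sum clicks and impressions per page, then derive CTR from the totals.
by_page = df.groupby("page", as_index=False)[["clicks", "impressions"]].sum()
by_page["ctr"] = by_page["clicks"] / by_page["impressions"]

# Sort so the pages with the most clicks come first.
by_page = by_page.sort_values("clicks", ascending=False).reset_index(drop=True)
print(by_page)
```

Computing CTR from the summed totals, rather than averaging per-query CTRs, keeps low-impression queries from skewing the result.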

Advertools: the best SEM library out there.

It’s very useful for crawling, log file analysis, analyzing SERPs and querying the Knowledge Graph.

The ideal Swiss Army knife you need in your arsenal.

advertools.readthedocs.io/en/master/

Ecommercetools: The ideal package for analyzing eCommerce data and getting access to some useful NLP functions.

It’s a rare jewel in your collection that is very handy for technical SEO and e-commerce as well.

pypi.org/project/ecomme…

Requests: Make HTTP requests via Python, essential for web scraping.

Sure, there are alternatives, but you should learn this one. It's very important and a lot of your initial work will require this library.

pypi.org/project/reques…
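
A minimal sketch of a typical Requests workflow, assuming you want to check the final URL before hitting the network. The helper names `build_request` and `fetch`, the user agent string and the example.com URL are all hypothetical.

```python
import requests


def build_request(url, params=None):
    """Prepare a GET request without sending it, so the final URL can be inspected."""
    headers = {"User-Agent": "seo-script/0.1"}  # hypothetical user agent string
    return requests.Request("GET", url, params=params, headers=headers).prepare()


def fetch(url, params=None, timeout=10):
    """Send the request and return (status_code, body_text)."""
    prepared = build_request(url, params)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    return response.status_code, response.text


# Inspecting the prepared request works offline; fetch() needs a connection.
prepared = build_request("https://example.com/search", {"q": "python seo"})
print(prepared.url)
```

Always pass a `timeout` when you actually send requests, otherwise a slow server can hang your script indefinitely.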

urllib: for working with URLs. It should be part of your arsenal.

Take some time to study all the options and possible use cases.

docs.python.org/3/library/urll…
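
As a small sketch of the use cases worth studying, the snippet below uses `urllib.parse` from the standard library to split a URL, strip tracking parameters and resolve a relative link. The URL itself is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlparse, urlunparse

# Hypothetical URL with tracking parameters.
url = "https://example.com/blog/post?utm_source=twitter&utm_medium=social&page=2"

# Split the URL into scheme, host, path, query and so on.
parts = urlparse(url)
print(parts.netloc, parts.path)

# Parse the query string into a dict of lists.
params = parse_qs(parts.query)

# Drop tracking parameters (anything starting with "utm_") to get a cleaner URL.
kept = {key: value for key, value in params.items() if not key.startswith("utm_")}
clean_url = urlunparse(parts._replace(query=urlencode(kept, doseq=True)))
print(clean_url)

# Resolve a relative link found on the page against the page URL.
absolute = urljoin(url, "../about")
print(absolute)
```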

BeautifulSoup: a library to extract data from HTML/XML files, used in combination with scraping libraries to convert data into Python objects.

One of the first ones you’ll probably learn in your Python journey.

crummy.com/software/Beaut…
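
Here is a minimal sketch of pulling common on-page elements out of HTML with BeautifulSoup. The inline HTML stands in for a page you would normally download first (with Requests, for example); its content is invented.

```python
from bs4 import BeautifulSoup

# A small inline HTML document stands in for a downloaded page.
html = """
<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short description.">
  </head>
  <body>
    <h1>Main heading</h1>
    <a href="/about">About</a>
    <a href="https://example.com/contact">Contact</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.get_text(strip=True)
h1 = soup.find("h1").get_text(strip=True)
description = soup.find("meta", attrs={"name": "description"})["content"]

# Collect the href of every link that actually has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title, h1, description, links)
```

On real pages any of these elements can be missing, so check for `None` before reading text or attributes.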

Scrapy: the absolute peak of scraping.

Nothing is better than this, even though the setup may be hard.

You can carry out any scraping task with this library.

Matplotlib/Seaborn/Plotly: you need some sort of visualization and these libs are here to help you.

You can start with Seaborn, which is easier to use. DataViz is an important topic and you should value it.

NLTK/spaCy: work with human language to analyze text data and get insights into the nuances of our language.

This is necessary to get your hands dirty with text data.

The latter can be used to recognize entities and parts of speech.

Querycat: few functions but good quality thanks to association rule mining and BERT.

It's one of my favorite libraries, but the installation may not be straightforward.

It's useful for visualizing losses in impressions over time.

github.com/jroakes/queryc…

Sklearn: A staple for Machine Learning.

I don't think you really need it, but it's one of the first libs you will encounter.

scikit-learn.org/stable/

Transformers: Pretrained models to handle a wide range of tasks. Essential for NLP!

This library is crucial for the most advanced tasks and quite reliable too. I highly suggest you check my other thread:

https://twitter.com/GiordMarco96/status/1504910522608619521

sentence_transformers: Python framework for state-of-the-art sentence, text, and image embeddings.

Use it for keyword clustering and other text-related tasks. It's one of my most used libraries right now.

sbert.net

Trafilatura: download, parse and scrape web pages.

If you work with content, look no further.

Cleaning the HTML elements of a page is overrated, don't waste your life on it!

trafilatura.readthedocs.io/en/latest/

Streamlit/Dash: interactive web applications.

Useful for prototyping and communicating.

Streamlit is one of the SEO community's favorite solutions.

Typer: create apps that you can run from your command line.

Extremely powerful for personal use and for running local scripts.

A game-changer for automating your workflow.

typer.tiangolo.com

networkx: the must-have graph theory library.

I recommend you learn it once you have mastered the basics.

Graph Theory is of great importance for analysts who want to level up their game.

More on this in future threads.

networkx.org
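
One way graph theory shows up in SEO is internal linking. The sketch below, assuming a made-up list of internal links, models a site as a directed graph and computes click depth from the homepage, inlink counts and pages that cannot be reached from the homepage at all.

```python
import networkx as nx

# Hypothetical internal links as (source page, target page) pairs.
edges = [
    ("/", "/blog"),
    ("/", "/about"),
    ("/blog", "/blog/post-1"),
    ("/blog", "/blog/post-2"),
    ("/blog/post-1", "/blog/post-2"),
    ("/blog/post-2", "/"),
]
graph = nx.DiGraph(edges)

# A page that exists but receives no internal links.
graph.add_node("/orphan")

# Click depth: the minimum number of clicks needed to reach each page from "/".
depth = nx.single_source_shortest_path_length(graph, "/")

# Number of internal links pointing at each page.
inlinks = dict(graph.in_degree())

# Pages that cannot be reached from the homepage by following links.
unreachable = sorted(set(graph.nodes) - set(depth))

print(depth, inlinks, unreachable)
```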

searchconsole: Use this library to import your data from the GSC API.

It's easy to set up and it's one of the most used libraries in my workflow.

github.com/joshcarty/goog…

BERTopic: one of my most used NLP libraries, and for good reason. I dedicated an entire thread to the topic:

https://twitter.com/GiordMarco96/status/1492289867908366336

scattertext: library for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot.

A short example from the official docs: (github.com/JasonKessler/s…).

openpyxl: if you have to work with Excel data and create spreadsheets.

There are other libraries but I prefer to use this one. It's quite nice and it works well for most of the tasks.

openpyxl.readthedocs.io/en/stable/
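
A minimal sketch of writing and reading a spreadsheet with openpyxl. It saves to an in-memory buffer so it runs without touching the disk; in practice you would pass a file name such as "keywords.xlsx" (a hypothetical name) to `save` and `load_workbook`. The sheet name and data are invented.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a workbook with one sheet and a few rows.
wb = Workbook()
ws = wb.active
ws.title = "Keywords"
ws.append(["keyword", "clicks"])  # header row
ws.append(["python seo", 15])
ws.append(["nlp for seo", 2])

# Save to an in-memory buffer instead of a file on disk.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Read the workbook back and pull the rows out as plain tuples.
loaded = load_workbook(buffer)
rows = list(loaded["Keywords"].iter_rows(values_only=True))
print(rows)
```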

Start with scraping and data analysis.

Then, you can move to NLP libraries and study topics like NER and Clustering.

Sticking to the mainstream libraries is necessary to get access to "better" documentation.

My suggestion is to try alternatives and always look for new opportunities across the web.

Always do your research; you could find the perfect library for your needs.

Follow me for threads, tips, and case studies (coming soon) about SEO, content, and Python/data.

If you liked this thread, consider liking and retweeting it!🧵

I offer short consultancies and full freelancing for publishers and B2C content.

bookk.me/marcogiordano
