Using #Python for content optimization in #SEO? You must be crazy, man.

And yet, there are some cool applications I will show you in this thread 🧵
Named entity recognition (NER). Extract named entities from a text to see what your competitors or Wikipedia are using for a given topic.

This is not about keywords but the co-occurrence of specific terms.
You can do that via the Google NLP API or spaCy. The first gives you a measure of each entity's importance, called salience: the higher it is, the more relevant the entity is to that text.

The second one has different perks and can be trained, meaning that you can make domain-specific models.
It is one of my favorite libraries because it is so versatile and relatively simple to use. spaCy is highly recommended for a lot of applications and almost all of them can somehow fit into SEO.
N-grams are contiguous sequences of units, usually words or characters. The n tells you how many units you take into account.

A bigram could be "Nike shoes"; a trigram, "red Nike shoes".
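Extracting word n-grams needs nothing beyond the standard library; a quick sketch with a made-up sentence:

```python
from collections import Counter

def ngrams(text, n):
    """Return all contiguous n-word sequences in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

text = "red Nike shoes are great Nike shoes"
bigrams = Counter(ngrams(text, 2))
print(bigrams.most_common(3))  # "nike shoes" appears twice
```

Counting the most common bigrams and trigrams across a page (or a set of competitor pages) surfaces the recurring phrases discussed below.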

Why are they so important then? >>>
>>> Frequent sentences in texts are interesting because it's extremely likely that phrase-based indexing is used by Google.

Simply put, Google uses phrases to understand which content relates to a certain topic. N-grams can surface these phrases in a text.

seobythesea.com/2011/12/10-mos…
The term phrase-based indexing appears in patents spanning decades. Google has been working on it since at least 2004.

Source: gofishdigital.com/blog/are-you-u….
Nonetheless, n-grams are super useful for understanding the main combinations of words in a text. Not everything has to be a ranking factor or affect rankings directly.
For n-gram analysis, I suggest this masterpiece of an app by @GregBernhardt4.

share.streamlit.io/dethfire/ngram…
POS tagging is another NLP task you can perform in spaCy. This is the foundation to build a Knowledge Graph later as well, as shown in this article.

holisticseo.digital/python-seo/inf…
The idea of the article above is that POS tagging can get you the dependencies of a sentence, which are used to build a KG.

I never use POS tagging alone, barring those cases where I may want to analyze some featured snippets. Again, this is not always the case.
Google Search Console data are the backbone of most of my analyses. You can use Google Search Console API to extract data and then do whatever you like.

You can find content that can be optimized with different methods.
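However you pull the data (the API returns query/page rows with clicks, impressions, CTR, and position), a simple filter can surface pages with high impressions but poor CTR. The thresholds and rows here are illustrative, not recommendations:

```python
def pages_to_optimize(rows, min_impressions=1000, max_ctr=0.02):
    """Flag pages that are seen a lot but rarely clicked."""
    return [
        r["page"]
        for r in rows
        if r["impressions"] >= min_impressions and r["ctr"] <= max_ctr
    ]

# Hypothetical rows in the shape the Search Console API returns
rows = [
    {"page": "/guide", "impressions": 5000, "ctr": 0.01},
    {"page": "/blog-post", "impressions": 300, "ctr": 0.05},
]
print(pages_to_optimize(rows))  # ['/guide']
```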
An example is provided here:

Google Colab file: colab.research.google.com/drive/1hXuKuf8…

Thread with the article:
Semantic clustering is another task that is invaluable nowadays. You don't need to start from scratch; there are services like @keywordinsights that do it for you.

This could well become the standard in the next few years, as optimizing for a single keyword is not meaningful anymore.
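A rough sketch of keyword clustering, assuming scikit-learn is available. TF-IDF is only a lexical stand-in for true semantic similarity (swap in sentence embeddings for the real thing), and the keywords are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

keywords = [
    "best running shoes",
    "running shoes for men",
    "chocolate cake recipe",
    "easy chocolate cake",
]

# Vectorize, then group keywords that share vocabulary
vectors = TfidfVectorizer().fit_transform(keywords).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(vectors)
print(list(zip(keywords, labels)))
```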
Topic modeling. Can a machine understand the topic of a set of documents? Actually yes, there are some techniques for that.

You can take a competitor's website and see their most frequent topics to get an idea of what you should focus on. >>>
>>> Can a machine understand my content? Topic modeling gives you a rough estimate. Again, use this with caution; we are not looking for super-accurate results.

The interest is in getting good enough insights to make or influence a decision.
Use this Python library for topic modeling:

The problem of content optimization can be analyzed from two different perspectives: which pages should be improved, and what can you do with that knowledge?

Sometimes you have perfect pages and all you need is to wait.
Google Search Console data can tell you which pages need improvement, but it gives no clues about what to fix on the page itself.

You can crawl each URL and get the text, though.
Other tools do the opposite. They can tell you if you lack entities or what you can improve but they cannot provide info about CTR, for example.

That's where combining data comes in handy.
If you want to deploy a custom solution, consider writing your own scripts and getting to know how the relevant libraries work.

I prefer to use both tools and my scripts depending on the scope of the project.

Sometimes you don't need to be super accurate.
Domain knowledge can even replace a part of Keyword Research in some cases. Tools can mislead you, that is why having a notebook or a topical map helps a lot.

In such cases, you can go with your scripts.
If you expected something about keyword density or word counts, you are completely wrong.

They are not important and if you have the objective of creating value for the user you won't even care about them.

Sure, you can compute these metrics in Python, but it's pointless.
There are some solutions on the web for most of this stuff, so you don't even need to code from scratch.

If you want something bigger, you need your own solution. Beginners may feel more comfortable with prebuilt apps or scripts.
The next step is to create a pipeline: a process where you extract data and produce an output, such as a visualization.

You don't strictly need one, but it's extremely beneficial for corporations, and worth building if you have the opportunity.
Python can do the same things you are already doing in Excel and GSheets. I should do a thread next about the main differences and advantages for non-technical people.

The main difference for an SEO is that you get reproducibility and NLP libraries.
You can get the date when articles were published, the number of internal links, and much more from Google Analytics, Search Console, and Screaming Frog.

Merge these datasets and then apply conditional formatting to easily spot issues that need to be addressed.
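The merge step can be sketched with pandas; the column names and values below are hypothetical exports (GSC performance plus a Screaming Frog crawl), and the thresholds are illustrative:

```python
import pandas as pd

# Hypothetical exports keyed by URL
gsc = pd.DataFrame({
    "url": ["/a", "/b"],
    "clicks": [120, 4],
    "impressions": [3000, 900],
})
crawl = pd.DataFrame({
    "url": ["/a", "/b"],
    "word_count": [1400, 300],
    "internal_links": [12, 1],
})

merged = gsc.merge(crawl, on="url", how="left")
# Flag thin pages that still earn impressions
merged["needs_work"] = (merged["word_count"] < 500) & (merged["impressions"] > 500)
print(merged)
```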
I will write more about content audits as I am trying to build a general framework with the help of Python.

The possibilities are endless and it is my favorite topic.
Python has several advantages over other tools; the main one is that it is relatively simple to learn and it is super good for NLP.

You don't necessarily need to learn to code, there are tools on the market for that.

It's up to you.
The most important thing is to understand what types of content you need and the different stages of a possible funnel.

Content optimization needs to go through a sensible process of prioritization and logic.
You can build a lightweight scraper to get the headings of a given URL. This can be useful for getting the structure that your competitors may be using.

Creating your spider is beneficial for several reasons, like speed.
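The parsing half of such a scraper fits in the standard library; the network fetch (requests, an async client, whatever you prefer) is omitted here and the HTML string is a stand-in for a downloaded page:

```python
from html.parser import HTMLParser

class HeadingParser(HTMLParser):
    """Collect (tag, text) pairs for h1-h6 from an HTML document."""
    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.headings.append((self._current, data.strip()))

html = "<h1>Guide</h1><p>intro</p><h2>Step one</h2>"
parser = HeadingParser()
parser.feed(html)
print(parser.headings)  # [('h1', 'Guide'), ('h2', 'Step one')]
```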
Speaking of which, product placement time! This article is another gem from WordLift.

wordlift.io/blog/en/web-sc…
OpenAI offers an amazing Python API that can help you create outlines for articles. In my opinion, you can just use your brain for these types of tasks, but still!

openai.com/api/
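A sketch of what such a request looks like; the model name reflects the completions API as it was at the time of this thread, the prompt is invented, and the actual network call is only indicated in a comment since it needs an API key:

```python
# Request body for OpenAI's completions endpoint (check the current
# docs before using; model names and endpoints change over time).
payload = {
    "model": "text-davinci-002",
    "prompt": (
        "Write an outline for a blog post about trail running shoes. "
        "Use H2 headings only."
    ),
    "max_tokens": 200,
}

# With the openai library installed and an API key configured, the call
# would be roughly: openai.Completion.create(**payload)
print(payload["prompt"])
```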
I hope that the topic of content optimization with Python is now clearer for you.

If you liked this thread please consider liking and retweeting it! I am working on my personal website, it will contain the expanded version of my threads.

Thread by Marco Giordano 🇺🇦 (@GiordMarco96)