Using #Python for content optimization in #SEO? You must be crazy, man.
And yet, there are some cool applications I will show you in this thread 🧵
Named entity recognition (NER). Extract named entities from a text to see what your competitors or Wikipedia are using for a given topic.
This is not about keywords but the co-occurrence of specific terms.
You can do that via the Google NLP API or spaCy. The former gives you a measure of each entity's importance, called salience: the higher it is, the more relevant the entity is to that text.
The latter has different perks and can be trained, meaning you can build domain-specific models.
It is one of my favorite libraries because it is so versatile and relatively simple to use. spaCy is highly recommended for a lot of applications, and almost all of them can somehow fit into SEO.
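If you want to try spaCy right away, here's a minimal sketch of entity extraction. Assumptions: the small English model is installed, and the sample text is just a placeholder.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

# Placeholder text: in practice, feed it a competitor's page or a Wikipedia article
text = "Nike opened a new flagship store in Milan, a few blocks away from Adidas."
doc = nlp(text)

# Count entities together with their labels (ORG, GPE, PERSON, ...)
entities = Counter((ent.text, ent.label_) for ent in doc.ents)
for (ent_text, label), freq in entities.most_common(10):
    print(f"{ent_text:<20} {label:<8} {freq}")
```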
N-grams are contiguous sequences of units, for instance words or letters. The n represents how many units you take into account.
A bigram could be "Nike shoes"; a trigram, "red Nike shoes".
Why are they so important, then?
Frequent phrases in a text are interesting because it's extremely likely that Google uses phrase-based indexing.
Simply put, Google uses phrases to understand which content relates to a certain topic. N-grams can get you these phrases from a text.
Regardless, n-grams are super useful for understanding the main combinations of words in a text. Not everything has to be a ranking factor or affect rankings directly.
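Here's a sketch of how you could count the most frequent n-grams in pure Python; the whitespace tokenization is deliberately naive, just to show the idea.

```python
from collections import Counter

def top_ngrams(text, n=2, k=10):
    """Return the k most frequent n-grams in a text (naive tokenization)."""
    tokens = text.lower().split()
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams).most_common(k)

sample = "red nike shoes are great and red nike shoes are on sale"
print(top_ngrams(sample, n=2))  # bigrams, e.g. ("nike shoes", 2)
print(top_ngrams(sample, n=3))  # trigrams, e.g. ("red nike shoes", 2)
```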
For n-gram analysis, I suggest this masterpiece of an app by @GregBernhardt4.
The idea of the article above is that POS tagging can get you the dependencies of a sentence, which you can then use to build a knowledge graph (KG).
I never use POS tagging alone, barring those cases where I may want to analyze some featured snippets. Again, it is not always needed.
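For reference, this is roughly what POS tags and dependency relations look like in spaCy; the sentence is a made-up example.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google uses phrases to understand content related to a topic.")

for token in doc:
    # token.dep_ is the dependency relation, token.head the word it attaches to
    print(f"{token.text:<12} {token.pos_:<6} {token.dep_:<10} head={token.head.text}")
```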
Google Search Console data are the backbone of most of my analyses. You can use Google Search Console API to extract data and then do whatever you like.
You can find content that can be optimized with different methods.
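Here's a hedged sketch of a Search Analytics query with google-api-python-client. It assumes you already have OAuth credentials from a google-auth flow and verified access to the property; the site URL and dates are placeholders.

```python
from googleapiclient.discovery import build

def query_gsc(creds, site_url="https://www.example.com/"):
    # `creds` is assumed to come from a google-auth OAuth flow you set up beforehand
    service = build("searchconsole", "v1", credentials=creds)
    response = service.searchanalytics().query(
        siteUrl=site_url,
        body={
            "startDate": "2022-01-01",
            "endDate": "2022-03-31",
            "dimensions": ["page", "query"],
            "rowLimit": 1000,
        },
    ).execute()
    return response.get("rows", [])

# for row in query_gsc(creds):
#     page, query = row["keys"]
#     print(page, query, row["clicks"], round(row["ctr"], 3))
```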
Semantic clustering is another task that is invaluable nowadays. You don't need to start from scratch; there are services like @keywordinsights that do it for you.
This will probably become the standard in the next few years, as optimizing for a single keyword is not meaningful anymore.
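If you do want to roll your own, one possible approach uses sentence embeddings plus clustering. The model name, the keyword list, and the distance threshold are all assumptions to tune.

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

keywords = [
    "nike running shoes", "best running shoes", "nike air max",
    "how to clean white sneakers", "sneaker cleaning tips",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common lightweight choice
embeddings = model.encode(keywords)

# distance_threshold controls cluster granularity; tune it on your own data
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
labels = clustering.fit_predict(embeddings)

for keyword, label in sorted(zip(keywords, labels), key=lambda x: x[1]):
    print(label, keyword)
```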
Topic modeling. Can a machine understand the topic of a set of documents? Actually yes; there are some techniques for that.
You can take a competitor's website and see their most frequent topics to get an idea of what you should focus on.
Can a machine understand my content? You can get a rough estimate with topic modeling. Again, use this with caution; we are not looking for super-accurate results.
The point is getting insights that are good enough to make or influence a decision.
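As an illustration, a rough LDA sketch with scikit-learn. The documents are placeholders for crawled pages, and the number of topics is a guess you would tune.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder docs: in practice, use the text of crawled competitor pages
docs = [
    "nike shoes running marathon training",
    "adidas sneakers streetwear fashion",
    "marathon training plan for beginners",
    "sneaker fashion trends this year",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top terms per topic
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```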
The problem of content optimization can be analyzed from two different perspectives: which pages should be improved, and what can I do with that knowledge?
Sometimes you have perfect pages and all you need is to wait.
Google Search Console data can tell you which pages need improvement, but it gives no clues about what to fix on the page itself.
You can crawl each URL and get the text, though.
Other tools do the opposite. They can tell you if you lack entities or what you can improve, but they cannot provide info about CTR, for example.
That's where combining data comes in handy.
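For the crawling part, one possible shortcut is the trafilatura library, which fetches a page and extracts its main text; the URL here is a placeholder.

```python
# pip install trafilatura
import trafilatura

url = "https://www.example.com/some-article/"  # placeholder URL
downloaded = trafilatura.fetch_url(url)
text = trafilatura.extract(downloaded)

if text:
    print(text[:500])  # feed this into spaCy, n-gram counts, topic modeling, etc.
```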
If you want to deploy a custom solution, consider writing your own scripts and getting to know how some libraries work.
I prefer to use both tools and my scripts depending on the scope of the project.
Sometimes you don't need to be super accurate.
Domain knowledge can even replace part of keyword research in some cases. Tools can mislead you; that is why having a notebook or a topical map helps a lot.
In such cases, you can go with your scripts.
If you expected something about keyword density or word counts, you are completely wrong.
They are not important, and if your objective is creating value for the user, you won't even care about them.
Sure, you can compute these metrics in Python, but it makes no sense.
There are solutions on the web for most of this stuff, so you don't even need to code from scratch.
If you want something larger, then you need your own solution. Beginners may feel more comfortable with prebuilt apps or scripts.
The next step is to create a pipeline: a process where you extract data and produce something as an output, like a visualization.
It's not necessary to have one, but it's extremely beneficial for corporations and worth building if you have the opportunity.
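Just to show the shape, a toy extract → transform → visualize pipeline; the CSV file and its columns are assumptions standing in for whatever data source you use.

```python
import pandas as pd
import matplotlib.pyplot as plt

def extract():
    # Hypothetical export with columns: page, clicks, impressions
    return pd.read_csv("gsc_export.csv")

def transform(df):
    df["ctr"] = df["clicks"] / df["impressions"]
    return df.sort_values("impressions", ascending=False).head(20)

def visualize(df):
    df.plot.barh(x="page", y="ctr", legend=False)
    plt.tight_layout()
    plt.savefig("ctr_report.png")

visualize(transform(extract()))
```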
Python can do the same things you are already doing in Excel and GSheets. I should do a thread next about the main differences and advantages for non-technical people.
The main difference for an SEO is that you get reproducibility and NLP libraries.
You can get the date when articles were published, the number of internal links, and much more from Google Analytics, Search Console, and Screaming Frog.
Merge these datasets and then apply conditional formatting to easily spot issues that need to be addressed.
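A hedged example of what that merge could look like in pandas; the file names, column names, and thresholds all depend on your exports and are assumptions.

```python
import pandas as pd

gsc = pd.read_csv("gsc_pages.csv")         # assumed columns: page, clicks, impressions, ctr
crawl = pd.read_csv("screaming_frog.csv")  # assumed columns: Address, Inlinks, Word Count

merged = gsc.merge(crawl, left_on="page", right_on="Address", how="inner")

# Rule of thumb: high impressions, low CTR, few internal links -> worth a look
merged["needs_review"] = (
    (merged["impressions"] > 1000)
    & (merged["ctr"] < 0.02)
    & (merged["Inlinks"] < 5)
)
print(merged.loc[merged["needs_review"], ["page", "impressions", "ctr", "Inlinks"]])
```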
I will write more about content audits as I am trying to build a general framework with the help of Python.
The possibilities are endless and it is my favorite topic.
Python has several advantages over other tools; the main one is that it is relatively simple to learn and it is super good for NLP.
You don't necessarily need to learn to code; there are tools on the market for that.
It's up to you.
The most important thing is to understand what types of content you need and the different stages of a possible funnel.
Content optimization needs to go through a sensible process of prioritization and logic.
You can build a lightweight scraper to get the headings of a given URL. This can be useful for getting the structure that your competitors may be using.
Creating your own spider is beneficial for several reasons, speed among them.
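Here's a lightweight version with requests and BeautifulSoup; the URL is a placeholder.

```python
# pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/competitor-article/"  # placeholder URL
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Print the heading structure, indented by level
for tag in soup.find_all(["h1", "h2", "h3"]):
    level = int(tag.name[1])
    print("  " * (level - 1) + f"{tag.name.upper()}: {tag.get_text(strip=True)}")
```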
Speaking of which, product placement time! This article is another gem from WordLift.
OpenAI's Python library is amazing and can help you create outlines for articles. In my opinion, you can just use your brain for these types of tasks, but still!
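A minimal, hedged sketch with the openai client; it assumes an OPENAI_API_KEY in your environment, and the model name is just an example.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; use whichever is current
    messages=[{
        "role": "user",
        "content": "Draft an H2/H3 outline for an article on cleaning white sneakers.",
    }],
)
print(response.choices[0].message.content)
```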
I hope that the topic of content optimization with Python is now clearer for you.
If you liked this thread, please consider liking and retweeting it! I am working on my personal website; it will contain expanded versions of my threads.