If you want to learn to do some #technicalseo with one of the best crawling tools out there, here is a 🧵for you.
.@screamingfrog is a crawler that you download and install onto a local or virtual machine. It allows you to crawl almost any website.
I use it as part of my technical SEO audits and their customer service is 🔥
However, the initial learning curve is steep.
If you've looked at a finished @screamingfrog crawl and felt overwhelmed - you're NOT alone.
I, and many others, were in your EXACT shoes.
It's ok
You've got this💪🏽
⏩ Here are 9 tips to make your life with Screaming Frog easier, more enjoyable, and F-U-N👇
1. Run in database storage mode
Configuration > System > Storage mode
Doing so will save your crawl data automatically to your computer and allows you to compare between crawls of the same site.
2. You don't need to crawl an entire site
Crawling an entire website can take hours (sometimes days).
🧠You don't *always* need to have full information to make recommendations. Often, a sample is all you need to ID symptoms and make relevant recommendations.
3. Add sitemap address to get sitemap insights
Because many robots.txt files do not reference the sitemap index URL you will have to add it yourself👇
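Not sure whether the site already declares its sitemap? One quick pre-check is to scan the robots.txt body for a Sitemap directive - a minimal sketch (the sample robots.txt content is made up; in practice you'd fetch the live file first):

```python
# Sketch: check whether a robots.txt body declares any Sitemap URLs.
# In practice, fetch https://<domain>/robots.txt first; here we parse
# a made-up sample body directly.

def find_sitemaps(robots_txt: str) -> list[str]:
    """Return all Sitemap: URLs declared in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # The Sitemap directive is case-insensitive
        if line.strip().lower().startswith("sitemap:"):
            sitemaps.append(line.split(":", 1)[1].strip())
    return sitemaps

sample = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap_index.xml
"""
print(find_sitemaps(sample))  # ['https://example.com/sitemap_index.xml']
```

If this returns an empty list, you'll need to paste the sitemap URL into Screaming Frog yourself.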
4. Run crawl analysis to see sitemap issues such as:
⚠️ orphaned URLs
⚠️ URLs not in sitemap
⚠️ non-indexable URLs in sitemap
Once your crawl has finished, go to Crawl Analysis > Start then click on the Sitemaps tab and the right hand side panel will display sitemap stats👇
💡FYI, you can run a crawl analysis at any stage of the crawl - you do not have to wait to crawl the entire site.
This is great when crawling large sites and you want a quick sense of whether there are issues with its sitemap.
Click Pause > Crawl Analysis > Start👇
💡A website's sitemap is one of the first things I look at.
🧠If I see many non-indexable URLs in the sitemap or many orphaned URLs, this is a strong signal that there are a host of technical issues and warrants a full audit.
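Conceptually, two of the sitemap checks above boil down to set differences between the URLs found by crawling internal links and the URLs listed in the sitemap. A minimal sketch (the URL sets are made up):

```python
# Sketch: orphaned URLs and URLs-not-in-sitemap are just set differences.
# (The non-indexable-in-sitemap check additionally needs each URL's
# indexability status from the crawl.)

crawled = {"https://example.com/", "https://example.com/blog/a"}      # found via links
in_sitemap = {"https://example.com/", "https://example.com/blog/b"}   # listed in sitemap

orphaned = in_sitemap - crawled        # in sitemap, but not reachable via internal links
not_in_sitemap = crawled - in_sitemap  # reachable, but missing from the sitemap

print(orphaned)        # {'https://example.com/blog/b'}
print(not_in_sitemap)  # {'https://example.com/blog/a'}
```

Screaming Frog's Crawl Analysis does this comparison for you - this is just what's happening under the hood, roughly.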
5. Crawl a *specific* subfolder of a site (and nothing else)
Eg, I want to crawl all the URLs within /blog/ subfolder of a website.
To do this, go to Configuration > Include then put in https://domain/blog/.*
⚠️Don't forget the ‼️.*‼️
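Why the `.*` matters: the Include filter is a regex, and the warning above suggests the pattern needs to match the whole URL, not just a prefix. Illustrated with Python's `re.fullmatch` (the domain is a placeholder):

```python
import re

# The Include pattern is a regex; without the trailing .* it only
# matches the subfolder URL itself, not the deeper URLs inside it.
pattern_good = r"https://domain/blog/.*"
pattern_bad = r"https://domain/blog/"   # missing the .*

url = "https://domain/blog/my-post/"
print(bool(re.fullmatch(pattern_good, url)))  # True
print(bool(re.fullmatch(pattern_bad, url)))   # False - deeper URLs excluded
```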
6. On your first crawl, crawl HTML files only
By default, @screamingfrog will crawl all image, CSS, JS and SWF files.
💡Uncheck these for your initial crawl to get a quick sense of the URLs - it will also save you some time.
7. Run the crawler in JavaScript mode
Many websites are single-page applications or rely on JS to render ⚡️important⚡️ content. Running in JS mode in @screamingfrog will give you a proxy of how well (or not) Google can crawl and render any website.
When crawling in JS mode:
• @screamingfrog takes a lot longer (so be patient)
• go to Rendered Page to see if all important content can be rendered
It *literally* took me 3.5 years to discover the Rendered Page tab (👏@myriamjessier for showing me👏).
This is where you'll find the rendered view when you run a JS crawl in @screamingfrog 👇
💡If you don't see the body content shown in the Rendered Page panel, this is probably a strong signal that Google will have issues rendering text and internal links on the website.
💡If this happens, corroborate with Google Search Console data.
Since @screamingfrog takes⏳to crawl in JS mode, TameTheBots by @davewsmart is ⚡️ah-mazing⚡️ for quick diagnosis on a page-by-page basis.
I.e., if important content is not displayed, this indicates a potential issue with rendering, indexing and ranking.
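A rough way to pre-check JS dependence before a full rendered crawl: see whether a known phrase from the page appears in the raw (non-rendered) HTML. A stdlib-only sketch - the HTML samples are made up:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from raw HTML, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.skipping = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
    def handle_data(self, data):
        if not self.skipping:
            self.chunks.append(data)

def phrase_in_static_html(html: str, phrase: str) -> bool:
    parser = TextExtractor()
    parser.feed(html)
    return phrase in " ".join(parser.chunks)

# A JS-rendered single-page app often ships an empty shell:
shell = '<html><body><div id="root"></div><script>render()</script></body></html>'
print(phrase_in_static_html(shell, "important content"))  # False -> likely JS-rendered
```

If the phrase is missing from the static HTML but visible in the browser, the content depends on JS - exactly the situation the Rendered Page tab helps you diagnose.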
8. Find non-200 response codes
💡Most sites will have URLs that 301/302 redirect or pages that return 404 response codes.
After a crawl go to the Response Codes tab and filter URLs in ascending/descending order. This will show you all the URLs that are non-200👇
💡A really quick way to see this is by running as few extractions as possible.
You can do this by unchecking all "Resource Links" in Crawl Configuration and unchecking everything in Page Details in "Extraction".
9. Find low-content pages
URLs with fewer than 200 words often lack depth and may contribute to poor indexing. Luckily @screamingfrog can easily show you these URLs.
💡To find these run a Crawl Analysis, go to Content Tab then look for Low Content Pages.
💡 Not *all* pages with fewer than 200 words = bad. You'll have to use your own judgement to decide whether the ones @screamingfrog has shown you are appropriate or not.
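The underlying check is a simple word-count threshold - a sketch using the thread's 200-word cutoff (the page contents are made up):

```python
# Sketch: flag pages under a word-count threshold, similar in spirit
# to Screaming Frog's "Low Content Pages" filter.
pages = {
    "https://example.com/thin": "Short page with barely any copy.",
    "https://example.com/deep": "word " * 500,  # a 500-word page
}

LOW_CONTENT_THRESHOLD = 200

low_content = [url for url, text in pages.items()
               if len(text.split()) < LOW_CONTENT_THRESHOLD]
print(low_content)  # ['https://example.com/thin']
```

Pages like contact or login pages will land in this list too - that's where the judgement call above comes in.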
This is a FAQPage rich result as displayed on the SERPs.
In this 🧵I'm going to show you how you can do it yourself.
For those who already know how to get a FAQPage rich result using FAQPage markup, skip straight to the part about inserting🔗into your JSON-LD - I've got a blog post on it👇 danielkcheung.com.au/links-in-faq-s…
First, what is a FAQPage rich result?
"A Frequently Asked Question (FAQ) page contains a list of questions and answers pertaining to a particular topic. Properly marked up FAQ pages may be eligible to have a rich result on Search" - Google documentation👇 developers.google.com/search/docs/ad…
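For reference, a minimal FAQPage JSON-LD block looks like the output below (the question and answer text are made up); generating it with `json.dumps` keeps the quoting and escaping correct:

```python
import json

# Minimal FAQPage structured data, per schema.org types
# (FAQPage -> Question -> Answer). The Q&A text is illustrative only.
faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is a FAQ rich result?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "A SERP listing expanded with marked-up questions and answers.",
            },
        }
    ],
}

# Paste the output inside a <script type="application/ld+json"> tag.
print(json.dumps(faq, indent=2))
```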