Daniel K Cheung
May 14, 2022 · 21 tweets · 11 min read
"I have nooooo idea what to do with all this @screamingfrog information" - @danielkcheung, circa 2018-2020

If you want to learn to do some #technicalseo with one of the best crawling tools out there, here is a 🧵for you.
.@screamingfrog is a crawler that you download and install onto a local or virtual machine. It allows you to crawl almost any website.

I use it as part of my technical SEO audits, and their customer service is 🔥

However, the initial learning curve is steep.
If you've looked at a finished @screamingfrog crawl and felt overwhelmed - you're NOT alone.

I, and many others, were in your EXACT shoes.

It's ok
You've got this💪🏽

⏩ Here are 9 tips to make your life with Screaming Frog easier, more enjoyable, and F-U-N👇
1. Run in database storage mode

Configuration > System > Storage mode

Doing so automatically saves your crawl data to your computer and lets you compare crawls of the same site.
2. You don't need to crawl an entire site

Crawling an entire website can take hours (sometimes days).

🧠You don't *always* need to have full information to make recommendations. Often, a sample is all you need to ID symptoms and make relevant recommendations.
3. Add sitemap address to get sitemap insights

Because many robots.txt files do not reference the sitemap index URL, you will have to add it yourself👇
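💡If you're not sure whether a site's robots.txt even declares a sitemap, here's a minimal Python sketch that checks for a Sitemap: directive before you add the URL by hand (example.com is a placeholder domain):

```python
# Quick check (a sketch, separate from Screaming Frog): does robots.txt
# declare a sitemap at all? example.com is a placeholder.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

sitemaps = parser.site_maps()  # list of Sitemap: URLs, or None if none declared (Python 3.8+)
if sitemaps:
    print("Declared sitemaps:", sitemaps)
else:
    print("No Sitemap directive found - add the sitemap URL in Screaming Frog manually")
```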
4. Run crawl analysis to see sitemap issues such as:

⚠️ orphaned URLs
⚠️ URLs not in sitemap
⚠️ non-indexable URLs in sitemap

Once your crawl has finished, go to Crawl Analysis > Start, then click on the Sitemaps tab; the right-hand panel will display sitemap stats👇
💡FYI, you can run a crawl analysis at any stage of the crawl - you do not have to wait to crawl the entire site.

This is great when you're crawling a large site and want to get a sense of whether there are issues with its sitemap.

Click Pause > Crawl Analysis > Start👇
💡A website's sitemap is one of the first things I look at.

🧠If I see many non-indexable URLs in the sitemap or many orphaned URLs, that is a strong signal there is a host of technical issues and that a full audit is warranted.

Learn more about sitemaps👇
danielkcheung.com.au/xml-sitemap-fo…
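💡If you ever want to sanity-check those sitemap findings outside the tool, here's a rough Python sketch of the same comparison (the sitemap URL is a placeholder, and the crawled URLs would come from your own crawl export):

```python
# Rough sketch: compare sitemap URLs against crawled URLs to spot orphaned URLs
# and URLs missing from the sitemap. Assumes a plain <urlset> sitemap.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text}

in_sitemap = sitemap_urls("https://example.com/sitemap.xml")
crawled = set()  # fill with URLs from your crawl export, e.g. Internal > HTML

orphaned = in_sitemap - crawled   # in the sitemap, never reached via internal links
missing = crawled - in_sitemap    # crawled, but not listed in the sitemap
print(len(orphaned), "orphaned URLs;", len(missing), "URLs not in sitemap")
```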
5. Crawl a *specific* subfolder of a site (and nothing else)

Eg, I want to crawl all the URLs within the /blog/ subfolder of a website.

To do this, go to Configuration > Include, then put in https://domain/blog/.*

⚠️Don't forget the ‼️.*‼️
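💡To see why that trailing .* matters, here's a tiny Python sketch of the matching behaviour. My understanding is that Include rules are treated as regexes matched against the full URL (the URLs below are placeholders):

```python
# Minimal sketch: without the trailing .*, nothing deeper than the literal
# string would match the pattern. example.com is a placeholder.
import re

include = re.compile(r"https://example\.com/blog/.*")

urls = [
    "https://example.com/blog/",
    "https://example.com/blog/technical-seo-checklist/",
    "https://example.com/about-us/",
]

for url in urls:
    status = "crawled" if include.match(url) else "skipped"
    print(f"{status:7} {url}")
```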
6. On your first crawl, crawl HTML files only

By default, @screamingfrog will crawl all image, CSS, JS and SWF files.

💡Uncheck these for your initial crawl to get a quick sense of the URLs; it will also save you some time.
7. Run crawler in Javascript mode

Many websites are single-page applications or rely on JS to render ⚡️important⚡️ content. Running in JS mode in @screamingfrog will give you a proxy for how well (or not) Google can crawl and render any website.
To do this:

1⃣ Configuration > Spider > Rendering
2⃣ Change "text only" to "javascript"

When crawling in JS mode:
• @screamingfrog takes a lot longer (so be patient)
• go to Rendered Page to see if all important content can be rendered
It *literally* took me 3.5 years to discover the Rendered Page tab (👏@myriamjessier for showing me👏).

This is where you'll find the rendered view when you run a JS crawl in @screamingfrog 👇
💡If you don't see the body content shown in the Rendered Page panel, this is probably a strong signal that Google will have issues rendering text and internal links on the website.

💡If this happens, corroborate with Google Search Console data.
Since @screamingfrog takes ⏳ to crawl in JS mode, TameTheBots by @davewsmart is ⚡️ah-mazing⚡️ for quick diagnosis on a page-by-page basis.

I.e., if important content is not displayed, this indicates a potential issue with rendering, indexing and ranking.

tamethebots.com/tools/fetch-re…
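💡And if you just want a rough first pass before firing up a full JS crawl, a few lines of Python can tell you whether a phrase you expect in the body exists in the raw HTML at all (the URL and phrase below are hypothetical):

```python
# Quick-and-dirty sketch, not a substitute for a JS crawl: if a phrase you
# expect in the main body is missing from the raw HTML, the content is
# probably injected by JavaScript.
import urllib.request

url = "https://example.com/"
key_phrase = "our services"  # hypothetical phrase you expect in the rendered body

req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
raw_html = urllib.request.urlopen(req).read().decode("utf-8", errors="ignore")

if key_phrase.lower() in raw_html.lower():
    print("Phrase found in raw HTML - not dependent on client-side rendering")
else:
    print("Phrase missing from raw HTML - likely rendered by JavaScript")
```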
8. Look for 301, 302, 404 response codes

💡Most sites will have URLs that 301/302 redirect, or pages that return 404 response codes.

After a crawl, go to the Response Codes tab and sort the URLs by status code in ascending/descending order. This will show you all the non-200 URLs👇
💡A really quick way to see this is to run as few extractions as possible.

You can do this by unchecking all "Resource Links" options in the Crawl configuration and unchecking everything under Page Details in the Extraction configuration.
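💡If you prefer working on the exported data, here's a small pandas sketch that does the same filtering (the file name and the "Address"/"Status Code" column names are assumptions based on a typical export, so check them against your own):

```python
# Sketch: filter a Response Codes export for non-200 URLs.
# Filename and column names are assumptions - verify against your export.
import pandas as pd

df = pd.read_csv("response_codes_all.csv")
non_200 = df[df["Status Code"] != 200].sort_values("Status Code")

print(non_200[["Address", "Status Code"]].to_string(index=False))
```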
9. Find low-content pages

URLs with fewer than 200 words often lack depth and may contribute to poor indexing. Luckily, @screamingfrog can easily show you these URLs.

💡To find these, run a Crawl Analysis, go to the Content tab, then look for Low Content Pages.
💡 Not *all* pages with fewer than 200 words = bad. You'll have to use your own judgement to decide whether the ones @screamingfrog has shown you are appropriate or not.

Recommended reading re: thin content pages (via @JonasSickler)👇
terakeet.com/blog/what-is-t…
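💡For those who like to double-check the thin-page list in a script, here's a small pandas sketch of the same filter (the file name, the "Address"/"Word Count" columns and the 200-word cut-off are all assumptions to adjust to your own export and judgement):

```python
# Sketch: flag potentially thin pages from an exported crawl.
# Filename, column names and threshold are assumptions.
import pandas as pd

df = pd.read_csv("internal_html.csv")
thin = df[df["Word Count"] < 200].sort_values("Word Count")

print(f"{len(thin)} URLs under 200 words")
print(thin[["Address", "Word Count"]].to_string(index=False))
```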
And that's a wrap! I trust you found this insightful and that it inspired you to work with @screamingfrog even if you don't fully understand it yet.

For more, give me a follow
@danielkcheung to see how else I can inspire you in your SEO journey👇
