Joost de Valk @jdevalk
9 tweets, 5 min read
I've been analyzing how the major search engines and link research tools crawl @yoast, together with @jonoalderson.

We ship all our logs through Logstash into Elasticsearch and analyze them in Kibana.

We have to talk about this. Some of these results are shocking.
Let's start with @bing.

Over the last 30 days, Bing crawled ~84,000 URLs on yoast.com. In return, we got ~3,200 visitors. That is roughly 26 crawled URLs per visitor. That ratio simply doesn't add up, and from checking other sites, we're not even doing badly in @bing.
Our logs show that @bing consistently crawls more 404s than all the other engines, and it keeps doing so. Because 404s are often not cached, every one of those requests costs lots and lots of server time, electricity, etc.

In those 30 days, Bing consumed 10 GB of data from our servers.
In terms of "getting something back", there's one thing that's worse than search engines with a small market share: link research tools.

@ahrefs consumed 5 GB of data from our servers, hitting approximately 2,000 URLs/day. @moz was way nicer, with only 250 MB.
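If you want to run a similar check on your own site without an Elastic stack, here is a minimal sketch in Python. It tallies hits, bytes sent and 404s per crawler from raw access logs. It assumes the Apache/Nginx "combined" log format. The user-agent substrings in the hypothetical `BOTS` mapping are illustrative assumptions, not the filters behind the numbers above. Note that user agents can be spoofed, so a real audit should also verify crawlers, for example via reverse DNS.

```python
import re
from collections import defaultdict

# Matches the Apache/Nginx "combined" log format (an assumption about your server):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, illustrative user-agent substrings -> labels; check each vendor's docs.
BOTS = {
    "bingbot": "Bing",
    "Googlebot": "Google",
    "AhrefsBot": "Ahrefs",
    "DotBot": "Moz",
    "rogerbot": "Moz",
}


def summarize(lines):
    """Tally hits, bytes sent and 404 responses per known crawler."""
    stats = defaultdict(lambda: {"hits": 0, "bytes": 0, "not_found": 0})
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that aren't in combined log format
        agent = m.group("agent")
        # First matching substring wins; unknown agents are ignored.
        bot = next((label for key, label in BOTS.items() if key in agent), None)
        if bot is None:
            continue
        entry = stats[bot]
        entry["hits"] += 1
        if m.group("bytes") != "-":  # "-" means no response body was sent
            entry["bytes"] += int(m.group("bytes"))
        if m.group("status") == "404":
            entry["not_found"] += 1
    return dict(stats)


if __name__ == "__main__":
    import sys

    # Usage: python crawl_stats.py < access.log
    for bot, s in sorted(summarize(sys.stdin).items()):
        print(f"{bot}: {s['hits']} hits, {s['bytes'] / 1e9:.2f} GB, {s['not_found']} 404s")
```

Keeping 404s as a separate counter makes it easy to see how much crawl budget goes to pages that don't exist. Those responses are often uncached.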
But: @Yoast is only one website. And while it's dear to me, it's not THAT big. We *have* to talk about how ludicrous it is that all of these engines maintain their own index. Entire server farms are online to crawl the same things over and over again.
If you believe this Forbes article: forbes.com/sites/neilyeoh…

then 10k views per month is the equivalent of driving a car for over 5,000 miles. On yoast.com, we get *double* that from search engine bots alone. Every month. And so do tons of other sites.
Even *if* @Google makes sure all the electricity *they* use is green, they're not buying green electricity for all the sites they hit. Neither are all those other bandwidth-consuming bots.
All these bots put a tax on every website on the internet by inflating hosting costs. For most sites, only @Google delivers a positive ROI on that tax.
There are solutions to this. Everyone using and feeding into @CommonCrawl would be one of them. Better notification systems, so engines only crawl what's truly changed, would be another. I bet there's more. Hit me with your ideas :)