1/?
:: Crawling Large, Huge and Mega sites ::
:: Partial vs Full crawls ::

@JetOctopus has done a piece looking at some of the issues that can arise from not doing a Full Crawl,
or, as I'd phrase it, doing a really shoddy Partial Crawl.

#SEO #Crawling

>>>
2/?
>>>

Personally, I hold a slightly different opinion.

Yes - you should do a Full Crawl,
but you shouldn't need to do one every single time.

Instead, you should do an initial Full Crawl,
and run another after any significant change (such as a platform change etc.), preferably on the dev version first.

>>>
3/?
>>>

But most of the time, Partial Crawls should be sufficient,
if done properly!

So, here's a quick guide for Partial Crawling ...

> Priorities
You should know what pages are Vital to the site/business

> Updates
You should know URLs of new and altered content

>>>
4/?
>>>

> Changes (2)
You should know what URLs represent which files/templates (for code changes)

> Site knowledge
You should know if the "site" is made up of different platforms, the PageTypes and templates, the sections and depths of them etc.

>>>
5/?
>>>

> Sampling
You should pick a reasonable representative figure per Platform/PageType/Section
(as well as Priority/Updated URLs)

(No, 1K out of 85K (<1%) is Not good - and anyone sampling less than 10% should be beaten with a Haddock :D)
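
As a rough sketch of the sampling rule above: take at least 10% of the URLs in every Platform/PageType/Section group, and always include the Priority/Updated URLs. The function name, the grouping keys and the example URLs are all made up for illustration; swap in however your site actually labels its pages.

```python
import math
import random


def stratified_sample(urls_by_group, fraction=0.10, must_include=(), seed=0):
    """Pick at least `fraction` of each group, plus every must-include URL."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-crawled
    chosen = set(must_include)
    for group, urls in urls_by_group.items():
        if not urls:
            continue
        # Round up so small groups still contribute at least one URL.
        quota = max(1, math.ceil(len(urls) * fraction))
        chosen.update(rng.sample(sorted(urls), quota))
    return chosen


# Hypothetical site, keyed by (platform, page type).
site = {
    ("shop", "product"): [f"/shop/p/{i}" for i in range(200)],
    ("blog", "article"): [f"/blog/{i}" for i in range(50)],
}
priority = ["/", "/shop/p/1", "/pricing"]

sample = stratified_sample(site, fraction=0.10, must_include=priority)
print(len(sample), "URLs to crawl")
```

Sampling per group (rather than across the whole URL list) is what stops a small section from being skipped entirely.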

>>>
6/?
>>>

Now - I know that this sort of thing can seem a lot of work … but if you get a little Dev help,
it's actually really easy.

Have meta tags added to each page that include some additional information (such as Template, Content Type etc.),
along with Edit Dates.

And use the Sitemap!
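
A minimal sketch of using the Sitemap for this, assuming it is a standard XML sitemap that carries `<lastmod>` with full dates. The function name, the cutoff date and the example.com URLs are placeholders, not anything from a real site.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_changed_since(sitemap_xml, cutoff):
    """Return <loc> values whose <lastmod> date is on or after `cutoff`."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc or not lastmod:
            continue  # no edit date, so it cannot be classed as updated
        try:
            # lastmod may be a full datetime; the first 10 chars are the date.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # partial dates such as "2024" are skipped in this sketch
        if modified >= cutoff:
            changed.append(loc)
    return changed


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/new</loc><lastmod>2024-05-20</lastmod></url>
  <url><loc>https://example.com/old</loc><lastmod>2023-01-02T10:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/undated</loc></url>
</urlset>"""

print(urls_changed_since(example, date(2024, 5, 1)))
```

This only works if the `<lastmod>` values are kept honest, which is part of the Dev help mentioned above.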

>>>
7/?
>>>

This will give you a % of the site,
with a fair number of URLs per platform, section and each set of templates/content types,
and cover all the New content,
and any recently Updated content.

In most cases, it will still be less than 25% of the site.
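
If you want to sanity-check that, here is a quick sketch that reports what share of the site a crawl list covers, overall and per group, and flags any group sitting under a 10% floor. The group names and URLs are invented, and it assumes each URL belongs to exactly one group.

```python
def coverage_report(crawl_list, urls_by_group, floor=0.10):
    """Return overall coverage, per-group coverage, and groups under `floor`."""
    crawl = set(crawl_list)
    per_group = {}
    covered = 0
    total = 0
    for group, urls in urls_by_group.items():
        unique = set(urls)
        hits = len(crawl & unique)  # URLs of this group that will be crawled
        covered += hits
        total += len(unique)
        per_group[group] = hits / len(unique) if unique else 0.0
    overall = covered / total if total else 0.0
    thin = sorted(g for g, share in per_group.items() if share < floor)
    return overall, per_group, thin


# Hypothetical site and crawl list.
site = {
    "product": [f"/p/{i}" for i in range(100)],
    "category": [f"/c/{i}" for i in range(20)],
    "blog": [f"/blog/{i}" for i in range(40)],
}
crawl_list = [f"/p/{i}" for i in range(15)] + ["/c/0", "/c/1"] + ["/blog/0"]

overall, per_group, thin = coverage_report(crawl_list, site)
print(f"{overall:.1%} of the site covered")
print("Under-sampled groups:", thin)
```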

>>>
8/?
>>>

Are Partial Crawls perfect?
No.

But they are far faster (and typically cheaper).
And due to that, they permit you to find issues sooner,
and be more responsive to them.

You don't need to wait for hundreds of thousands/millions/tens of millions of pages!

>>>
9/?
>>>

You can also make it part of your normal Full Scan,
but do it in 2 parts (or more!).
Run the Priority crawl,
then the Secondary whilst going through the Priority results.
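
One simple way to set that up, sketched under the assumption that you already have a Priority URL list: split the crawl list into two batches, keeping the original order and dropping repeats. The function name and URLs are only for illustration.

```python
def split_crawl(crawl_list, priority_urls):
    """Split into (priority, secondary) batches, keeping order and dropping repeats."""
    priority_set = set(priority_urls)
    seen = set()
    priority, secondary = [], []
    for url in crawl_list:
        if url in seen:
            continue  # already queued in one of the batches
        seen.add(url)
        (priority if url in priority_set else secondary).append(url)
    return priority, secondary


# Hypothetical crawl list with one repeated URL.
crawl_list = ["/", "/blog/1", "/pricing", "/blog/2", "/", "/shop/p/9"]
first, second = split_crawl(crawl_list, priority_urls=["/", "/pricing"])
print("Run first:", first)
print("Run second:", second)
```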

Much more efficient!

The key is knowing what to crawl!

>>>
10/?
>>>

And, if you haven't read it yet,
here's the article by @JNesterets,
more than worth a read!

jetoctopus.com/partial-crawli…


More from @darth_na

Oct 9
1/?

:: B2B SEO difficulty ::

Getting the idea of ToF/MoF (or early journey) content being equal to BoF/Conversion.

Due to some SEOs and the way G says things,
there's this stupid misunderstanding about ranking pages (not sites),
and not getting how X supports/improves Z.

#SEO
2/?
From a marketing/consumer perspective,
that content enables early awareness/recognition of the company,
and starts building trust, rapport and emotion/loyalty.

From an SEO perspective, it increases topicality and Link value flow (internal links have been important for years)
3/?

There's also the wonderful confusion (read as *fucking annoying) regarding the "funnel".
For some reason - people seem to think there's only one.
In most cases - there's 2 "broad" funnels (marketing and sales).
And the "marketing funnel" is often several funnels!
Read 7 tweets
Sep 26
.
:: Regular performance audits are good ::
:: Alerts on Priority content are better! ::

You should have a separate segment,
tracking priority pages:
1) Those that do contribute heavily to Goals
2) Those that ideally would contribute to Goals

Alert when they drop!

#SEO

>>>
>>>

For SEO, the primary metrics will be:
1) Ranking position per term
2) The URL per term
3) CTR per term/URL
4) Impressions per term/URL

For Business, these may in turn influence:
Business Goal KPI (such as revenue)

When you start to see negative shifts,
>>>
>>>

... you want to start keeping an eye on things!

For starters - don't panic.
Verify!

A) Has it impacted Goal/KPI?
B) Is it actually a drop?
C) Is it seasonal?
D) Is the same shift happening for everyone?

Not all drops are painful or unusual!

>>>
Read 11 tweets
Sep 24
1/5
🚧:: Page Layout and Content Concepts ::🚧
:: There is (should be!) a lot in a page! ::

For example, the basic break down below is missing:
* Date (Publish/Edit)
* Author
* Secondary Nav (Breadcrumbs)
* Primary Heading
* Secondary Heading

#SEO #Webdesign

>>>
2/5
>>>

* Introduction/Value Proposition
* Page Index/Content/Jump Links
* SubHeadings
* CTAs (Primary/Secondary)
* Support/Prop Links (to Primary/BoF/Goal page)
* Tertiary Nav (Section/Siblings)

And those are the bare-basics!
There's also optional things...

>>>
3/5
>>>

Optional elements would include things like:
* Read time (est. based on word count)
* Social action/share links
* Pull quotes (and share-excerpt links for social)
* Comments/Reviews (and indication of such at top)
* Sub-Images (non-hero, set as lazy etc.)
etc.

>>>
Read 5 tweets
Sep 23
.
:: Google doesn't have a % for Duplicate Content? ::

1) How does G avoid showing dupes in the SERPs?
2) How does G identify candidates for auto-canonicalisation?
3) Or decide to act on Redirects/CLE?

... and this is why Googlers typically don't answer my questions :(

#SEO
[Screenshot: cropped, two tweets. Bill Hartzer (@bhartzer)]
Logic says G not only has a method (at least one),
but uses it too!

And I've asked (at least 4 times) for an indication of what the threshold is,
and whether it's based solely on "content",
or if it includes code etc.,
and been ignored/evaded - every single time :(
I don't think there is a "specific" threshold,
I believe it is variable - based on availability.

G can/does show duplicates and highly similar results in a SERP,
esp. if there is little else for them to show.

The more "options" to show (that aren't dupes),
the fewer G shows.
Read 4 tweets
Jun 24
.
Fantastic SEO:
Knowing that boilerplate nav yields some SEO value,
but is mainly for UX/CRO,
and putting links in the content of main nav pages,
to pass values - and - drive internal traffic and aid business goals (such as views/signups/conversions etc.)

#SEO #UX #CRO
Now, I get that it may seem a little backwards ...
> it's an "SEO tweet"
> talking about "Internal Links"
> but basically saying "try doing something else".

Site-wide/common links seem to pass less value.
(So though that link may be on 100 pages, it's not worth 100 links).
Further, as we understand it,
the value passed through a link may be influenced by the number of links on the page ((overly) simplified: more links, less value per link).

Adding more links across all/most pages,
may reduce the value given to other pages.
Read 7 tweets
Jun 22
100%

You don't need to!

The patents are complex, obscure and there's not even a guarantee they are in use!

BUT - a fair part of the SEO we all use today…
…is based on insights gleaned from examination of older patents,
thanks to those that Did read the Google patents.

#SEO
>>>

This is akin to saying:
* you don't need to do keyword research
* you don't need to look at the SERPs
* you don't need to be able to write good *giggle*
* you don't need to be technical
* you don't need to be analytical
* you don't need to be creative
etc. etc. etc.

>>>
>>>

You can do SEO (and some people do) without being able to do, or being much good at, any/most/all of the above.

BUT ... every extra thing you do,
tends to improve your knowledge/skill/ability,
and that increases your chances of success.
(It does NOT mean you will succeed though!)

>>>
Read 5 tweets
