:: Crawling Large, Huge and Mega sites ::
:: Partial vs Full crawls ::
@JetOctopus has done a piece looking at some of the issues that can arise from not doing a Full Crawl,
or, as I'd phrase it, doing a really shoddy Partial Crawl.
Yes - you should do a Full Crawl,
but you shouldn't need to do one every single time.
Instead, you should do an initial Full Crawl,
and run another after any significant change (such as a platform change etc.) - preferably on the dev version first.
But most of the time, Partial Crawls should be sufficient,
if done properly!
So, here's a quick guide for Partial Crawling ...
> Priorities
You should know which pages are Vital to the site/business
> Updates
You should know the URLs of new and altered content
> Code changes
You should know which URLs are built from which files/templates (so code changes can be mapped to URLs)
> Site knowledge
You should know whether the "site" is made up of different platforms, which PageTypes and templates it uses, its sections and their depths, etc.
> Sampling
You should pick a reasonably representative sample per Platform/PageType/Section
(as well as all Priority/Updated URLs).
(No, 1K out of 85K (~1%) is Not good - and anyone sampling less than 10% should be beaten with a Haddock :D)
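If it helps, here's a rough sketch of that kind of stratified sampling in Python - the 10% rate, the minimum-per-segment figure and the segment labels are my assumptions for the example, not hard rules:

```python
import random

def build_partial_crawl_list(urls_by_segment, priority_urls, updated_urls,
                             sample_rate=0.10, min_per_segment=50):
    """Pick a representative sample per Platform/PageType/Section,
    plus every Priority and recently Updated URL.

    urls_by_segment: dict mapping a segment label (e.g. "blog/article")
                     to the list of URLs in that segment.
    """
    crawl_list = set(priority_urls) | set(updated_urls)
    for segment, urls in urls_by_segment.items():
        # At least 10% of each segment, and never fewer than min_per_segment
        # (capped at the segment size so random.sample() doesn't overdraw).
        sample_size = max(int(len(urls) * sample_rate),
                          min(min_per_segment, len(urls)))
        crawl_list.update(random.sample(urls, sample_size))
    return sorted(crawl_list)
```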
Now - I know that this sort of thing can seem like a lot of work … but if you get a little Dev help,
it's actually really easy.
Have the Devs add Meta that includes some additional information (such as Template, Content Type etc.),
along with Edit Dates.
And use the Sitemap!
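A minimal sketch of pulling the recently added/updated URLs out of a standard sitemap via <lastmod> - the 30-day window is just an assumed example value:

```python
from datetime import datetime, timedelta, timezone
from xml.etree import ElementTree
import urllib.request

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def recently_changed_urls(sitemap_url, days=30):
    """Return sitemap URLs whose <lastmod> falls within the last `days` days."""
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ElementTree.parse(resp)
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    urls = []
    for node in tree.findall("sm:url", SITEMAP_NS):
        loc = node.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = node.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if not loc or not lastmod:
            continue
        # lastmod may be a bare date ("2024-05-01") or a full W3C datetime.
        try:
            modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        except ValueError:
            continue
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        if modified >= cutoff:
            urls.append(loc)
    return urls
```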
This will give you a sample of the site,
with a fair number of URLs per platform, per section and per set of templates/content types,
and cover all the New content,
and any recently Updated content.
In most cases, it will still be less than 25% of the site.
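Back-of-the-envelope example (all the figures here are made up, purely to show the maths):

```python
# Hypothetical 100K-URL site, purely to illustrate the "< 25%" point.
total_urls = 100_000
priority_urls = 5_000            # business-critical pages, crawled in full
new_or_updated = 3_000           # from sitemap lastmod, crawled in full
remaining = total_urls - priority_urls - new_or_updated
sampled = int(remaining * 0.10)  # ~10% per platform/section/template

crawl_size = priority_urls + new_or_updated + sampled
print(f"{crawl_size:,} URLs = {crawl_size / total_urls:.0%} of the site")
# -> 17,200 URLs = 17% of the site
```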
Are Partial Crawls perfect?
No.
But they are far faster (and typically cheaper).
And because of that, they let you find issues sooner,
and respond to them faster.
You don't need to wait for hundreds of thousands/millions/tens of millions of pages to be crawled!
You can also make this part of your normal Full Crawl,
but do it in 2 parts (or more!).
Run the Priority crawl,
then the Secondary whilst going through the Priority results.
Much more efficient!
The key is knowing what to crawl!
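A minimal sketch of that two-part pattern - crawl() and review() here are placeholders for whatever crawler/reporting you actually use:

```python
from concurrent.futures import ThreadPoolExecutor

def crawl(urls):
    """Placeholder: run your crawler of choice over `urls` and return results."""
    ...

def review(results):
    """Placeholder: triage the crawl results (status codes, canonicals, etc.)."""
    ...

def two_part_crawl(priority_urls, secondary_urls):
    # 1) Crawl the Priority set first - it's small, so results arrive quickly.
    priority_results = crawl(priority_urls)

    # 2) Kick off the (bigger) Secondary crawl in the background...
    with ThreadPoolExecutor(max_workers=1) as pool:
        secondary_future = pool.submit(crawl, secondary_urls)

        # 3) ...and review the Priority results while it runs.
        review(priority_results)

        # 4) Pick up the Secondary results once they're ready.
        review(secondary_future.result())
```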
And, if you haven't read it yet,
here's the article by @JNesterets,
more than worth a read!
.
Getting the idea that ToF/MoF (or early-journey) content is as valuable as BoF/Conversion content.
Due to some SEOs, and the way G says things,
there's this stupid misunderstanding about ranking pages (not sites),
and people not getting how page X supports/improves page Z.
From a marketing/consumer perspective,
that content enables early awareness/recognition of the company,
and starts building trust, rapport and emotion/loyalty.
From an SEO perspective, it increases topicality and Link value flow (internal links have been important for years)
There's also the wonderful confusion (read as: *fucking annoying*) regarding the "funnel".
For some reason - people seem to think there's only one.
In most cases - there are 2 "broad" funnels (marketing and sales).
And the "marketing funnel" is often several funnels!
.
:: Regular performance audits are good ::
:: Alerts on Priority content are better! ::
You should have a separate segment tracking priority pages:
1) Those that do contribute heavily to Goals
2) Those that ideally would contribute to Goals
(rough alert sketch after the list below)
And those are the bare basics!
There are also optional things...
Optional elements would include things like:
* Read time (est. based on word count)
* Social action/share links
* Pull quotes (and share-excerpt links for social)
* Comments/Reviews (and indication of such at top)
* Sub-Images (non-hero, set as lazy etc.)
etc.
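Back to those priority segments - here's a rough sketch of the sort of alert check I mean; the 20% drop threshold and the goal metric are assumptions, not recommendations:

```python
def priority_alerts(baseline, current, drop_threshold=0.20):
    """Flag priority URLs whose goal contribution fell by more than the threshold.

    baseline, current: dicts mapping URL -> goal completions (or revenue)
    for the two comparison periods.
    """
    alerts = []
    for url, base_value in baseline.items():
        now_value = current.get(url, 0)
        if base_value > 0 and (base_value - now_value) / base_value > drop_threshold:
            alerts.append((url, base_value, now_value))
    return alerts

# Segment 1: pages that DO contribute heavily to Goals - alert on drops (above).
# Segment 2: pages that SHOULD contribute but currently don't -
#            for those, you'd alert on the absence of improvement instead.
```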
.
Logic says G not only has a method (at least one) for detecting duplicate/highly-similar content,
but uses it too!
And I've asked (at least 4 times) for an indication of what the threshold is,
and whether it's based solely on "content",
or if it includes code etc.,
and been ignored/evaded - every single time :(
I don't think there is a "specific" threshold,
I believe it is variable - based on availability.
G can/does show duplicates and highly similar results in a SERP,
esp. if there is little else for them to show.
The more "options" they have to show (that aren't dupes),
the fewer dupes G shows.
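None of us outside G know what they actually use, but to show how "highly similar" can even be measured, here's a bog-standard shingle/Jaccard sketch (a common approach, not Google's actual method):

```python
def shingles(text, size=5):
    """Split text into overlapping word 'shingles' (n-grams of `size` words)."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two texts' shingle sets (0 = unrelated, 1 = identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```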
.
Fantastic SEO:
Knowing that boilerplate nav yields some SEO value,
but is mainly for UX/CRO,
and putting links in the content of main nav pages,
to pass value - and to drive internal traffic and aid business goals (such as views/signups/conversions etc.)
Now, I get that it may seem a little backwards ...
> it's an "SEO tweet"
> talking about "Internal Links"
> but basically saying "try doing something else".
Site-wide/common links seem to pass less value.
(So even though that link may be on 100 pages, it's not worth 100 links.)
Further, as we understand it,
the value passed through a link may be influenced by the number of links on the page ((overly) simplified: more links, less value per link).
Adding more links across all/most pages may reduce the value given to other pages.
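To make that "(overly) simplified" model concrete, here's a toy PageRank-style split - illustrative numbers only, nothing to do with how G actually weights links:

```python
def value_per_link(page_value, outgoing_links, damping=0.85):
    """Toy model: a page passes (damped) value split evenly across its links."""
    return page_value * damping / outgoing_links

# A template page that already carries 50 nav/footer links:
print(round(value_per_link(1.0, 50), 4))  # 0.017  passed per link
# Add 10 more site-wide links to that template and every link gets less:
print(round(value_per_link(1.0, 60), 4))  # 0.0142 passed per link
```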
The patents are complex, obscure and there's not even a guarantee they are in use!
BUT - a fair part of the SEO we all use today…
…is based on insights gleaned from examination of older patents,
thanks to those who Did read the Google patents.
This is akin to saying:
* you don't need to do keyword research
* you don't need to look at the SERPs
* you don't need to be able to write good *giggle*
* you don't need to be technical
* you don't need to be analytical
* you don't need to be creative
etc. etc. etc.
You can do SEO (and some people do) without being able to do, or being much good at, any/most/all of the above.
BUT ... every extra thing you do
tends to improve your knowledge/skill/ability,
and that increases your chances of success.
(It does NOT mean you will succeed though!)