Elias Dabbas
Jan 24, 2023, 5 tweets
1/
@Target's XML sitemap
Has 2.6M URLs with ~1M duplicates (39%)
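A minimal sketch of how a duplicate share like this can be computed, assuming "duplicates" means total rows minus unique URLs (the thread does not spell out the definition). The sample URLs are hypothetical placeholders, not rows from the sitemap.

```python
def duplicate_summary(urls):
    """Return (total, duplicates, share), with duplicates = total - unique."""
    total = len(urls)
    unique = len(set(urls))
    duplicates = total - unique
    share = duplicates / total if total else 0.0
    return total, duplicates, share


# Hypothetical sample, for illustration only.
sample_urls = [
    "https://www.example.com/p/item-a",
    "https://www.example.com/p/item-a",
    "https://www.example.com/c/toys",
    "https://www.example.com/c/toys",
    "https://www.example.com/s/lego",
]

total, duplicates, share = duplicate_summary(sample_urls)
print(f"{total} URLs, {duplicates} duplicates ({share:.0%})")
```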

Consistent URL structure, mostly 4 directories deep

5 types of pages
product
category
search
brand
store-location
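One way to label page types is by the first directory of each URL path. The sketch below uses only the standard library. The thread confirms `/s/` for search pages. The other prefixes in `PAGE_TYPES`, and the sample URLs, are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Only "s" (search) is confirmed by the thread; the rest are assumed prefixes.
PAGE_TYPES = {
    "p": "product",
    "c": "category",
    "s": "search",
    "b": "brand",
    "sl": "store-location",
}


def path_segments(url):
    """Split the URL path into its non-empty directory segments."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]


def page_type(url):
    """Map a URL to a page type using its first path segment."""
    segments = path_segments(url)
    first = segments[0] if segments else ""
    return PAGE_TYPES.get(first, "other")


# Hypothetical sample, for illustration only.
sample_urls = [
    "https://www.example.com/p/some-book/-/A-1",
    "https://www.example.com/c/toys/-/N-1",
    "https://www.example.com/s/lego",
    "https://www.example.com/about",
]

print(Counter(page_type(u) for u in sample_urls))
print(Counter(len(path_segments(u)) for u in sample_urls))  # directories per URL
```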

Data
bit.ly/3WKbw3H
Code
bit.ly/3XB0sGH

#DataScience #advertools #SEO #Python #DataAnalytics
2/
Checking top category pages, we see the top 10 are mostly toys.

A more accurate approach is to count the words in those URLs, which shows that clothing and home are bigger.

We can also check the percentage of category URLs that contain those words, for a different and possibly better view.
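The two views described here (raw word counts, and the share of URLs containing a word) can be sketched as follows. This is a standard-library approximation, not the code linked in the thread. The helper names and sample URLs are hypothetical, and the first path segment is skipped on the assumption that it is only the page-type directory.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_words(url):
    """Alphabetic words from the URL path, skipping the first segment."""
    segments = [s for s in urlsplit(url).path.lower().split("/") if s]
    words = []
    for segment in segments[1:]:
        words.extend(re.findall(r"[a-z]+", segment))
    return words


def word_counts(urls):
    """Count every word occurrence across all URLs."""
    return Counter(word for url in urls for word in url_words(url))


def share_containing(urls, word):
    """Fraction of URLs whose path contains the given word."""
    if not urls:
        return 0.0
    hits = sum(1 for url in urls if word in url_words(url))
    return hits / len(urls)


# Hypothetical category URLs, for illustration only.
category_urls = [
    "https://www.example.com/c/kids-clothing",
    "https://www.example.com/c/women-clothing",
    "https://www.example.com/c/home-decor",
    "https://www.example.com/c/toys-games",
]

print(word_counts(category_urls).most_common(3))
print(f"{share_containing(category_urls, 'clothing'):.0%} contain 'clothing'")
```

Counting words instead of whole URLs is what lets broad themes surface even when no single category dominates the top 10.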
3/
We can easily go deeper by filtering URLs that contain "clothing", and checking what the top words are for those URLs.

We can go to sub-sub-categories by filtering for URLs containing "clothing" & "kids" for example.
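Drilling down amounts to filtering URLs on one or more required words and then recounting the remaining words. A small sketch under the same assumptions as before: standard library only, hypothetical helper names, and made-up sample URLs.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_words(url):
    """Alphabetic words of two or more letters from the URL path."""
    return re.findall(r"[a-z]{2,}", urlsplit(url).path.lower())


def top_words_within(urls, required, n=10):
    """Top words among URLs that contain every word in `required`."""
    required = set(required)
    counts = Counter()
    for url in urls:
        words = url_words(url)
        if required.issubset(words):
            # Leave out the filter words so the drill-down terms stand out.
            counts.update(w for w in words if w not in required)
    return counts.most_common(n)


# Hypothetical category URLs, for illustration only.
category_urls = [
    "https://www.example.com/c/kids-clothing-shoes",
    "https://www.example.com/c/kids-clothing-jackets",
    "https://www.example.com/c/women-clothing-shoes",
    "https://www.example.com/c/kids-toys",
]

print(top_words_within(category_urls, ["clothing"]))
print(top_words_within(category_urls, ["clothing", "kids"]))
```

Passing a longer `required` list narrows the view one level at a time, which mirrors the category, sub-category, sub-sub-category exploration described above.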

You can interactively explore any combination you want.
4/
Product pages give the best view of product distribution. Using the same approach, the top products are something we haven't seen yet: books.

We can see what the most used words in books are (guide, you, America).

For fun, we can also get the top author first names.
5/
Search pages: /s/
These contain target search queries.

A quick manual check showed mostly empty pages (this needs full verification, of course).

Still, we can see what they target most in those search pages.
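A rough sketch of pulling the queries out of `/s/` URLs and counting their words. It assumes the query is the path segment right after `/s/` with words joined by `+`. That format is an assumption, as are the sample URLs and helper names.

```python
from collections import Counter
from urllib.parse import unquote_plus, urlsplit


def search_query(url):
    """Return the query from a /s/<query> URL, or None for other URLs."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "s":
        # Assumed format: words joined by "+", possibly percent-encoded.
        return unquote_plus(segments[1]).lower()
    return None


def query_word_counts(urls):
    """Count words across all search queries found in the URLs."""
    counts = Counter()
    for url in urls:
        query = search_query(url)
        if query:
            counts.update(query.split())
    return counts


# Hypothetical sample, for illustration only.
sample_urls = [
    "https://www.example.com/s/lego+star+wars",
    "https://www.example.com/s/lego+city",
    "https://www.example.com/s/star+wars+toys",
    "https://www.example.com/c/toys",
]

print(query_word_counts(sample_urls).most_common(3))
```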

Run your own analysis here bit.ly/3XB0sGH

END
