Thread by @eliasdabbas on Thread Reader App

1/
@Target's XML sitemap
Has 2.6M URLs with ~1M duplicates (39%)

Consistent URL structure, mostly 4 directories deep

5 types of pages
product
category
search
brand
store-location
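
The duplicate share and directory depth can be checked with a few lines of plain Python. This is only a minimal sketch, not the thread's advertools code (that lives behind the Code link): the sample URLs are made up, and every path prefix other than /s/ is an assumption for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample; the real sitemap has ~2.6M URLs.
# Only the /s/ prefix is confirmed by the thread; the others are assumed.
urls = [
    "https://www.target.com/p/some-book/-/A-111",
    "https://www.target.com/p/some-book/-/A-111",  # exact duplicate
    "https://www.target.com/c/kids-clothing/-/N-222",
    "https://www.target.com/s/blue+shoes",
    "https://www.target.com/b/some-brand/-/N-333",
]

total = len(urls)
unique_urls = set(urls)
# Share of rows that are extra copies of a URL already seen.
duplicate_share = (total - len(unique_urls)) / total


def path_dirs(url):
    """Return the non-empty path segments ("directories") of a URL."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]


# How many directories deep each unique URL is.
depth_counts = Counter(len(path_dirs(u)) for u in unique_urls)
# The first directory is a rough proxy for page type.
first_dir_counts = Counter(path_dirs(u)[0] for u in unique_urls)

print(f"{duplicate_share:.0%} duplicates")
print(depth_counts.most_common())
print(first_dir_counts.most_common())
```

On the real data the same counts would be run over the full sitemap table rather than a hand-written list.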

Data
bit.ly/3WKbw3H
Code
bit.ly/3XB0sGH

#DataScience #advertools #SEO #Python #DataAnalytics

2/
Checking top category pages, we see the top 10 are mostly toys.

A more accurate approach is to count the words in those URLs, which shows that clothing and home are bigger.

We can also check the percentage of category URLs that contain those words for a different/better view
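
Both views (raw word counts and the share of category URLs containing each word) can be sketched as follows. The slugs are invented examples and the helper name `words` is hypothetical. The thread's own analysis uses advertools, so treat this as an illustration of the idea only.

```python
import re
from collections import Counter

# Hypothetical category slugs, i.e. the descriptive part of each category URL.
category_slugs = [
    "kids-clothing",
    "womens-clothing",
    "home-decor",
    "toys-games",
]


def words(slug):
    """Split a slug into lowercase words on any non-alphanumeric character."""
    return [w for w in re.split(r"[^a-z0-9]+", slug.lower()) if w]


# View 1: how often each word appears across all category URLs.
word_counts = Counter(w for slug in category_slugs for w in words(slug))

# View 2: fraction of category URLs that contain each word at least once.
url_share = {}
for word in word_counts:
    containing = sum(1 for slug in category_slugs if word in words(slug))
    url_share[word] = containing / len(category_slugs)

print(word_counts.most_common(3))
print(url_share["clothing"])
```

The second view counts each URL once per word, so a word repeated inside a single URL does not inflate its share.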

3/
We can easily go deeper by filtering URLs that contain "clothing", and checking what the top words are for those URLs.

We can go to sub-sub-categories by filtering for URLs containing "clothing" & "kids" for example.

You can interactively explore any combination you want.
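
The drill-down can be sketched as a small filter-then-count helper. The function name `top_words`, its parameters, and the sample slugs are all hypothetical and not taken from the thread's code.

```python
from collections import Counter


def top_words(slugs, required=(), n=5):
    """Count words in slugs that contain every word in `required`.

    The required words themselves are left out of the counts, so the
    result shows what else appears alongside them.
    """
    counts = Counter()
    for slug in slugs:
        tokens = slug.lower().split("-")
        if all(word in tokens for word in required):
            counts.update(t for t in tokens if t not in required)
    return counts.most_common(n)


# Hypothetical category slugs for illustration.
slugs = [
    "kids-clothing-shoes",
    "kids-clothing-jackets",
    "womens-clothing-shoes",
    "kids-toys",
    "mens-clothing-shoes",
]

# Sub-category view: what goes with "clothing"?
clothing_top = top_words(slugs, required=("clothing",))
# Sub-sub-category view: what goes with "clothing" and "kids"?
kids_clothing_top = top_words(slugs, required=("clothing", "kids"))

print(clothing_top)
print(kids_clothing_top)
```

Adding another word to `required` narrows the view one level further, which is the same move as going from sub-category to sub-sub-category.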

4/
Product pages give the best view of product distribution. Using the same approach, the top product type is one we haven't seen yet: books.

We can see what the most used words in books are (guide, you, America).

For fun, we can also get top author first names as well.

5/
Search pages: /s/
These contain target search queries.

A quick manual check showed mostly empty pages (needs full verification of course)

Still, we can see which terms those search pages target most.
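
One way to pull the queries out of /s/ URLs and count their words is sketched below. The exact URL shape after /s/ (words joined by "+" or percent-encoding) is an assumption, and the sample URLs and the `search_query` helper are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote_plus, urlsplit

# Hypothetical search-page URLs; the query encoding is assumed.
search_urls = [
    "https://www.target.com/s/blue+shoes",
    "https://www.target.com/s/kids+shoes",
    "https://www.target.com/s/desk%20lamp",
    "https://www.target.com/c/kids-clothing/-/N-222",  # not a search page
]


def search_query(url):
    """Return the decoded query from a /s/ URL, or None for other pages."""
    path = urlsplit(url).path
    if not path.startswith("/s/"):
        return None
    return unquote_plus(path[len("/s/"):]).strip()


queries = [q for q in (search_query(u) for u in search_urls) if q]
# Most frequent words across all search queries.
query_words = Counter(w for q in queries for w in q.lower().split())

print(queries)
print(query_words.most_common(3))
```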

Run your own analysis here bit.ly/3XB0sGH

END
