Tweet

Alton

Jan 28 • 27 tweets • 8 min read

Can you reverse engineer Google?
🎯HUGE LIST OF RANKING FACTORS🎯

▶️▶️▶️Yandex says yes!! 🤯

- Full ENGLISH translations
- All 1922 search factors!
- Full catalog and analysis to surprise #SEOs

((+free 🐍 #python code! 🧵👇))

First I wrote a script to translate descriptions to English and convert it into a tabular CSV format:

@iPullRank

Every major organization invests in monitoring their competitors. Yandex is the same.

h/t to @iPullRank for finding the Google parser:

@alton_lex

Ready for all the ranking factors?? Full 🧵 below:

1) Like + comment and I'll share the full dataset
2) follow me @alton_lex for more analysis
3) grab a copy of the code for translating here github.com/FrontAnalytics…

I'm very interested in ranking factors by category (tag)

CATEGORIES INCLUDE:
1) Doc (basic factors)
2) static content factors (doc)
3) query only factors
4) host factors
5) url text factors
6) dynamic factors
7) user factors

& more 👇

First, some tags suggest that the factor isn't used.

1273 of the 1922 factors are tagged as
- unimplemented
- depreciated
- unused

So we will remove all these from further analysis:

There are some basic Content factors:

### IsHTML
Document type - HTML

### IsPo***
**** Chick document

### IsFake
fake document

### IsUnreachable
The page is unreachable via links from the muzzle.

## IsHttps
The document has the https protocol

Query only factors:

## LongQuery
The sum of the idf of the query words.

## WordCount
Min(number of query words/10, 1.f)

## InvWordCount
1 / number_words_in_request.

## SyntQuality
Does the query have full parsing

## ExpectedFound
Expected number of searches on the query

Type of content factors (static and doc):

## IsEShop
commercial page

## IsMainPage
If the main page of the owner, the factor is 1.

## Long
Long document (the longer the document, the greater the value of the factor).

## IsForum
The URL satisfies the FORUM_DETECTOR reg

Examples of dynamic factors include:

## FreshNewsDetectorPredict
The value of the news detector calculated in behemoth. Always 0 when the detector value is less than the threshold.

## PiracyDetectorPredict
The value of the pirate detector calculated in behemoth.

URL factors

## UrlLen
URL length divided by 5

## IsObsolete
There is an ancient date in the URL. Ancient news are recognized. Factor 1 if url has year <=2007.

## OnlyUrl
All matches are in the URL only, no matches in the text of the page

## IsCom
.com domain

More Link factors:

## UrlHasNoDigits
There are no numbers in the url

## NumSlashes
Number of slashes in the url

## IsNotCgi
Factor about presence of '?' symbol in url. Equals zero if url has cgi parameters (more precisely: all duplicates have '?' symbol in url).

Geo Match:

## GeoCityUrlRegionCity
Matching geography defined from document url and query city (ip or lr)

## GeoCityUrlGeoCityCity
Match the geography defined from the document url and the city in the query (GeoCity rule)

## GeoCountryUrlGeoCountry
nan

Host factors:

## IsBlog
Blog page

## IsWiki
page from ru-wikipedia-org

## IsOwner
Whether the host is its own owner, conditionally Host == Owner(Host).

## NastyHost
Host Po*** classifier, chips about *** queries that were shown and clicked on by the host

Browser factors:

## YabarHostSearchTraffic
Share of traffic from search engines

## VisitsFromWiki
Number of hits to the Wikipedia url

## BrowserHostDwellTimeRegionFrc
Ratio of dwell time on a host in a given region to dwell time on a host in all regions

More user/browser factors:

## BrowserBookmarksUrl
The more users add to bookmarks a url

## More90SecVisitsShare
Share of visits, for which the dwell time during the day on the host is more than 90 sec

## VisitorsReturnMonthNumber
Number of users who returned during the month

One of the most interesting categories to me is "text machine":

## BodyPairMinProximity
PairMinProximity factor over hits from Body

## BodyMinWindowSize
MinWindowSize factor over hits from Body

## TextCosineMatchMaxPrediction
CosineMatchMaxPrediction factor over hits from Text

What's more interesting is that a majority of the "text machine" factors also are tagged with "NN over feature use" tag.

This suggests that:
1) the use of Neural Nets
2) a LLM for text search
3) these simple scraped factors are overridden
4) tiers of indexing complexity

Example feature tagged with "NN" includes:

## NeuroTextModelLongClickPredictorByWord
The result of applying a neural model trained to distinguish long clicks from other events, the input of the model are word and bigram counters, calculated from text streamlines (Body, Url).

https://twitter.com/alton_lex/status/1619142191963840512

Conclusions to my investigative analysis of Yandex and their ability to reverse engineer Google Search.

- 5 key takeaways
- recommendations
- plus my next steps

(start here.... conclusions below)

https://twitter.com/alton_lex/status/1619142191963840512

https://twitter.com/alex_buraks/status/1618988134850785280

1of5) It is now evident to me that the file used for this analysis (although accessible by dropbox) might have been sourced from an alleged illegal leak.

Since this may be true I want to make it clear that I do not condone illegal behavior.

My source:

https://twitter.com/alex_buraks/status/1618988134850785280

2of5) The file and its translation suggest it is related to yandex search (a reasonable conclusion that others have implied)

3of5) If 1 and 2 are true, then yandex is doing similar competitive research that all companies perform (a reasonable assumption)

4of5) nothing in the translated contents I've seen suggests that any approach or methodology it is proprietary or beyond common knowledge (see US patents for better details of each)

5of5) I personally see no value in continuing to explore the yandex factors because: (continued)

5a) even if a factor is identified it might not be used in production

5b) if a factor is used, it is useless without the coefficients or relative weighting logic

5c) obtaining any proprietary coefficients without permission or by illegal means would clearly be unsanctioned

So can anyone reverse engineer Google?

It certainly appears that Yandex and lots of others certainly want to understand Google rankings!

Draw your own conclusions from the full translated csv and 🐍 code:
github.com/FrontAnalytics…

twitter.com/alton_lex

Finally, none of this analysis helps in any way with the ongoing data collection and research I'm doing.

Contact me for more information:
twitter.com/alton_lex

@alton_lex

Join me on this journey!

I will continue to collect my own data to analyze what helps content sites succeed on organic search:

1) follow me @alton_lex for more details
2) feel free to DM me with questions
3) Please share if this was helpful!

https://twitter.com/alton_lex/status/1598132992785805313

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @alton_lex

Alton

@alton_lex

Jan 27

@alton_lex

Get ready for some sweet analysis coming soon!! Follow me @alton_lex for more...

https://twitter.com/alex_buraks/status/1618988134850785280

The data geek in me just got really really excited 🤓

I'm seeing some initial exploration in the wild...

https://twitter.com/iPullRank/status/1619074664076120064?s=20&t=XXAtzBPYZXPenwyAaVtHbw

Read 5 tweets

Alton

@alton_lex

Jan 27

I woke up this morning and realized......

🐟I'M HALF WAY TO MY 2023 GOAL!!!🐟
.... ( and its still only January! )

NEW HERE??? click expand to see all:
🐍Multiple breakout tweets + free code
🏆How I got here
😜Current projects
🥳Big announcement

https://twitter.com/alton_lex/status/1609980047296987142

At the beginning of the year I set big goals...

Thank you for following along!

Below are my biggest tweets this year (so far) 😉

https://twitter.com/alton_lex/status/1609980047296987142

https://twitter.com/alton_lex/status/1616331845918154753

Code + content generation continues to be trending.

This tweet has over 700 likes and +160k views!

https://twitter.com/alton_lex/status/1616331845918154753

Read 9 tweets

Alton

@alton_lex

Jan 26

Link Velocity... the good, bad, and ugly (+ 🐍code!!)

(A 🧵 of threads )

First a quality scale for content sites:

⭐️⭐️⭐️ Quality + maintained + high velocity
⚪️⭐️⭐️ Quality content + maintained
⚪️⚪️⭐️ Quality content
⚪️⚪️⚪️ Quality content + not maintained
💩💩💩 Poor quality + high velocity

How do you measure content velocity?

1⃣ scrape the sitemap for content
2⃣ determine content date (published, updated)
3⃣ plot the trend & volume

Examples and more info below: 👏👏

Read 12 tweets

Alton

@alton_lex

Jan 26

🔥 4 essential site Quality metrics ( +code 🐍)

Track niche sites and competitors...
- beyond just DA and monthly traffic
- identify better link building!

(( FREE python code for #seos 🚀 )) ...save! 🧵

@fatjoewho

This week's free code is thanks to a 'prompt'...
(a collab w/@fatjoewho -> SEO services on-demand)

THE PROMPT???

a) I got a DM from @joecowmanwd
b) I checked if everyone called themselves Joe😅
c) We got right to work....

.... with a first draft from ChatGPT 😇

Fully tested and working 🐍 #python code!! FREE

🎯 SITE-WIDE (DOMAIN) QUALITY METRICS: 🎯

1⃣ Ratio of URLs to Outbound Links
2⃣ Pages published per year
3⃣ Traffic/Pages Ratio
4⃣ Sitemap Links compared to Indexed Links

See code! 👇👇 (Like + Share_;)
github.com/FrontAnalytics…

Read 7 tweets

Alton

@alton_lex

Jan 20

How to Auto-write 1000's of articles via GPT API...

... exact code to create #seo articles! 🤑

(( functional 🐍 code + an article template ))

FREE: Here is the exact code I used to create AI articles for pennies each article!

Each model differs in size, quality, and cost:

▶️1st example uses Curie
▶️2nd example uses the latest DaVinci GPT-3

All the code is here to test your own:
github.com/FrontAnalytics…

https://twitter.com/alton_lex/status/1610043353126801408

Do AI articles even rank on Google? What about cannibalization and duplicate content?

A: I suggest you run your own experiments and see what works for your audience and your niche. No guarantees!

Here's one of my AI experiment sites (~3000 articles):

https://twitter.com/alton_lex/status/1610043353126801408

Read 7 tweets

Alton

@alton_lex

Jan 18

Are you still using ChatGPT to generate content???
💎I'm giving away a FREE code tutorial💎

I'm tired of watching #SEOs copy and paste!!!

(( free 🐍 #python code + prompt examples👇👇))

Here is the fully tested #python starter script.

🏆INCLUDES:
- Connecting OpenAI GPT API
- Helper function
- Prompt templates
- Dozens of REAL examples

Steps:
1) run it locally or use google colab
2) Paste in your OpenAI API key
3) Generate Content 🎯
github.com/FrontAnalytics…

This is not a joke or click bait.

I seriously want to see people succeed and this is the [best] step to get ready to use the ChatGPT API!

Open the code and you'll see prompts like this:

Read 10 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Alton

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @alton_lex

Alton

Alton

Alton

Alton

Alton

Alton

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!