Natzir
Feb 2 · 16 tweets · 7 min read
#YandexLeak In Slice Web Production there are 42 factors used to evaluate URLs, including URL characteristics (length, slashes, numbers), geography, user engagement, type, and algorithms to calculate and predict relevance and quality (BM25, DSSM, GSK).
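To make the "URL characteristics" concrete, here is a minimal sketch of how length, slash and digit counts could be extracted from a URL. The function name and feature keys are hypothetical; the leak's actual factor definitions are not reproduced here.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Toy URL features: total length, slashes in the path, digits in the URL.

    Illustrative only: names and exact definitions are assumptions,
    not the factors from the leak.
    """
    path = urlparse(url).path
    return {
        "url_length": len(url),
        "path_slashes": path.count("/"),
        "digit_count": sum(ch.isdigit() for ch in url),
    }


features = url_features("https://example.com/news/2023/item-42")
print(features)
```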
Curiosity: IsObsolete checks whether the URL should be considered outdated because it contains an old date. Old news is recognized as such. The factor takes the value 1 if there is a year in the URL that is less than or equal to 2007.
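A rough sketch of that rule, assuming the year is detected as a standalone four-digit number starting with 19 or 20. The regex and the function name are my own guesses; only the 2007 threshold comes from the leak.

```python
import re

# Threshold mentioned in the thread: years <= 2007 mark the URL as obsolete.
OBSOLETE_YEAR = 2007

# Assumption: a "year" is 19xx or 20xx that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def is_obsolete(url: str) -> int:
    """Return 1 if the URL contains a year <= OBSOLETE_YEAR, else 0."""
    years = [int(y) for y in YEAR_PATTERN.findall(url)]
    return int(any(y <= OBSOLETE_YEAR for y in years))


print(is_obsolete("https://example.com/news/2006/03/story"))
print(is_obsolete("https://example.com/news/2019/03/story"))
```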
#YandexLeak There are 3 factors used to detect AI-generated content. Two of them have Rank Coefficients of 0.08033186405 and 0.002431406823. They evaluate the unnaturalness of text from a Russian language perspective.
#YandexLeak It has 7 factors that check whether the document contains the search query words in the same order. These are the classic posting lists, used to select the documents that deserve to pass the first screening for reranking.
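Two simple ways to read "query in the same order", sketched below: the query as an exact contiguous phrase, and the query words appearing in order with gaps allowed. Both functions are hypothetical illustrations on whitespace tokens, not the 7 factors from the leak.

```python
def tokens(text: str) -> list:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def has_exact_phrase(query: str, doc: str) -> int:
    """1 if the query tokens appear contiguously and in order in the document."""
    q, d = tokens(query), tokens(doc)
    n = len(q)
    return int(n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1)))


def has_words_in_order(query: str, doc: str) -> int:
    """1 if the query tokens appear in order, allowing other words in between."""
    q = tokens(query)
    remaining = iter(tokens(doc))
    # `word in remaining` advances the iterator, so order is enforced.
    return int(bool(q) and all(word in remaining for word in q))


print(has_exact_phrase("buy red shoes", "where to buy red shoes online"))
print(has_words_in_order("buy red shoes", "buy cheap red shoes"))
```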
#YandexLeak To calculate the importance of a page, aside from PageRank (PR), they use "BrowseRank". Instead of the "link graph", BrowseRank works on the "user browsing graph", built from behavior data taken from the Yahoo Bar.

Paper -> microsoft.com/en-us/research…
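To give a feel for the idea, here is a heavily simplified sketch: a PageRank-style power iteration run on transitions observed in user sessions instead of on hyperlinks. The BrowseRank paper models browsing as a continuous-time process that also accounts for how long users stay on a page; that part is left out here, and all names and parameters below are assumptions.

```python
from collections import defaultdict


def browse_rank(sessions, damping=0.85, iterations=50):
    """Discrete-time approximation of importance on a user browsing graph.

    `sessions` is a list of page sequences visited by users.
    Staying time is ignored, unlike in the BrowseRank paper.
    """
    counts = defaultdict(lambda: defaultdict(int))
    pages = set()
    for session in sessions:
        pages.update(session)
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1  # observed transition src -> dst

    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            total = sum(counts[src].values())
            if total == 0:
                # Page where sessions end: spread its mass evenly.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
            else:
                for dst, c in counts[src].items():
                    new_rank[dst] += damping * rank[src] * c / total
        rank = new_rank
    return rank


ranks = browse_rank([["a", "b", "c"], ["a", "c"], ["b", "c"]])
print(sorted(ranks, key=ranks.get, reverse=True))
```

Pages that users actually reach often end up with more weight, regardless of how many links point at them.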
#YandexLeak They have 3 ranking phases (L1, L2, L3). L2 & L3 (rerank) use, among other things, user behavior data (time on site (ToS), active, unique and recurring users, % organic & direct vs. other sources, whether the host is the last one visited in the search session...) taken from Yandex Bar & Yandex Metrika.
#YandexLeak It has 7 binary (0, 1) L2 & L3 factors related to the domain extension. Specifically, a domain is treated differently depending on whether it is .ru, .ua, .com or an international domain.
#YandexLeak They have a QSRank (quality score?) metric that they use at domain level, at doc-query level and at context level (with left and right sliding windows of 500 tokens, nothing fancy here...).
#YandexLeak Pay attention if you run an ecommerce site. They use neural networks (DSSM) on the page title to try to predict whether the page contains many products, one product or none.
#YandexLeak They have 5 deprecated factors that penalized the ranking of sites participating in link exchanges and link wheels (they call them "link rings").
#YandexLeak With MatrixNet and intent dictionaries they calculate the search intent. Requests with intent to buy have a factor of 1, product requests have a factor of 0.6 and requests with intent not to buy, such as reviews, have a factor of 0.

yandex.com/company/techno…
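A toy version of the dictionary part, assuming the query is already known to be product-related. The word lists are invented for the example and no MatrixNet model is involved; only the values 1, 0.6 and 0 come from the leak.

```python
# Factor values mentioned in the thread.
BUY_FACTOR = 1.0
PRODUCT_FACTOR = 0.6
NO_BUY_FACTOR = 0.0

# Hypothetical intent dictionaries, for illustration only.
BUY_WORDS = {"buy", "price", "order", "cheap"}
NO_BUY_WORDS = {"review", "reviews", "manual", "specs"}


def intent_factor(query: str) -> float:
    """Map a product-related query to a commercial intent factor."""
    words = set(query.lower().split())
    if words & NO_BUY_WORDS:
        return NO_BUY_FACTOR
    if words & BUY_WORDS:
        return BUY_FACTOR
    # Assumption: anything else is a plain product request.
    return PRODUCT_FACTOR


for q in ("buy iphone 14", "iphone 14", "iphone 14 review"):
    print(q, intent_factor(q))
```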
#YandexLeak More deprecated factors for inbound links. One penalized a site if more than 50% of its anchors were commercial. Another rewarded it if the anchor text was naked. Another, calculated at the time of the search, penalized domains with an excess of anchors equal to the user query.
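A sketch of how these three anchor signals could be computed from a list of inbound anchor texts. The commercial word list, the definition of a "naked" anchor (a bare URL) and all names are assumptions; only the 50% threshold is from the leak.

```python
# Hypothetical list of commercial words.
COMMERCIAL_WORDS = {"buy", "cheap", "price", "order"}


def is_commercial(anchor: str) -> bool:
    return bool(set(anchor.lower().split()) & COMMERCIAL_WORDS)


def is_naked(anchor: str) -> bool:
    """Assumption: a naked anchor is just the URL itself."""
    return anchor.strip().lower().startswith(("http://", "https://", "www."))


def anchor_factors(anchors, query):
    n = len(anchors)
    if n == 0:
        return {"commercial_penalty": 0, "naked_share": 0.0, "query_match_share": 0.0}
    q = query.strip().lower()
    commercial_share = sum(is_commercial(a) for a in anchors) / n
    return {
        # 1 when more than 50% of the anchors are commercial.
        "commercial_penalty": int(commercial_share > 0.5),
        "naked_share": sum(is_naked(a) for a in anchors) / n,
        # Computed per query, since it depends on the user's search.
        "query_match_share": sum(a.strip().lower() == q for a in anchors) / n,
    }


anchors = ["buy shoes", "cheap shoes", "order shoes", "https://example.com"]
print(anchor_factors(anchors, "buy shoes"))
```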
#YandexLeak The Russian version of Wikipedia gets special treatment, with several dedicated factors. The curious thing is that they also have a factor for how many clicks your website receives from Wikipedia.
#YandexLeak They measure the frequency of popular Russian words used in a text (one factor uses the 200 most popular words and another the 500 most popular). Search engines often use this to better determine the relevance and subject matter of a text, and also to measure readability.
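As a sketch, such a factor could be the share of tokens in a text that belong to a list of the most popular words. The tiny word set below is a stand-in for a real top-200 or top-500 list, and the function name is hypothetical.

```python
import re


def popular_word_share(text: str, popular_words: set) -> float:
    """Fraction of tokens in `text` that belong to `popular_words`."""
    # \w matches Cyrillic letters too, since Python 3 patterns are Unicode-aware.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in popular_words for t in tokens) / len(tokens)


# Stand-in for a real frequency list of popular Russian words.
top_words = {"и", "в", "не", "на"}
print(popular_word_share("Кот и собака не в доме", top_words))
```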
#YandexLeak if if if

/* IdfVariance will be used as a factor only when the document hasTR */

TR: text relevance based on word position, presence of text links and word frequency, among others, relative to the length of the doc.

It seems to calculate an IDF variant for a particular word.
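The snippet does not show how IdfVariance is computed. For reference, here is the textbook (smoothed) IDF of a word, plus one possible reading of the name, the variance of IDF values across the query words. That second function is purely a guess, and both names are hypothetical.

```python
import math
from statistics import pvariance


def idf(term: str, docs: list) -> float:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.

    `docs` is a list of token sets, one per document.
    """
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def idf_variance(query_terms: list, docs: list) -> float:
    """Guess at the factor name: population variance of the terms' IDF values."""
    values = [idf(t, docs) for t in query_terms]
    return pvariance(values) if values else 0.0


corpus = [{"yandex", "leak"}, {"yandex"}, {"yandex", "factors"}]
print(idf("yandex", corpus), idf("leak", corpus))
print(idf_variance(["yandex", "leak"], corpus))
```

Rare words get a higher IDF than words present in every document, so a query mixing both kinds has a higher variance.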
#YandexLeak This is a list of content filtering rules used to filter out inappropriate or offensive content, such as words or phrases related to the Russian president.
source:
search/wizard/data/fresh/img_patch/images2.txt
