#YandexLeak In Slice Web Production there are 42 factors in use to evaluate URLs, including URL characteristics (length, slashes, numbers), geography, user engagement, type, and algorithms to calculate and predict relevance and quality (BM25, DSSM, GSK).
Curiosity: IsObsolete checks whether the URL should be considered outdated because it contains an old date, so old news is recognized as such. The factor takes the value 1 if there is a year in the URL that is less than or equal to 2007.
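A rough idea of how a check like IsObsolete could work, as a minimal sketch. Only the 2007 cutoff comes from the leak notes above; the function name `is_obsolete`, the regex and the 0/1 return value are my own assumptions, not Yandex code.

```python
import re

# Cutoff year mentioned in the thread; everything else here is hypothetical.
OBSOLETE_YEAR_CUTOFF = 2007

def is_obsolete(url: str) -> int:
    """Return 1 if the URL contains a standalone 19xx/20xx year <= cutoff, else 0."""
    # Match four-digit years that are not part of a longer digit run.
    years = [int(y) for y in re.findall(r"(?<!\d)(?:19|20)\d{2}(?!\d)", url)]
    return int(any(y <= OBSOLETE_YEAR_CUTOFF for y in years))

print(is_obsolete("https://example.com/news/2006/05/story"))
print(is_obsolete("https://example.com/news/2015/story"))
```

The lookarounds keep numeric IDs such as `120061` from being read as the year 2006.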
#YandexLeak There are 3 factors used to detect AI-generated content. 2 of them with Rank Coefficient 0.08033186405 and 0.002431406823. They evaluate the unnaturalness of text from a Russian language perspective.
#YandexLeak It has 7 factors that check whether the document contains the search query words in the same order. These rely on the classic posting lists to select the documents that deserve to pass the first screening for reranking.
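To make the "same order" idea concrete, here is a minimal sketch that checks whether the query tokens appear in the document as a contiguous phrase. The leak notes do not say how the 7 factors are computed, so the tokenizer, the function names and the exact-phrase rule are all assumptions for illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"\w+", text.lower())

def has_query_in_order(document: str, query: str) -> int:
    """Return 1 if the query tokens occur in the document as a contiguous phrase."""
    doc = tokenize(document)
    q = tokenize(query)
    if not q:
        return 0
    n = len(q)
    # Slide a window of len(q) tokens over the document and compare.
    return int(any(doc[i:i + n] == q for i in range(len(doc) - n + 1)))

print(has_query_in_order("Buy cheap red shoes online", "red shoes"))
print(has_query_in_order("Shoes that are red", "red shoes"))
```

A real engine would read word positions from the posting lists instead of rescanning the text, but the comparison is the same in spirit.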
#YandexLeak To calculate the importance of a page, aside from PR, they use "BrowseRank". Instead of the "link graph", BrowseRank calculates the "user browsing graph" using behavior data taken from the Yandex Bar.
#YandexLeak They have 3 rank phases (L1, L2, L3). L2 & L3 (rerank), among other things, use user behavior data (ToS, active, unique and recurring users, % organic & direct vs other sources, if last session of the search is the host...) taken from Yandex Bar & Yandex Metrika
#YandexLeak It has 7 binary (0, 1) L2 & L3 factors relative to the domain extension. Specifically, they treat differently if it is .ru, .ua, .com or international domain.
#YandexLeak They have a QSRank (quality score?) metric that they use at domain level, at doc-query level and at context level (with left and right sliding windows of 500 tokens, nothing fancy here...).
#YandexLeak Attention if you run an ecommerce site. They use neural networks (DSSM) on the page title to predict whether the page contains many products, one product or none.
#YandexLeak They have 5 deprecated factors that penalized in ranking sites that participate in link exchanges and link wheels (they call them "link rings").
#YandexLeak With MatrixNet and intent dictionaries they calculate the search intent. Requests with intent to buy have a factor of 1, product requests have a factor of 0.6 and requests with intent not to buy, such as reviews, have a factor of 0.
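As a toy illustration of the three factor values, here is a dictionary-only sketch. The values 1, 0.6 and 0 come from the notes above; the word lists, the function name `commercial_intent_factor` and the precedence rule are invented for the example, and the MatrixNet part is left out entirely.

```python
# Hypothetical intent dictionaries; the real ones are not reproduced in this thread.
BUY_WORDS = {"buy", "order", "price"}
NON_BUY_WORDS = {"review", "reviews", "manual"}
PRODUCT_WORDS = {"iphone", "laptop", "sneakers"}

def commercial_intent_factor(query: str):
    """Map a query to 1.0 (buy), 0.6 (product), 0.0 (non-buy) or None (no match)."""
    tokens = set(query.lower().split())
    if tokens & NON_BUY_WORDS:   # assumed precedence: non-buy terms win
        return 0.0
    if tokens & BUY_WORDS:
        return 1.0
    if tokens & PRODUCT_WORDS:
        return 0.6
    return None                  # query matches no dictionary

for q in ("buy iphone", "iphone", "iphone review", "weather tomorrow"):
    print(q, commercial_intent_factor(q))
```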
#YandexLeak More deprecated factors for inbound links. One penalized a site if more than 50% of its anchors were commercial. Another rewarded it if the anchor text was naked. Another, calculated at the time of the search, penalized domains with an excess of anchors equal to the user query.
#YandexLeak The Russian version of Wikipedia has special treatment, with several dedicated factors. The curious thing is that they also have as a factor how many clicks your website receives from Wikipedia.
#YandexLeak They measure the frequency of popular Russian words used in a text (one factor uses the 200 most popular words and another the 500 most popular). Search engines often use this to better determine the relevance and subject matter of a text, and also to measure readability.
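A minimal sketch of what such a factor might measure: the share of a text's tokens that fall in a top-N list of popular words. The leak notes give only the list sizes (200 and 500); the helper names, the tokenizer and the way the list is built from a corpus are assumptions.

```python
import re
from collections import Counter

def top_n_words(corpus: list[str], n: int) -> set[str]:
    """Build a hypothetical popular-word list: the n most frequent tokens in a corpus."""
    counts = Counter(t for text in corpus for t in re.findall(r"\w+", text.lower()))
    return {word for word, _ in counts.most_common(n)}

def popular_word_share(text: str, popular_words: set[str]) -> float:
    """Fraction of tokens in the text that belong to the popular-word list."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in popular_words for t in tokens) / len(tokens)

popular = top_n_words(["the cat and the dog", "the bird"], 1)
print(popular_word_share("the quick cat", popular))
```

In practice the two factors would simply call `popular_word_share` with a 200-word and a 500-word list.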
/* IdfVariance will be used as a factor only when the document hasTR */
TR: text relevance based on word position, presence of text links and word frequency, among others, relative to the length of the doc.
Seems to calculate an IDF variant for a particular word.
#YandexLeak This is a list of content filtering rules used to filter out inappropriate or offensive content, such as words or phrases related to the Russian president.
source:
search/wizard/data/fresh/img_patch/images2.txt