A scientific paper from Google gives some interesting insights into how Google today probably divides search queries into different thematic areas. Here is my summary of the paper in this thread.🧵 #seo #semanticsearch #google
The paper "Improving semantic topic clustering for search queries with word co-occurrence and bipartite graph co-clustering" presents two methods that Google uses to classify search queries by context. So-called lift scores play a central role in word co-occurrence clustering.
"Wi" in the formula stands for all terms closely related to the root word, such as misspellings, plural and singular forms, or synonyms.
"a" can be any user interaction, such as searching for a specific search term or visiting a specific page.
For example, if the lift score is 5, the probability that "Wi" appears in connection with interaction "a" is five times higher than the probability of "Wi" appearing in general. The terms can then be assigned to specific entities such as Mercedes and/or to the topical context class "Car" when users search for spare parts.
Further terms that frequently co-occur with the search terms can then be assigned to the context class and/or entity. This is a quick way to build a cloud of terms for a specific topic. The magnitude of the lift score determines a term's affinity to the topic:
🗣️"We use lift score to rank the words by importance and then threshold it to obtain a set of words highly associated with the context."
This method works particularly well when "Wi" is already known, e.g. for search terms around established brands or categories.
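The lift-score ranking and thresholding described above can be sketched as follows. This is a minimal illustration on toy query counts, not the paper's implementation; the function and variable names are my own.

```python
from collections import Counter

def lift_scores(context_events, all_events, threshold=2.0):
    """Rank words by lift = P(word | context) / P(word) and keep those
    whose lift meets the threshold (a sketch; names are illustrative)."""
    ctx, overall = Counter(context_events), Counter(all_events)
    n_ctx, n_all = sum(ctx.values()), sum(overall.values())
    scores = {w: (c / n_ctx) / (overall[w] / n_all) for w, c in ctx.items()}
    return {w: s for w, s in sorted(scores.items(), key=lambda kv: -kv[1])
            if s >= threshold}

# Toy data: queries observed alongside the interaction "searched for mercedes"
# versus all queries in the log.
all_queries = ["brakes", "tires", "weather", "news",
               "brakes", "recipes", "weather", "news"]
mercedes_queries = ["brakes", "tires", "brakes"]

topic_terms = lift_scores(mercedes_queries, all_queries, threshold=2.0)
```

Here "brakes" and "tires" are strongly over-represented in the Mercedes context relative to the whole log, so they survive the threshold and form the topic's term cloud.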
If "Wi" cannot be clearly defined because the search terms for the same topic vary too much, Google could use a second method: "weighted bigraph clustering".
This method is based on two assumptions:
▶Users with the same intent formulate their search queries differently; search engines nevertheless return the same search results.
▶Conversely, URLs relevant to a search query appear among the first search results.
With this method, search terms are matched against the top-ranking URLs to form query/URL pairs, whose relationship is additionally weighted by user click-through rates and impressions.
In this way, similarities can be established even between search terms that do not share the same root, and semantic clusters can be formed from them.
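A minimal sketch of the bigraph idea: represent each query by its click-weighted URL vector and merge queries whose vectors are similar. This is a greedy single-link toy version under assumed click data, not the paper's actual co-clustering algorithm; all names and URLs are illustrative.

```python
from collections import defaultdict
from math import sqrt

def cluster_queries(pairs, min_sim=0.5):
    """Group queries whose click-weighted URL vectors are similar.
    pairs: (query, url, clicks) triples."""
    vectors = defaultdict(lambda: defaultdict(float))
    for query, url, clicks in pairs:
        vectors[query][url] += clicks              # weight edges by clicks

    def cosine(a, b):
        num = sum(w * b.get(u, 0.0) for u, w in a.items())
        den = (sqrt(sum(w * w for w in a.values()))
               * sqrt(sum(w * w for w in b.values())))
        return num / den if den else 0.0

    queries = list(vectors)
    parent = {q: q for q in queries}               # union-find for merging

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:
            if cosine(vectors[q1], vectors[q2]) >= min_sim:
                parent[find(q1)] = find(q2)

    clusters = defaultdict(set)
    for q in queries:
        clusters[find(q)].add(q)
    return list(clusters.values())

# Two differently worded queries click mostly on the same URL; a third does not.
clicks = [("mercedes spare parts", "example.com/parts", 10),
          ("benz ersatzteile", "example.com/parts", 8),
          ("benz ersatzteile", "example.com/shop", 2),
          ("weather today", "example.com/weather", 5)]
clusters = cluster_queries(clicks)
```

The two spare-parts queries end up in one cluster despite sharing no word stem, because they lead to clicks on the same result URL, which is exactly the signal the bigraph method exploits.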
More insights into search query processing around interpretation of search queries in terms of meaning and search intent in my guide: kopp-online-marketing.com/search-query-p…
Here is the lift score formula (with "Wi" and "a" as defined above):

lift(Wi, a) = P(Wi | a) / P(Wi)
