Discover and read the best of Twitter Threads about #NLP

Most recent (24)

A very good paper I came across this morning by the @DeepMind researchers. For the past five years Transformers have been one of the most dominant approaches to Deep Learning problems, especially in the #NLP domain.

However, despite many interesting papers on the topic, and lots of good open code, there has been a noticeable lack of a *formal* definition of what Transformers are, especially on the level of pseudocode.

This paper aims to rectify that. It provides pseudocode for almost all major Transformer architectures, including training algorithms.

Read 5 tweets
📚 Calling NLPers working on Dutch! I have created a dataset of 200K+ book reviews in Dutch with 1-5 ratings and negative, neutral, positive labels, augmented with a lot of metadata. I also publish base models. #NLP

All available at @huggingface hub!…
I scraped the Dutch book 📘 community website @hebbannl (thanks!) and collected as many reviews as I could. The unfiltered set contains 206,113 reviews. The filtered set (train+test) contains 202,796.
@Hebbannl 🖊️ Metadata include, a.o.: no. tokens, no. likes and comments, pub. date, detected language with #fastText. The detected language is important to be able to filter out non-Dutch reviews!
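The filtering step described above can be sketched in a few lines. This is only an illustration: the records and the field name `detected_lang` are assumptions made for the example, not the dataset's actual schema, and the language codes are assumed to have been produced beforehand by a detector such as fastText.

```python
# Hypothetical review records; the field names are assumptions for this sketch,
# not the real schema of the published dataset.
reviews = [
    {"text": "Prachtig boek, een aanrader!", "rating": 5, "detected_lang": "nl"},
    {"text": "Great read, loved it.", "rating": 4, "detected_lang": "en"},
    {"text": "Viel een beetje tegen.", "rating": 2, "detected_lang": "nl"},
]


def keep_language(records, target="nl", lang_field="detected_lang"):
    """Return only the records whose detected language equals `target`."""
    return [r for r in records if r.get(lang_field) == target]


dutch_reviews = keep_language(reviews)
print(len(dutch_reviews), "of", len(reviews), "reviews kept")
```

Keeping the detected language as a metadata column, rather than dropping rows up front, leaves the choice of filter to whoever uses the data.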
Read 9 tweets
Yesterday, I completed my months-long quest to accurately identify all sci-fields emerging in WW other than the well-established Neuro ⬇️
For, it's been long since we noticed emerging scientific tangents that span diverse fields like: #ML, #Materials_Science, #Pharmacology, #Genomics, #Open_Hardware, etc
Awesome -- But, how does one possibly map all these frontiers without employing actual people to read and annotate dozens of hundreds of abstracts at a time?
Read 16 tweets
(1/4) No Language Left Behind 🚀

@MetaAI open-sourced their advanced #NLP model this week: No Language Left Behind. The model provides high-quality translations directly between 200 languages 🤯, including low-resource languages like Asturian, Luganda, Urdu 🧵
#opensource #ML
(2/4) An implementation of the LASER (Language-Agnostic SEntence Representations) models is available in Meta's #Python package LASER (see link 👇🏼). Integration with #huggingface is coming soon 🤗.
(3/4) If I had to guess, someone at Meta really loved the Trolls movie and named the model after the movie slogan - No Troll Left Behind 🧌🌈
Read 4 tweets
This week @Google researchers announced Minerva, an internally developed project that can answer mathematical questions and tackle other complex topics such as physics.

This project makes some really impressive gains in applying automated NLP approaches to challenging quantitative reasoning problems. Minerva is a large language model pretrained on general natural language data and further trained on technical content.

The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.

Read 5 tweets
Neural Network with Flax 🚀🚀🚀

#Flax is a Google #OpenSource Python neural network library for JAX 🌈. 🧵 👇🏼

#DataScience #deeplearning #DL #python
While most folks are familiar with Google TensorFlow and Keras, Flax is less well known; it is mainly used by researchers and engineers at Google.

One of the core use cases of this library is #NLP #Transformers and image recognition applications
Among the Flax applications, you can find:
✅ Neural network API (flax.linen): Dense, Conv, Norm, Attention, Pooling, Cell, Dropout
✅ Utilities and patterns: replicated training, serialization and checkpointing, metrics, prefetching on device
✅ Educational examples
Read 5 tweets
Good morning from Brussels! We're attending #METAFORUM2022 today, bringing you updates from the world of language technology.
There are roughly 100 people in the room. And several hundred following along on YouTube: .

Join us!
Kicking things off now is @GeorgRehm who also attended our hybrid DG TRAD Conference last year. He underlines that #multilingualism is at the heart of the European idea. 24 EU languages plus countless regional and minority ones!
Read 46 tweets
Twitter has a large Data Science community.

If you are passionate about Data Science, Machine Learning, Deep Learning, NLP, and many other topics, here are the 20 Twitter accounts that consistently share very informative content and help the community grow exponentially.

1). Santiago (@svpino)

2). Gus (@gusthema)

3). Patrick Loeber (@python_engineer)

4). Mark Tenenholtz (@marktenenholtz)

5). Ykdojo (@ykdojo)
6). Elvis (@omarsar0)

7). Sanyam Bhutani (@bhutanisanyam1)

8). Bojan Tunguz (@tunguz)

9). Ivey | Thoughts on Data (@thoughtsondata)

10). Chanin Nantasenamat (@thedataprof)
Read 7 tweets
Ever wondered what it takes to build an intelligent Q/A assistant? 🤔

@OpenAI #gpt3 and a few hours is all you need!

🤯 Yes, you heard it right!

🕹 Built a #javascript wizard using @streamlit to answer all your queries like a human expert!

A thread 🧵

#nlproc #AI #lowcode
@OpenAI @streamlit Gone are the days when you had to spend hours on @StackOverflow for resolving code-related queries!

🪄 Javascript wizard gives you precise answers to all your #JS related questions by leveraging #gpt3's latest code-davinci model that understands your queries just like humans!
@OpenAI @streamlit @StackOverflow I have made the application code #OpenSource so you can just clone the repo and build #gpt3 powered #AI applications for your use case!

👨‍💻 GitHub Repo -…

#NLP #lowcode #nlproc
Read 8 tweets
A quick recap on why coding (#Python) may help some #SEO professionals or some people pursuing their goals.

A short thread for those folks looking for motivation 🧵
🐍 Scrape competitors to get their headings and optimize accordingly.

Check their sitemaps/RSS feeds to find articles and understand their content frequency.
🐍 Analyze SERPs and find keywords with the same pages.

Analyze titles, get the most common words and visualize them.
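As a rough illustration of the last idea (counting the most common words in titles), here is a minimal sketch using only the Python standard library. The titles and the stop-word list are made up for the example; in practice the titles would come from a crawl or a SERP export, and the counts could be passed on to any plotting library.

```python
import re
from collections import Counter

# Made-up page titles standing in for scraped competitor or SERP titles.
titles = [
    "Python for SEO: A Beginner's Guide",
    "10 SEO Tips for Beginners",
    "How to Learn Python for Data Analysis",
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "for", "how", "the", "to"}


def most_common_words(texts, top_n=2):
    """Lowercase, tokenize, drop stop words and return the top_n word counts."""
    words = []
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(top_n)


top_words = most_common_words(titles)
print(top_words)
```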
Read 8 tweets
Big leap for the @OpenTargets platform. Gene burden from 450k+ @uk_biobank exomes from @AstraZeneca and @Regeneron, ClinVar structural variants, NLP classification of clinical trials stop reasons and drug label indications, etc. Systematically filling the gaps one-by-one 🧵
On our mission to identify potential causal targets from genetics 🧬, the strongly-powered @uk_biobank burden tests provide us an opportunity to close the gap between rare disease variation, and common variation from GWAS studies - as post-processed by the Genetics Portal
The 2 complementary analyses expand our view on the mechanistic effect of LoF variants, quantitative traits and genetics in more diverse populations. Big thanks to the authors for data sharing and @GWASCatalog for archiving and harmonising the information
Read 6 tweets
An article about a new #website #categorization service using an #nlp approach and a machine learning model:…
It offers both the Google product #taxonomy, which is ideal for the #ecommerce sector, and the IAB taxonomy, which is more commonly used for general website categorization, e.g. for marketing.
Why is categorization needed in marketing? An advertiser that wants to publish ads on publishers' websites generally wants to know which categories those websites fall into.
Read 6 tweets
The service uses an ML model trained on a large number of annotated product texts. The Google product taxonomy was taken as the base taxonomy, which was then customised.
The #platform can be used either as a dashboard or as an #API #json endpoint. Here is the URL for IAB classification that you can try out:…
Read 5 tweets
It is trained on a large data set of labelled categories from the IAB and Google product taxonomies. It supports 1000+ categories with high accuracy.
With the integration of neural machine translation #NMT services, website categorization can be done on websites in 100+ languages.
Read 6 tweets
@euphoniYum @spriter99880 1/3. There is a #Power called "#Evolution" and the #Survival of the best "#Fit".
To #Unite many #Different views into 1 and giving up one's OWN desires for that of the GROUP, Homo Sapiens Sapiens developed #DNA-caused "#Group #Thinking", to change from #Many views to #ONE.
@euphoniYum @spriter99880 2/3. >>But these #Genetics are #Predictable and can be #Manipulated by the #Few.
Mass-Psychologists, in #Think #Tanks of the ruling class in the West, Oligarchs, developed "#NLP" or "#Neuro-#Linguistic #Programming", known as "Marketing", Brain Washing and yes... Propaganda.
@euphoniYum @spriter99880 3/3. >>Key is the presence of "#Fear", "#Uncertainty" & "#Chaos".
An article about this Geopolitical tool used by the #Empire from #SouthFront, blocked by Twitter.
The URL is in the top of the page. That will work.
scroll👇1st page
SouthFront👉👈via Archive
Read 4 tweets
Lots of #SEO folks want to know about "Google Patents",
and the hero of "patents", @bill_slawski, mostly discusses them.

So I also want to contribute 🫣

🗣️ Here is a list of some of the patents

Every #SEO folk wonders!!!

What are Google Search Patents? 🤔

Here is the answer, dude 🥰

🗣️ The patents are technical documents that give detailed descriptions for various bits of the search algorithm.
1. Content Clustering

✍️ The patent describes grouping websites and pages by topic and creating something that can be described as expert clusters.

✍️ Content from these clusters is then given priority when serving search results for a related query.
Read 25 tweets
Most useful ideas from #NLP to improve your #SEO workflow or find new ideas.

A short recap on my threads about Natural Language Processing too.

A thread about some concepts that you don't want to miss out on if you love data and making analyses. 🧵
Let's start with the definitions and all the cool stuff:

No, you don't need to learn it all or be a master of the subject. Enough is fine to get a high-level understanding of the new stuff.

Just because something worked today it doesn't mean you can get the same results by iterating over and over.

Google is getting smarter.
Read 30 tweets
Content curation/planning is one of the most interesting parts of #SEO and for good reasons.

A short thread outlining some personal considerations and common mistakes I learned over time 🧵
Relying too much on tools. The prime example is the common misuse of tools such as Yoast SEO.

Green lights mean nothing for the user, they're third-party metrics with no relation to Google.

Be sure to understand what is great content.
Following a checklist. This is very different from content briefs/templates!

I am referring to static processes that involve the repetition of some steps just because they worked in the past. >>>
Read 36 tweets
A list of the most useful #Python libraries you can use for #SEO right now. 🐍

This updated thread will tell you the main libraries for #DataScience and #NLP that you should consider. Use them in your workflow! 🧵
Numpy & Pandas: the foundations for data analysis, just learn them.

Without these 2 libraries, you cannot do Data Science at all. Good knowledge of Pandas can get you quite far.
Advertools: the best SEM library out there and for SEO too. It's very useful for crawling, log file analysis, analyzing SERPs and querying the Knowledge Graph.

The ideal Swiss-knife you need in your arsenal.
Read 24 tweets
Transformers are Deep Learning models based on attention. What does that even mean?

A thread based on the importance of such models and why an #SEO should care in 2022. 🧵
They are state-of-the-art models that are dominating the field of #NLP and for good reasons. I will try to briefly explain what's so special about them.

You don't need to know every single detail, getting the high-level overview is more than fine.
They were introduced quite recently, in 2017 to be precise. They are simply a type of Neural Network (NN), i.e. Deep Learning models.

Behind this research, there was our dear Google as well. The intention was to create a model that scales well with big data.
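To make "based on attention" a bit more concrete, here is a minimal sketch of scaled dot-product attention, the core operation from the 2017 Transformer paper: each token scores every other token, the scores are turned into weights with a softmax, and the output is a weighted mix of the value vectors. It is written in plain Python for readability rather than speed, the toy vectors are made up, and real models add learned projections, multiple heads and much more.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "token" vectors; in self-attention Q, K and V all come from the same tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token "attends" to every token in the sequence; the first token puts less weight on the second token because their vectors have nothing in common.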
Read 40 tweets
A handful of lessons I learned (and I am still learning) while trying to apply #DataScience to #SEO. Some of them are not so obvious either.

This is an updated thread with new personal considerations 🧵
Communication is hard and you will get mad a lot of times. Non-technical people have no clue what you are talking about and you have to educate them.

Easier said than done, but I think that you should stay strong and keep trying.
Data quality is all. In SEO it's way harder as you are working with estimates and you don't even know the original data distribution.

That is why I am very careful when using Machine Learning models for SEO.

Now I'm getting more used to NLP tho.
Read 43 tweets
I've talked about Natural Language Processing (#NLP) before. How does it differ from NLG and NLU?

Behind these terms lies something more important for #SEO Specialists.

I will explain what these strange acronyms mean in this thread 🧵
For the NLP definition, check my other thread on the topic. It is a clear and concise explanation on the subject.

Natural Language Generation (NLG) can be defined as the use of Artificial Intelligence to create content.

This is what tools like do. They can generate texts according to your instructions and depending on how they are trained.
Read 30 tweets
What is Natural Language Processing (#NLP) and why is it so important for #SEO?

There are some clear benefits but first we need some clear explanations.

This thread shows you why this subject can improve your results and the role it plays in modern search engines. 🧵
NLP is a branch of Machine Learning that enables machines to understand and process natural language, i.e. what we humans say.

Our language is complex, ambiguous and sometimes misleading, it's a tough job for a machine!
More specifically, it's a subset of Computer Science, Linguistics and Artificial Intelligence.

Computational Linguistics is another subject closer to Linguistics than it is to Engineering. You can think of NLP as the other way around.

Read 28 tweets
Some personal considerations about the new trends in #SEO and the influence of coding and data in my journey.

This is a personal thread focused on explaining how different subjects can influence you 🧵
I started with #Python relatively early; I was into R before. The concept doesn't change either way: they are just tools.

I decided to get into coding because I felt it was my route. I am improving every day but I am still far from the biggest names in the industry or elsewhere.
I've always noticed that data are still misused by companies and there is a lot of misinformation.

Think about all the people using Excel as a database or SEO case studies with super weak proofs.

This is just the tip of the iceberg.
Read 25 tweets
