Latest Twitter Threads by @spacy_io on Thread Reader App

Oct 16, 2023 • 4 tweets • 3 min read

🛠️ Skills Extractor Library from @nesta_uk: A new package to identify skill phrases in job advertisements & align them with standardized skills in established taxonomies.

It uses spaCy for NER + @huggingface sentence-transformers for mapping.

github.com/nestauk/ojd_da…

@ESCoEorg sponsored the project, and the @nesta_uk team created an excellent blog, package website, and GitHub repo to help you get started!

GitHub: github.com/nestauk/ojd_da…
Blog: escoe.ac.uk/the-skills-ext…
Website: nestauk.github.io/ojd_daps_skill…

Sep 1, 2022 • 12 tweets • 5 min read

A detailed thread 🧵 on how spaCy's Matcher works.

Sometimes regular expressions just aren't enough because they can only match raw strings.

But could you use linguistic features to search beyond just raw text?

Yes!

That is where our Matcher comes in.

The Matcher helps you write rules to interact with spaCy objects 💫

This means you can search for raw strings, lexical attributes, and NLP model predictions.

So you can extract more difficult matches like the verb "to duck" in:

"That duck can duck quickly" 🚫 🦆
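A minimal sketch of that example, assuming spaCy v3. To keep it runnable without downloading a trained pipeline, the part-of-speech tags are set by hand here; in practice they would be predicted by a trained tagger. The pattern name DUCK_VERB is just an illustrative label.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated POS tags stand in for a trained tagger's predictions.
words = ["That", "duck", "can", "duck", "quickly"]
pos = ["DET", "NOUN", "AUX", "VERB", "ADV"]
doc = Doc(nlp.vocab, words=words, pos=pos)

matcher = Matcher(nlp.vocab)
# One token whose lowercase form is "duck" AND whose POS is VERB.
matcher.add("DUCK_VERB", [[{"LOWER": "duck", "POS": "VERB"}]])

# Each match is (match_id, start, end) in token offsets.
matches = [(start, end, doc[start:end].text) for _, start, end in matcher(doc)]
print(matches)
```

A plain regular expression for "duck" would hit both tokens; the POS constraint is what singles out the verb.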

May 6, 2022 • 8 tweets • 7 min read

Is it possible to have entities within entities within entities? Like in the example shown below?

For that, you'll want to use the SpanCategorizer!

Let's discuss how this new feature works by sharing some slides from our recent @budapestmlforum talk by @kadarakos in this 🧵

Just like the EntityRecognizer, the SpanCategorizer first embeds then encodes.

Then comes the span suggester followed by the span encoder and finally the span classifier.

Let's walk through all these steps.
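To make the suggester step concrete: its job is to propose candidate spans, which may overlap or nest, for the later steps to score. spaCy provides an n-gram suggester (registered as spacy.ngram_suggester.v1); the function below is only a standalone illustration of the idea, not spaCy's implementation, and the name ngram_spans is hypothetical.

```python
from typing import List, Tuple


def ngram_spans(tokens: List[str], sizes: Tuple[int, ...]) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for every n-gram of the given sizes."""
    spans = []
    for size in sizes:
        # Slide a window of `size` tokens across the text.
        for start in range(len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans


tokens = ["New", "York", "City", "Hall"]
candidates = ngram_spans(tokens, sizes=(1, 2, 3))

for start, end in candidates:
    print(start, end, " ".join(tokens[start:end]))
```

Note that "New York" and "New York City" are both candidates, which is what allows entities within entities.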

Apr 6, 2022 • 14 tweets • 5 min read

💡 A topic that often comes up on the discussions forum is spaCy's Vocab object and the vectors in it.

So let's do a thread 🧵 on vectors currently found in medium (md) and large (lg) models.

One of the main features of the Vocab in spaCy is the vector store.

This is the single place where pre-trained word-embeddings can be found.

Having a single place for these vectors saves on a lot of memory!
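A small sketch of the shared store, assuming spaCy v3 and NumPy. It uses a blank pipeline and a made-up 3-dimensional vector so it runs without downloading a model; the md and lg pipelines ship much wider pre-trained vectors.

```python
import numpy as np
import spacy

nlp = spacy.blank("en")

# Toy vector added by hand; a stand-in for pre-trained embeddings.
nlp.vocab.set_vector("duck", np.asarray([1.0, 2.0, 3.0], dtype="float32"))

doc_a = nlp("duck")
doc_b = nlp("a duck")

# Both tokens look up the same entry in the Vocab's vector store,
# so the Docs do not carry their own copies of the table.
same = np.array_equal(doc_a[0].vector, doc_b[1].vector)
print(nlp.vocab.has_vector("duck"), same)
```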

Feb 24, 2022 • 17 tweets • 16 min read

Our highlights of the last months in a thread!

🍏 Improved performance for spaCy
🥼 We launched spaCy Tailored Pipelines
🛡 The Guardian uses Prodi.gy
🍰 And many new libraries in the spaCy Universe

mailchi.mp/spacy/tailored…

We released spaCy v3.2, which improved performance for spaCy on Apple M1 and Nvidia GPU, added Doc input for pipelines, and provided registered scoring functions.

explosion.ai/blog/spacy-v3-2

Dec 16, 2021 • 10 tweets • 4 min read

How to build a spaCy v3 pipeline to analyze 1m reviews about health aspects of supplements?

A thread that takes a closer look at the steps needed!

Pipeline
1️⃣ Named Entity Recognition
2️⃣ Segmentation
3️⃣ Blinding Entities
4️⃣ Text Classification

explosion.ai/blog/healthsea
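A rough sketch of step 3, assuming that "blinding" means replacing each detected entity with a placeholder so the classifier sees the surrounding context rather than the specific mention. The helper name, the label and the placeholder format are hypothetical and not taken from Healthsea's code.

```python
from typing import List, Tuple


def blind_entities(tokens: List[str], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Replace each (start, end, label) token span with a single <LABEL> token."""
    blinded = list(tokens)
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        blinded[start:end] = [f"<{label}>"]
    return blinded


tokens = "This helped my joint pain a lot".split()
entities = [(3, 5, "CONDITION")]  # "joint pain", as found by the NER step
blinded = blind_entities(tokens, entities)
print(" ".join(blinded))
```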

Healthsea uses Named Entity Recognition to detect health aspects in text, such as diseases or symptoms.

NER is the task of identifying non-overlapping spans like proper nouns and similar expressions:

spacy.io/usage/linguist…

Jul 12, 2019 • 13 tweets • 8 min read

📺 THE VIDEOS FROM #spaCyIRL ARE NOW LIVE! And we don't mind saying, they turned out great. Here's 12 talks about NLP research, development and applications. Summaries and links in the thread 👇

youtube.com/playlist?list=…

Transfer learning has been the big topic in NLP for 2018 and 2019. @seb_ruder opened the conference with how the field has been changing, what it means for OSS, and what could be improved.
