spaCy
Open-source library for industrial-strength Natural Language Processing in Python. Developed by @explosion_ai 💥 📖 https://t.co/YkVR838V7S 📘 https://t.co/6kqoeWgvd2 📺 https://t.co/46ioYHURlW
Oct 16, 2023 4 tweets 3 min read
🛠️ Skills Extractor Library from @nesta_uk: A new package to identify skill phrases in job advertisements & align them with standardized skills in established taxonomies.

It uses spaCy for NER + @huggingface sentence-transformers for mapping.

github.com/nestauk/ojd_da…
Image: Diagram representing two stages of the Skills Extractor Library. First, input text uses a NER model to identify skills and experiences. Then those skills are mapped to a known taxonomy.

@ESCoEorg sponsored the project, and the @nesta_uk team created an excellent blog, package website, and GitHub repo to help you get started!

GitHub: github.com/nestauk/ojd_da…
Blog: escoe.ac.uk/the-skills-ext…
Website: nestauk.github.io/ojd_daps_skill…
Sep 1, 2022 12 tweets 5 min read
A detailed thread 🧵 on how spaCy's Matcher works.

Sometimes regular expressions just aren't enough because they can only match raw strings.

But could you use linguistic features to search beyond just raw text?

Yes!

That is where our Matcher comes in.

The Matcher helps you write rules to interact with spaCy objects 💫

This means you can search for raw strings, lexical attributes, and NLP model predictions.
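As a rough sketch of such a rule, the snippet below matches the word "duck" only where it is tagged as a verb. To stay self-contained it uses a blank English pipeline and sets the part-of-speech tags by hand, which is an assumption for illustration; in practice a trained pipeline would predict them. The pattern name `DUCK_VERB` is made up.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank English pipeline: vocab and tokenizer only, no trained model needed.
nlp = spacy.blank("en")

# Assumption for illustration: POS tags are assigned by hand here.
# A trained tagger would normally predict them.
words = ["That", "duck", "can", "duck", "quickly"]
pos = ["DET", "NOUN", "AUX", "VERB", "ADV"]
doc = Doc(nlp.vocab, words=words, pos=pos)

matcher = Matcher(nlp.vocab)
# One token whose lowercase form is "duck" AND whose coarse POS is VERB.
# "DUCK_VERB" is a hypothetical pattern name.
matcher.add("DUCK_VERB", [[{"LOWER": "duck", "POS": "VERB"}]])

# Each match is (match_id, start, end); keep the text and token offsets.
matches = [(doc[start:end].text, start, end) for _, start, end in matcher(doc)]
print(matches)
```

A plain regular expression for "duck" would hit both tokens; the rule above only hits the one at token index 3.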

So you can extract more difficult matches like the verb "to duck" in:

"That duck can duck quickly" 🚫 🦆
May 6, 2022 8 tweets 7 min read
Is it possible to have entities within entities within entities? Like in the example shown below?

For that, you'll want to use the SpanCategorizer!
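A minimal sketch of why nested spans need their own container: `doc.ents` rejects overlapping spans, while a span group in `doc.spans` accepts them. The sentence and the labels below are invented for illustration, and the key `"sc"` is used here because it is the SpanCategorizer's default spans key.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("The University of Budapest library opens at nine")

# Three nested spans over the same tokens (labels are illustrative only).
inner = Span(doc, 3, 4, label="GPE")        # "Budapest"
middle = Span(doc, 1, 4, label="ORG")       # "University of Budapest"
outer = Span(doc, 1, 5, label="FACILITY")   # "University of Budapest library"

# A span group can hold overlapping and nested spans.
doc.spans["sc"] = [inner, middle, outer]

# doc.ents only allows non-overlapping spans, so this assignment fails.
try:
    doc.ents = [inner, middle, outer]
    ents_accept_overlap = True
except ValueError:
    ents_accept_overlap = False

print([span.text for span in doc.spans["sc"]])
print(ents_accept_overlap)
```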

Let's discuss how this new feature works by sharing some slides from our recent @budapestmlforum talk by @kadarakos in this 🧵

Just like the EntityRecognizer, the SpanCategorizer first embeds then encodes.

Then comes the span suggester, followed by the span encoder, and finally the span classifier.

Let's walk through all these steps.
Apr 6, 2022 14 tweets 5 min read
💡 A topic that often comes up on the discussions forum is spaCy's Vocab object and the vectors in it.

So let's do a thread 🧵 on the vectors currently found in the medium (md) and large (lg) models.

One of the main features of the Vocab in spaCy is the vector store.

This is the single place where pre-trained word-embeddings can be found.

Having a single place for these vectors saves a lot of memory!
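Here is a small sketch of that idea using an empty `Vocab` and made-up 3-dimensional vectors (real md and lg models ship far larger tables). It stores two vectors in the shared table and then points a second key at an existing row, so the data is kept once. The words and values are assumptions for illustration.

```python
import numpy as np
from spacy.vocab import Vocab

vocab = Vocab()

# Toy 3-dimensional vectors; the values are made up for illustration.
vocab.set_vector("duck", np.array([1.0, 0.0, 0.0], dtype="float32"))
vocab.set_vector("goose", np.array([0.0, 1.0, 0.0], dtype="float32"))

# Map an extra key onto the row that already holds the "duck" vector,
# so both keys share one row of data in the table.
row = vocab.vectors.find(key="duck")
vocab.vectors.add("Duck", row=row)

print(vocab.vectors.n_keys)        # number of keys in the table
print(vocab.has_vector("swan"))    # no vector was stored for this word
print(vocab.get_vector("Duck"))    # same data as "duck"
```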
Feb 24, 2022 17 tweets 16 min read
Our highlights of the last few months in a thread!

🍏 Improved performance for spaCy
🥼 We launched spaCy Tailored Pipelines
🛡 The Guardian uses Prodi.gy
🍰 And many new libraries in the spaCy Universe

mailchi.mp/spacy/tailored…

We released spaCy v3.2, which improved performance for spaCy on Apple M1 and Nvidia GPUs, added Doc input for pipelines, and provided registered scoring functions.

explosion.ai/blog/spacy-v3-2
Dec 16, 2021 10 tweets 4 min read
How do you build a spaCy v3 pipeline to analyze 1 million reviews about health aspects of supplements?

A thread that takes a closer look at the steps needed!

Pipeline
1️⃣ Named Entity Recognition
2️⃣ Segmentation
3️⃣ Blinding Entities
4️⃣ Text Classification

explosion.ai/blog/healthsea

Healthsea uses Named Entity Recognition to detect health aspects in text, such as diseases or symptoms.

NER is the task of identifying non-overlapping spans like proper nouns and similar expressions:

spacy.io/usage/linguist…
Jul 12, 2019 13 tweets 8 min read
📺 THE VIDEOS FROM #spaCyIRL ARE NOW LIVE! And we don't mind saying, they turned out great. Here are 12 talks about NLP research, development and applications. Summaries and links in the thread 👇

youtube.com/playlist?list=…

Transfer learning has been the big topic in NLP for 2018 and 2019. @seb_ruder opened the conference with how the field has been changing, what it means for OSS, and what could be improved.