In my last Twitter thread for this week, I want to talk with you about some powerful approaches in #NLP and how we can use both #rstats and #python to unleash them 💪
One possible downside of the bag-of-words approach described before is that you often cannot fully take the structure of the language into account (n-grams are one way to do so, but they are limited).
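A tiny plain-Python sketch (the helper function is my own, just for illustration) shows the problem: a bag of words maps two sentences with opposite meanings to the very same representation.

```python
from collections import Counter

# A minimal bag-of-words: each text becomes a multiset of its words,
# so all information about word order is thrown away.
def bag_of_words(text):
    return Counter(text.lower().split())

# Two sentences with opposite meanings...
vec_a = bag_of_words("The dog bites the man")
vec_b = bag_of_words("The man bites the dog")

# ...end up with identical representations.
print(vec_a == vec_b)  # True
```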
You also often need a lot of data to train your model successfully - which can be time-consuming and labor-intensive. An alternative is to use a pre-trained model. And here comes @Google's famous deep learning model: BERT.
BERT is the acronym for Bidirectional Encoder Representations from Transformers. To understand how a BERT model works, I like to look at how it reads your text and how it is trained.
Simply speaking, there are three essential components:
✨ First, with BERT, you encode the structure and order of the input.
You give the model information via different embedding layers: the token embedding (BERT uses special tokens such as [CLS] and [SEP] to make sense of the sentence), the positional embedding (where each token is placed in the sentence),
and the segment embedding (which tells the model which sentence each token belongs to).
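Here is a toy plain-Python sketch (the tokens are invented, and no real embedding vectors are computed) of how these three pieces of information line up for a sentence pair:

```python
# Toy sketch of how a BERT-style input for a sentence pair is assembled
# from three pieces of information: the tokens themselves, their
# positions, and which segment (sentence) they belong to.
sentence_a = ["my", "dog", "barks"]
sentence_b = ["he", "is", "loud"]

# Special tokens: [CLS] marks the start, [SEP] separates the segments.
tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]

# Positional information: simply the index of each token.
position_ids = list(range(len(tokens)))

# Segment information: 0 for the first sentence (incl. its [SEP]), 1 for the second.
first_sep = tokens.index("[SEP]")
segment_ids = [0 if i <= first_sep else 1 for i in range(len(tokens))]

for tok, pos, seg in zip(tokens, position_ids, segment_ids):
    print(f"{tok:>7}  position={pos}  segment={seg}")
```

In the real model, each of these three ID sequences is looked up in its own embedding table, and the three vectors are summed per token.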
🙋🏼‍♀️ The first half of the training involves masking words (masked language modeling, MLM). During training, some words are hidden and the model learns to predict each masked word from its surrounding context.
🙋🏼‍♀️ During the second half, you train the model on next sentence prediction: given two sentences, it learns to predict whether the second one actually follows the first.
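The two training tasks can be sketched with toy data (everything below is invented purely for illustration):

```python
import random

random.seed(0)  # reproducible toy example

# 1) Masked language modeling: hide a token; the model must
#    recover it from the surrounding context.
tokens = ["the", "capital", "of", "france", "is", "paris"]
masked_position = random.randrange(len(tokens))
masked_input = tokens.copy()
label = masked_input[masked_position]          # the word to recover
masked_input[masked_position] = "[MASK]"
print(masked_input, "-> predict:", label)

# 2) Next sentence prediction: given a pair (A, B), the model must
#    decide whether B really followed A in the original text.
pair_positive = ("I opened the door.", "The room was empty.", True)
pair_negative = ("I opened the door.", "Penguins cannot fly.", False)
for a, b, follows in (pair_positive, pair_negative):
    print(f"{a!r} -> {b!r}: follows? {follows}")
```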
If you want more, there are many great videos online explaining how a BERT model works (📺 bit.ly/bert-deep-lear… - it's all about attention ✨), but what's important to remember: BERT comes as a pre-trained language model.
You can now use your labeled data to fine-tune the pre-trained model. One way of thinking about it is to think of a student: with the classical bag-of-words approach, you need to teach the student the language first. 👩🏼‍🏫
With BERT, you have a student who already knows the language, and you're teaching the student a specific topic like biology. 🧬
There is a fantastic framework in #python to work with BERT models - 🤗 @huggingface: huggingface.co. It offers many pre-trained models, along with tutorials that are very easy to follow.
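As a minimal sketch of how little code a pre-trained model needs (assuming the transformers package is installed and the commonly used bert-base-uncased checkpoint can be downloaded - the guard simply skips the demo when it can't run):

```python
# Sketch of using a pre-trained BERT via Hugging Face's pipeline API.
# Assumes `pip install transformers` plus network access to fetch
# the bert-base-uncased checkpoint.
try:
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    predictions = unmasker("Paris is the [MASK] of France.")
    top_word = predictions[0]["token_str"]  # highest-scoring fill-in
    print("BERT's guess:", top_word)
except Exception:  # library missing or model not downloadable
    top_word = None
```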
And another cool thing: once you understand how BERT works, you can apply the same logic to a variety of text, audio, or video tasks: huggingface.co/tasks 🥳
As the last wrap-up for this week, here's a visual of two possible workflows - comparing bag of words and BERT. As you will see, they are not that different.
With both approaches, you load your data, but with #BERT you also get a specifically pre-trained tokenizer (to make sense of your words) and a model. In both cases, you pre-process your text and then use it to perform the task at hand 👩🏼‍💻
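A rough sketch of the comparison in #python (the bag-of-words side is plain Python; the BERT side assumes transformers is available and is guarded otherwise):

```python
# Side-by-side sketch of the two pre-processing workflows on the same text.
text = "Shiny apps are fun"

# --- Workflow 1: bag of words ---------------------------------------
# You build the vocabulary yourself from your own corpus.
words = text.lower().split()
vocabulary = sorted(set(words))
bow_vector = [words.count(word) for word in vocabulary]
print(vocabulary, bow_vector)

# --- Workflow 2: BERT ------------------------------------------------
# The tokenizer ships with the pre-trained model, so the text is split
# exactly the way the model expects (guarded in case transformers or
# the network is unavailable).
try:
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    input_ids = tokenizer(text)["input_ids"]  # wrapped in [CLS] ... [SEP]
    print(input_ids)
except Exception:
    input_ids = None
```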
The curation week is almost over, and I would like to thank everyone for joining the discussions this week! It's been a blast 🥳
If you enjoyed this week, feel free to reach out on Twitter (@cosima_meyer) or GitHub (github.com/cosimameyer/) ✨
I feel very honored that I had the chance to talk with you about the things I enjoy doing, and I cannot wait to learn more from the upcoming curators - the lineup looks amazing!
If you missed a Twitter thread this week, head over to @pilizalde's amazing thread where she collected all of them (I love the GitHub emoji!)
💡 What is reactivity and what does it have to do with a carrier pigeon? 🐦
To better understand how a #ShinyApp works, it helps to know what's behind reactivity.
To describe it, I love the image of a carrier pigeon 🐦 (I picked up this idea when reading a post by @StatGarrett - so all credits go to him and all errors are mine ✨)
What reactivity does is "a magic trick [that] creates the illusion that one thing is happening, when in fact something else is going on" (shiny.rstudio.com/articles/under…).
👩🏼‍💻 How do you set up your own #ShinyApp?
It's easy in #rstats! Start a new #Rproject and select "Shiny Application". It will create a project with an "app.R" file for you ✨
Once it's open, you can replace the code that is already in the "app.R" file with the code snippet below 👇 It does all the magic and shows how you can build a simple #ShinyApp 🔮
You have checkboxes on the left side that let you choose countries (they are ISO3 abbreviations, so "RWA" stands for Rwanda), and, depending on your selection, your #ShinyApp will show a (non-realistic) population size for each selected country in a plot.
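Since I can't embed the original snippet here, below is a minimal #rstats sketch of what such an app.R could look like - my own reconstruction with invented, non-realistic numbers, not the original code:

```r
# Minimal Shiny app: checkboxes with ISO3 country codes on the left,
# a bar plot of made-up population figures on the right.
library(shiny)

# Invented, non-realistic population data
population <- data.frame(
  country = c("RWA", "FRA", "BRA", "JPN"),
  size    = c(10, 20, 30, 40)
)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput(
        "countries", "Choose countries:",
        choices  = population$country,
        selected = "RWA"
      )
    ),
    mainPanel(plotOutput("pop_plot"))
  )
)

server <- function(input, output) {
  # renderPlot() is reactive: it re-runs whenever input$countries changes
  output$pop_plot <- renderPlot({
    selected <- population[population$country %in% input$countries, ]
    barplot(selected$size, names.arg = selected$country,
            ylab = "Population size (non-realistic)")
  })
}

shinyApp(ui = ui, server = server)
```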
Today, we'll discover how you can use the power of #rstats to create an interactive #shinyapp ✨
💡 What is a ShinyApp?
Shiny is a framework that allows you to create web applications - ShinyApps ☺️ You can use them for multiple purposes - to visualize data 🎨 (for instance the Scottish Household Survey by @ViktErik, bit.ly/3TqZevY, ...