We are R-Ladies
Sep 17 • 19 tweets • 12 min read
And in my last Twitter thread, I want to talk with you about some powerful approaches in #NLP and how we can use both #rstats and #python to unleash them 💪
One possible downside of the bag of words approach described before is that you often cannot fully take the structure of the language into account (n-grams are one way to capture some word order, but they are often limited).
You also often need a lot of data to successfully train your model - which can be time-consuming and labor-intensive. An alternative is to use a pre-trained model. And here comes @Google's famous deep learning model: BERT.
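To make the word-order point concrete, here is a minimal sketch in plain Python. The helper names `bag_of_words` and `ngrams` and the two example sentences are made up for illustration, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())

def ngrams(text, n=2):
    """Return the list of n-grams (tuples of n consecutive tokens)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = "the dog bites the man"
b = "the man bites the dog"

# Both sentences contain exactly the same words, so their bags are identical.
same_bag = bag_of_words(a) == bag_of_words(b)
# Bigrams keep a little local order, so the two sentences now differ.
same_bigrams = Counter(ngrams(a)) == Counter(ngrams(b))

print(same_bag)      # True
print(same_bigrams)  # False
```

The bag of words cannot tell who bites whom; bigrams can here, but only because the difference shows up within two neighboring words.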
💡 What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. To understand how a BERT model works, I like to look at how it reads your text and how a BERT model is trained.
Simply speaking, there are three essential components:

✨ First, with BERT, you encode the structure and order of the input.
[Image: three important components to know when traini…]
You give the model this information through different embedding layers: the token embeddings (BERT uses special tokens, for instance [CLS] and [SEP], to make sense of the sentence), the positional embeddings (where each token is placed in the sentence), and the segment embeddings (which tell the model which sentence each token belongs to).
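As a rough sketch of these three pieces of information for a sentence pair, the toy function below lines up tokens, segment IDs, and position IDs. The function name `build_bert_inputs` and the example sentences are hypothetical, and it splits on whitespace, whereas the real BERT tokenizer splits text into WordPiece subwords.

```python
def build_bert_inputs(sentence_a, sentence_b):
    """Toy illustration of BERT's input layout for two sentences."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    # [CLS] opens the input, [SEP] closes each sentence.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0: [CLS], sentence A and its [SEP]; segment 1: sentence B and its [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Positions simply count the tokens from left to right.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

tokens, segment_ids, position_ids = build_bert_inputs("I love R", "Python is fun too")
for tok, seg, pos in zip(tokens, segment_ids, position_ids):
    print(f"position={pos}  segment={seg}  token={tok}")
```

In the model, each of these three values is looked up in its own embedding table and the three vectors are added together per token.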
And then there is the training:

🏋🏼‍♀️ The first training task is masking words (masked language modeling). During training, some of the words are hidden (around 15% of the tokens in the original BERT) and the model learns to predict each hidden word from the words on both sides of it.
🏋🏼‍♀️ The second training task is next sentence prediction: the model sees two sentences and predicts whether the second one really follows the first. This way, the model learns which sentences usually follow each other.
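The two training tasks can be sketched as data-preparation steps in plain Python. This is a simplified illustration, not BERT's actual preprocessing: the function names are hypothetical, tokens are whole words, and every selected token becomes [MASK], while the original recipe also leaves some selected tokens unchanged or swaps in random ones.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Hide a random subset of tokens; return the masked tokens and the answers."""
    if rng is None:
        rng = random.Random()
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels[i] = tok  # the model is trained to recover this token
        else:
            masked.append(tok)
    return masked, labels

def make_sentence_pairs(sentences, rng=None):
    """Build (sentence_a, sentence_b, is_next) examples for next sentence prediction."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to draw negative examples")
    if rng is None:
        rng = random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the sentence that really comes next.
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # Negative example: any sentence except the current and the true next one.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), False))
    return pairs

words = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(words, rng=random.Random(42)))
print(make_sentence_pairs(["I like R.", "I like Python.", "BERT is a model.", "It is pre-trained."]))
```

Because the masked word can sit anywhere in the sentence, the model has to use context from both the left and the right, which is where the "bidirectional" in BERT comes from.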
If you want to learn more, there are so many great videos online explaining how a BERT model works (📺 bit.ly/bert-deep-lear… - it's all about attention ✨) but what's important to remember: it comes with a pre-trained language model.
You can now use your labeled data to fine-tune the pre-trained model. One way to picture it is a student: with the classical bag of words approach, you need to teach the student the language first. 👩🏼‍🏫
With BERT, you have a student who already knows the language, and you only teach the student a specific topic like biology. 🧬
There is a fantastic framework in #python to work with BERT models - 🤗 @huggingface: huggingface.co. The website hosts many pre-trained models along with fantastic tutorials that are very easy to follow.
If you're up for using it in #rstats - the great community has got you covered. There is a tutorial for everything - and here's one from @rstudio explaining how to use the framework in R with 📦 {reticulate}: blogs.rstudio.com/ai/posts/2020-…
And another cool thing: Once you understand how BERT works, you can also apply the logic to a variety of text, audio, or video data tasks: huggingface.co/tasks 🥳
📝 As the last wrap-up for this week, here's a visual of two possible workflows - one using a bag of words and the other using #BERT. They are not that different, as you will see.
[Image: two different workflows, bag of words and BERT]
With both approaches, you load your data, but with #BERT you also load a pre-trained tokenizer (to make sense of your words) and a pre-trained model. You pre-process your text in both cases and then use it to perform your task at hand 👩🏼‍💻
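The BERT side of this workflow could look roughly like the sketch below, assuming the 🤗 `transformers` and `torch` packages are installed. The checkpoint name `bert-base-uncased`, the two-label setup, and the helper names are assumptions for illustration, not something prescribed in this thread.

```python
def load_tokenizer_and_model(checkpoint="bert-base-uncased", num_labels=2):
    """Load a pre-trained tokenizer and a BERT model with a new classification head."""
    # Imported here so that defining the function does not require the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model

def preprocess(tokenizer, texts):
    """Turn raw strings into padded, truncated PyTorch tensors for the model."""
    return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

def predict_logits(tokenizer, model, texts):
    """Run the model on a batch of texts and return the raw class scores."""
    import torch

    batch = preprocess(tokenizer, texts)
    with torch.no_grad():  # inference only, no gradients needed
        return model(**batch).logits
```

Calling `tokenizer, model = load_tokenizer_and_model()` downloads the checkpoint on first use, and `predict_logits(tokenizer, model, ["I love R", "I love Python"])` then returns one row of scores per text. The classification head starts out untrained, so these scores only become meaningful after fine-tuning on your labeled data.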
If you want more in-depth insights into how to do it, there's more:

👉 bag of words and {quanteda} 📦: bit.ly/text-mining-qu…
👉 BERT with the 🤗 @huggingface framework - particularly the tokenizer pipeline (bit.ly/hf-tokenizer) and the tutorial on fine-tuning (bit.ly/hf-fine-tune) are great to get you started! ✨

