The curation week is almost over and I would like to thank everyone for joining the discussions this week! It's been a blast 🥳
If you enjoyed this week, feel free to reach out on Twitter (@cosima_meyer) or GitHub (github.com/cosimameyer/) ✨
I feel very honored that I had the chance to talk with you about the things I enjoy doing, and I cannot wait to learn more from the upcoming curators - the lineup looks amazing! 🎉
If you missed a Twitter thread this week, head over to @pilizalde's amazing thread where she collected all of them (I love the GitHub emoji!)
But before I leave and head over to do some serious vinyl shopping 🎶, I want to talk about #NLP with you 💬
💡 What is NLP?
NLP is short for Natural Language Processing, and it helps us make sense of a difficult data type: written text.
But let's first start with the basic concepts and add a Gilmore Girls flavor to them - because who would be a better use case than characters with a seemingly endless vocabulary?
✨ Corpus: When you have your text data ready, you have your corpus. It's a collection of documents.
✨ Tokens: The single units of a text - usually each word, but a token could also be a sentence, paragraph, or character.
✨ Tokenization: When you hear the word tokenization, it means that you are splitting up the sentences into single words (tokens) and turning them into a bag of words. You can take this quite literally - a bag of words does not really take the order of the words into account.
There are ways to account for word order using n-grams (a bigram, for instance, would turn the sentence "Rory lives in a world of books" into "Rory lives", "lives in", "in a", "a world", "world of", "of books"), but they are limited.
✨ Document-feature matrix (DFM): To generate the DFM, you first split the text into its single terms (tokens), then count how frequently each token occurs in each document.
✨ Stemming: Cuts each word down to its stem, so "studies", "studying", and "studied" all become "stud". ✨ Lemmatization: With lemmatization, it's slightly different. Instead of "stud" (which would probably be the stem of the study terms), you end up with a meaningful base form - "study" 🥳
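To make these concepts concrete, here is a minimal sketch in #rstats with 📦 {quanteda} (the toy sentences are my own, not from the thread):

```r
library(quanteda)

# Corpus: a collection of documents
corp <- corpus(c(
  doc1 = "Rory lives in a world of books.",
  doc2 = "Lorelai and Rory talk fast."
))

# Tokenization: split each document into single words (tokens)
toks <- tokens(corp, remove_punct = TRUE)

# n-grams: keep a bit of word order, e.g. bigrams
tokens_ngrams(toks, n = 2)
```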
The overview also nicely describes a typical workflow with the bag-of-words approach:
👉 tokenize it (and turn it into a bag full of words),
👉 pre-process it by stemming it (and removing stop words and a bit more),
👉 count the single words and turn the counts into a DFM (document-feature matrix) - and now you're ready to go! 🚀
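In 📦 {quanteda}, that workflow might look like this minimal sketch (again with my own toy corpus):

```r
library(quanteda)

corp <- corpus(c(
  doc1 = "Rory studies in a world of books",
  doc2 = "Lorelai studied the coffee menu"
))

# 1) Tokenize: the bag full of words
toks <- tokens(corp, remove_punct = TRUE)

# 2) Pre-process: drop stop words, then stem
toks <- tokens_remove(toks, stopwords("en"))
toks <- tokens_wordstem(toks)

# 3) Count the words per document: the DFM
dfm(toks)  # "studies" and "studied" both end up as "studi"
```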
From here on, you can tackle multiple tasks - for instance, #supervised tasks with dictionary approaches, or classifying sentiment or topics. But you can also perform #unsupervised tasks like structural topic models. The possibilities are almost endless.
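Here is a taste of the dictionary approach, as a minimal sketch (the two-category toy dictionary is mine; for real work you would swap in a validated one such as {quanteda}'s data_dictionary_LSD2015):

```r
library(quanteda)

toks <- tokens(c(
  doc1 = "Stars Hollow is a wonderful little town",
  doc2 = "This pop quiz is terrible"
))

# Toy sentiment dictionary - replace with a validated dictionary
dict <- dictionary(list(
  positive = c("wonderful", "great"),
  negative = c("terrible", "awful")
))

# Count dictionary hits per document
dfm(tokens_lookup(toks, dict))
```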
If you're up for more on how to use 📦 {quanteda} in #rstats for these tasks, here is more from a hands-on workshop I had the honor of giving at @RLadiesBergen
I published the code in a readable and downloadable #Rmd file for you to use 👇
It contains everything from detailed descriptions of terms and concepts, through data preparation, to supervised and unsupervised approaches.
In my last Twitter thread, I want to talk with you about some powerful approaches in #NLP and how we can use both #rstats and #python to unleash them 💪
One possible downside of the bag-of-words approach described before is that you often cannot fully take the structure of the language into account (n-grams are one way to do so, but they are often limited).
You also often need a lot of data to successfully train your model - which can be time-consuming and labor-intensive. An alternative is to use a pre-trained model. And here comes @Google's famous deep learning model: BERT.
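If you want to try BERT from #rstats, one option is the 📦 {text} package, which wraps Hugging Face transformers - a minimal sketch, assuming {text} and its Python backend are set up (check the package docs for details):

```r
library(text)

# Turn sentences into BERT embeddings with a pre-trained model
# (the model is downloaded from Hugging Face on first use)
emb <- textEmbed(
  c("Rory lives in a world of books"),
  model = "bert-base-uncased"
)
```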
💡 What is reactivity and what does it have to do with a carrier pigeon? 🐦
To better understand how a #ShinyApp works, it helps to know what's behind reactivity.
To describe it, I love the image of a carrier pigeon 🐦 (I picked up this idea when reading a post by @StatGarrett - so all credits go to him and all errors are mine ✨)
Reactivity is "a magic trick [that] creates the illusion that one thing is happening, when in fact something else is going on" (shiny.rstudio.com/articles/under…).
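To see the carrier pigeon in action, here is a minimal sketch (my own toy app, not from the original thread): whenever input$n changes, Shiny re-runs the reactive expression and delivers the new value to the output.

```r
library(shiny)

ui <- fluidPage(
  numericInput("n", "Pick a number:", value = 10),
  textOutput("msg")
)

server <- function(input, output) {
  # The carrier pigeon: when input$n changes, this reactive
  # expression re-runs and carries the new value downstream
  doubled <- reactive(input$n * 2)

  output$msg <- renderText(paste("Twice your number:", doubled()))
}

shinyApp(ui, server)
```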
👩🏼‍💻 How do you set up your own #ShinyApp?
It's easy in #rstats! Start a new #Rproject and select "Shiny Application". It will create a project with an "app.R" file for you ✨
Once it's open, you can replace the code that is already in the "app.R" file with the code snippet below 👇 It does all the magic and shows how you can build a simple #ShinyApp 🔮
You have checkboxes on the left side that let you choose countries (by their ISO3 abbreviation, so "RWA" stands for Rwanda), and, depending on what you selected, your #ShinyApp will show a (non-realistic) population size for each country in a new plot.
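The original snippet was shared as an image; here is a minimal sketch that matches the description (the country list and the made-up population numbers are my reconstruction):

```r
library(shiny)
library(ggplot2)

# Made-up data - the population values are intentionally non-realistic
pop <- data.frame(
  country    = c("RWA", "KEN", "NGA", "GHA"),
  population = c(5, 10, 25, 8)
)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput(
        "countries", "Choose countries:",
        choices = pop$country, selected = "RWA"
      )
    ),
    mainPanel(plotOutput("pop_plot"))
  )
)

server <- function(input, output) {
  output$pop_plot <- renderPlot({
    # Plot only the countries ticked in the checkboxes
    ggplot(
      subset(pop, country %in% input$countries),
      aes(x = country, y = population)
    ) +
      geom_col()
  })
}

shinyApp(ui, server)
```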
Today, we'll discover how you can use the power of #rstats to create an interactive #shinyapp ✨
💡 What is a ShinyApp?
Shiny is a framework that allows you to create web applications - ShinyApps ☺️ You can use them for multiple purposes - to visualize data 🎨 (for instance the Scottish Household Survey by @ViktErik, bit.ly/3TqZevY, ...