We are R-Ladies
Sep 17 • 21 tweets • 14 min read
The curation week is almost over, and I would like to thank everyone for joining the discussions this week! It's been a blast 🥳
If you enjoyed this week, feel free to reach out on Twitter (@cosima_meyer) or GitHub (github.com/cosimameyer/) ✨
I feel very honored to have had the chance to talk with you about the things I enjoy doing, and I cannot wait to learn more from the upcoming curators. The lineup looks amazing! 💜
If you missed a Twitter thread this week, head over to @pilizalde's amazing thread, where she collected all of them (I love the GitHub emoji 😺)

👇
But before I leave to do some serious vinyl shopping 🎶, I want to talk about #NLP with you 💬
💡 What is NLP?
NLP is short for Natural Language Processing, and it helps make sense of a difficult data type: written text.
📑 But let's start with the basic concepts and give them a Gilmore Girls flavor, because who would make a better use case than characters with a seemingly endless vocabulary?
Here are all the terms you need to know to get started 👇
[Image: visual overview of terms and concepts]
✨ Corpus: Once your text data is ready, you have your corpus: a collection of documents.
✨ Tokens: A token is a single unit of text, usually a word (but it could also be a sentence, paragraph, or character).
✨ Tokenization: Tokenization means splitting sentences into single words (tokens) and turning them into a bag of words. You can take this quite literally: a bag of words does not take the order of the words into account.
There are ways to account for word order using n-grams (for instance, bigrams turn the sentence "Rory lives in a world of books" into "Rory lives", "lives in", "in a", "a world", "world of", "of books"), but this is limited: n-grams only capture the order of neighboring words.
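A minimal Python sketch of these ideas, using the thread's own Gilmore Girls sentence. The regex tokenizer and the tokenize/ngrams helpers are illustrative assumptions (in #rstats, quanteda's tokens() and tokens_ngrams() handle these steps); the point is only to show that a bag of words forgets word order while bigrams keep some of it.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into word tokens with a simple regex (illustrative only)."""
    return re.findall(r"\w+", text)


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Join every run of n consecutive tokens into a single n-gram."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Rory lives in a world of books"
shuffled = "books of world a in lives Rory"  # same words, different order

tokens = tokenize(sentence)
print(tokens)
# ['Rory', 'lives', 'in', 'a', 'world', 'of', 'books']

# A bag of words keeps only counts, so both word orders look identical:
print(Counter(tokens) == Counter(tokenize(shuffled)))  # True

# Bigrams keep the order of neighboring words, so the two now differ:
print(ngrams(tokens, 2))
# ['Rory lives', 'lives in', 'in a', 'a world', 'world of', 'of books']
print(ngrams(tokens, 2) == ngrams(tokenize(shuffled), 2))  # False
```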
✨ Document-feature matrix (DFM): To generate the DFM, you first split the text into its single terms (tokens) and then count how frequently each token occurs in each document. Each row of the DFM is a document, and each column is a feature (token).
✨ Stemming: With stemming, you reduce a word to its stem, typically by cutting off its ending.
✨ Lemmatization: Lemmatization is slightly different. Instead of "stud" (which would probably be the stem of words like "study", "studies", and "studying"), you end up with a meaningful base form, the lemma: "study" 🥳
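To make the difference concrete, here is a deliberately tiny Python sketch: stem chops off endings using a hypothetical list of suffix rules, while lemmatize looks words up in a small, hand-made dictionary. Both the rules and the dictionary are illustrative assumptions; real tools (such as the Snowball stemmer that quanteda's tokens_wordstem() builds on, or dictionary-based lemmatizers) are far more complete.

```python
# Hypothetical suffix rules, checked in order (longer endings first).
# Real stemmers such as Porter or Snowball use many more, smarter rules.
SUFFIXES = ("ying", "ies", "ied", "ing", "ed", "es", "s", "y")

# Hypothetical lookup table; real lemmatizers rely on large dictionaries
# and often on part-of-speech information.
LEMMAS = {"studies": "study", "studied": "study", "studying": "study"}


def stem(word: str) -> str:
    """Chop off the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)


words = ["study", "studies", "studied", "studying"]
print([stem(w) for w in words])       # ['stud', 'stud', 'stud', 'stud']
print([lemmatize(w) for w in words])  # ['study', 'study', 'study', 'study']
```

The stem groups all four forms together but is not a real word, whereas the lemma is a proper dictionary form: exactly the trade-off described above.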
The overview also nicely describes a typical workflow for the bag-of-words approach:

👉 You typically load the #data,

👉 tokenize it (and turn it into a bag full of words),

👉 pre-process it by stemming it (and removing stop words and a bit more),

👉 count the single words and turn the counts into a DFM (document-feature matrix), and now you're ready to go! 🎉
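These steps can be sketched end to end in plain Python. Everything here is an assumption for illustration: the two-document corpus is made up, the stop-word list and suffix rules are tiny, and the dense list-of-lists matrix stands in for the sparse DFM that quanteda's dfm() would build in #rstats.

```python
import re
from collections import Counter

# Hypothetical two-document corpus (document name -> text).
corpus = {
    "doc1": "Rory reads books. Rory loves reading.",
    "doc2": "Lorelai drinks coffee and loves coffee.",
}

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "and", "in", "of", "the"}


def stem(word: str) -> str:
    """Crude suffix stripping, for illustration only."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text.lower())            # tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [stem(t) for t in tokens]                     # stem


# Count the tokens in each document, then lay the counts out as a matrix:
# one row per document, one column per feature (sorted alphabetically).
counts = {doc: Counter(preprocess(text)) for doc, text in corpus.items()}
features = sorted(set().union(*counts.values()))
dfm = [[counts[doc][f] for f in features] for doc in corpus]

print(features)
# ['book', 'coffee', 'drink', 'lorelai', 'love', 'read', 'rory']
for doc, row in zip(corpus, dfm):
    print(doc, row)
# doc1 [1, 0, 0, 0, 1, 2, 2]
# doc2 [0, 2, 1, 1, 1, 0, 0]
```

Note that "reads" and "reading" both end up in the "read" column, which is what stemming is for. A real DFM is usually stored as a sparse matrix, since most documents contain only a small fraction of all features.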
From here on, you can do multiple tasks. For instance, you can use dictionary approaches or #supervised methods to classify sentiment or topics. But you can also use it for #unsupervised tasks such as structural topic models. The possibilities are almost endless.
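Here is a minimal Python sketch of the dictionary approach for sentiment: count how many tokens of a text appear in each dictionary category and label the text by the difference. The word lists and the dictionary_scores/label helpers are hypothetical; in #rstats with quanteda, a dictionary such as data_dictionary_LSD2015 combined with dfm_lookup() does the counting step.

```python
import re

# Hypothetical mini dictionary: category -> set of words.
SENTIMENT = {
    "positive": {"love", "loves", "great", "happy"},
    "negative": {"hate", "hates", "awful", "sad"},
}


def dictionary_scores(text: str) -> dict[str, int]:
    """Count how many tokens of the text fall into each dictionary category."""
    tokens = re.findall(r"\w+", text.lower())
    return {cat: sum(t in words for t in tokens) for cat, words in SENTIMENT.items()}


def label(text: str) -> str:
    """Label a text by comparing positive and negative counts."""
    scores = dictionary_scores(text)
    diff = scores["positive"] - scores["negative"]
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"


for text in ["Lorelai loves coffee and is happy",
             "Rory hates awful mornings",
             "Luke serves coffee"]:
    print(label(text), dictionary_scores(text))
# positive {'positive': 2, 'negative': 0}
# negative {'positive': 0, 'negative': 2}
# neutral {'positive': 0, 'negative': 0}
```

This simple counting rule ignores negation ("not happy") and context, so it is a starting point rather than a finished classifier.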
If you're up for more on how to use 📦 {quanteda} in #rstats for these tasks, here is more from a hands-on workshop that I had the honor to give at @RLadiesBergen

I deployed the code in a readable and downloadable #Rmd file for you to use 👇
It covers everything from a detailed description of terms and concepts to data preparation and supervised and unsupervised approaches.

💻 The deployed code can be accessed here: nlp-bergen.netlify.app
And if you want me to talk you through the code, here's a recording and the slides to flip through 🤓

📺 @RLadiesBergen:

📑 Slides: cosimameyer.com/slides/nlp-rla…
If you'd like a more detailed version, it's also based on a blog post by @cbpuschmann and me:
📑 @mzes_ssdl: bit.ly/text-mining-qu…

More from @WeAreRLadies

Sep 17
And in my last Twitter thread, I want to talk with you about some powerful approaches in #NLP and how we can use both #rstats and #python to unleash them 💪
One possible downside of the bag-of-words approach described before is that you often cannot fully take the structure of the language into account (n-grams are one way to do so, but they are often limited).
You also often need a lot of data to successfully train your model, which can be time-consuming and labor-intensive. An alternative is to use a pre-trained model. And here comes @Google's famous deep learning model: BERT.
Sep 16
📝 If you also keep thinking about brains and bodies, here is a summary of the key points 🤓 #rstats
[Image: visual summary of ShinyApps; left side: user interface]
🗂 And as announced at #rstudioconf2022, you can now also build ShinyApps in #python! @Rami_Krispin set up a great repository that shows you how to set up your ShinyApp in Python using #shinylive: github.com/RamiKrispin/sh… 💻
📖 If you're up for more input on ShinyApps, here's the bible of Shiny: mastering-shiny.org
Sep 16
💡 What is reactivity, and what does it have to do with a carrier pigeon? 🐦

To better understand how a #ShinyApp works, it's good to understand what's behind reactivity.
[Image: infographic showing reactivity]
To describe it, I love the image of a carrier pigeon 🐦 (I picked up this idea when reading a post by @StatGarrett, so all credit goes to him and all errors are mine ✨)
Reactivity is "a magic trick [that] creates the illusion that one thing is happening, when in fact something else is going on" (shiny.rstudio.com/articles/under…).
Sep 16
👩🏼‍💻 How do you set up your own #ShinyApp?

It's easy in #rstats! Start a new #Rproject and select "Shiny Application". It will create a project with an "app.R" file for you ✨
[Image: snapshot of the RStudio Project Wizard]
Once it's open, you can replace the code in the "app.R" file with the code snippet below 👇 It does all the magic and shows how you can build a simple #ShinyApp 🔮

(it's here for you to try: bit.ly/shinyapp-test)
[Image: screenshot of a ShinyApp with selection boxes on the left]
What the ShinyApp does:

You have checkboxes on the left side that let you choose countries (they use ISO3 codes, so "RWA" stands for Rwanda), and, depending on what you select, your #ShinyApp shows a (non-realistic) population size for each selected country in a new plot.
Sep 16
Today, we'll discover how you can use the power of #rstats to create an interactive #shinyapp ✨
💡 What is a ShinyApp?

Shiny is a framework that allows you to create web applications: ShinyApps ☺️ You can use them for multiple purposes, such as visualizing data 🎨 (for instance, the Scottish Household Survey by @ViktErik, bit.ly/3TqZevY, ...
[Image: Scottish Household Survey showing a visualization of data]
Sep 15
While I only scratched the surface of what you can do with #Git today, it's an extremely powerful tool that has so much more to offer 🤩

Here are some more resources if you want to learn more about it:

📖 happygitwithr.com

📖 atlassian.com/git (my go-to resource)
And there are, of course, fantastic #rladies talking about using #Git:

📺 @RLadiesBmore: (by @lisalendway)