1/10: 🧪📊 Introducing Generalized Linear Models (GLMs) and how to fit them in R! A thread. #GLM #Rstats #DataScience
2/10: 💡 GLMs are a general class of regression models that extend linear regression to responses that aren't normally distributed (e.g., counts, proportions, binary outcomes) by allowing a variety of response distributions & link functions. They model the relationship between a response variable & one or more explanatory variables. #RStats #DataScience
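3/10: 📐 The main components of a GLM are:
Random Component: the distribution of the response variable (e.g., Gaussian, Poisson, Binomial)
Systematic Component: the linear predictor, a linear combination of the explanatory variables (η = β0 + β1x1 + … + βpxp)
Link Function: connects the two by relating the expected response to the linear predictor, g(E[Y]) = η (e.g., log for Poisson, logit for Binomial). #Rstats #DataScience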
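In R, the random component and the link function are bundled into a family object. Here is a minimal sketch using the base `poisson()` family; the example means in `mu` are arbitrary. It shows the link (`linkfun`, here log), its inverse (`linkinv`, here exp), and the variance function that comes with the distribution.

```r
# A family object bundles the response distribution and the link function
fam <- poisson(link = "log")

mu   <- c(0.5, 2, 10)       # example expected counts (arbitrary values)
eta  <- fam$linkfun(mu)     # link: eta = log(mu), the scale of the linear predictor
back <- fam$linkinv(eta)    # inverse link: mu = exp(eta)
v    <- fam$variance(mu)    # Poisson variance function: Var(Y) = mu

print(fam$family)           # "poisson"
print(fam$link)             # "log"
print(round(eta, 3))
print(all.equal(back, mu))  # TRUE: the inverse link undoes the link
```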
4/10: 🧩 To fit a GLM in R, use the glm() function. Its two key arguments are a formula specifying the model and a family specifying the response distribution & link function (family defaults to gaussian). You'll usually also pass your data frame via data. #RStats #DataScience
5/10: 🔧 Example: modeling count data (Poisson distribution) with a log link function.
model <- glm(response ~ explanatory_variable, family = poisson(link = "log"), data = dataset)
Replace response, explanatory_variable, and dataset with your own data. #Rstats #DataScience
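To try the template without your own data, here is a minimal sketch on R's built-in `warpbreaks` dataset. It records counts of warp breaks per loom by wool type and tension. The dataset and predictors are chosen only for illustration.

```r
# Built-in data: breaks (counts), wool (A/B), tension (L/M/H)
data(warpbreaks)

# Poisson GLM with a log link: log(E[breaks]) = b0 + wool effect + tension effects
model <- glm(breaks ~ wool + tension,
             family = poisson(link = "log"),
             data = warpbreaks)

print(coef(model))       # coefficients on the log scale
print(exp(coef(model)))  # exponentiated: multiplicative effects on the expected count
```

Because of the log link, exponentiated coefficients act multiplicatively on the expected count. A value below 1 means fewer expected breaks than the reference level, holding the other predictor fixed.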
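6/10: 🔍 Check your model with summary() and AIC(). summary() reports the coefficients along with the null and residual deviance; a residual deviance far above its degrees of freedom hints at poor fit or overdispersion. AIC() returns the Akaike Information Criterion, which is useful for comparing candidate models fitted to the same data (lower is better). #RStats #DataScience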
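Here is a minimal sketch of both checks on R's built-in `warpbreaks` data. The comparison model is an assumption for illustration. The residual deviance is compared with a chi-squared distribution on the residual degrees of freedom, a common rough test for Poisson models. AIC is only meaningful relative to other models fitted to the same data.

```r
data(warpbreaks)

full    <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
reduced <- glm(breaks ~ tension,        family = poisson, data = warpbreaks)

summary(full)  # coefficients, null/residual deviance, and AIC

# Deviance goodness-of-fit test: residual deviance vs chi-squared(df.residual)
gof_p <- pchisq(deviance(full), df = df.residual(full), lower.tail = FALSE)

# Rough overdispersion check: values well above 1 are a warning sign
dispersion_ratio <- deviance(full) / df.residual(full)

# Lower AIC = better trade-off between fit and complexity (same data only)
aic_table <- AIC(full, reduced)

print(gof_p)
print(dispersion_ratio)
print(aic_table)
```

If the dispersion ratio is well above 1, common next steps are `family = quasipoisson` or a negative binomial model such as `MASS::glm.nb()`.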
7/10: 📈 To visualize the relationship between the response and an explanatory variable, plot the observed data with plot() and overlay the fitted curve with lines(). #DataViz #RStats #DataScience
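Because `lines()` needs a smooth grid of x values, this works best with one continuous predictor. The data in this minimal sketch are simulated, and the coefficients 0.5 and 1.2 are arbitrary assumptions. `predict(..., type = "response")` returns fitted means on the count scale rather than the log scale.

```r
set.seed(42)

# Simulated count data with a single continuous predictor
x <- runif(100, min = 0, max = 2)
y <- rpois(100, lambda = exp(0.5 + 1.2 * x))
sim <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = poisson(link = "log"), data = sim)

# Grid of x values for a smooth fitted curve
grid <- data.frame(x = seq(0, 2, length.out = 200))
grid$mu_hat <- predict(fit, newdata = grid, type = "response")

plot(sim$x, sim$y, xlab = "x", ylab = "count",
     main = "Poisson GLM: observed counts and fitted mean")
lines(grid$x, grid$mu_hat, lwd = 2)
```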
9/10: 💡 Remember, model selection and interpretation should be based on a strong understanding of the data, domain knowledge, and appropriate statistical techniques. Always validate your models using cross-validation or other methods. #RStats #DataScience
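One way to follow this advice without extra packages is K-fold cross-validation written in base R. This sketch uses R's built-in `warpbreaks` data and K = 5 folds, both assumptions for illustration. It compares two candidate formulas by their average out-of-fold Poisson deviance, where lower is better. `boot::cv.glm()` is a packaged alternative.

```r
data(warpbreaks)
set.seed(1)

# Poisson deviance of predictions mu for observed counts y (y*log(y/mu) := 0 when y == 0)
poisson_deviance <- function(y, mu) {
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# K-fold cross-validation for a Poisson GLM formula, given fold labels 1..K
cv_poisson <- function(formula, data, folds) {
  response <- all.vars(formula)[1]
  total <- 0
  for (i in seq_len(max(folds))) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit <- glm(formula, family = poisson(link = "log"), data = train)
    mu  <- predict(fit, newdata = test, type = "response")
    total <- total + poisson_deviance(test[[response]], mu)
  }
  total / nrow(data)  # average out-of-fold deviance per observation
}

# Same random fold assignment for both models, so the comparison is fair
folds <- sample(rep(1:5, length.out = nrow(warpbreaks)))

cv_full    <- cv_poisson(breaks ~ wool + tension, warpbreaks, folds)
cv_reduced <- cv_poisson(breaks ~ tension,        warpbreaks, folds)
print(c(full = cv_full, reduced = cv_reduced))
```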
10/10: 🚀 That's it! You're now ready to explore the world of Generalized Linear Models using R. Good luck, and happy modeling! 🎉 #GLM #Rstats #DataScience
🧵1/10 Multivariate Normality: A Guide to Assessing It Using the MVN Package in R 📊
Multivariate normality (MVN) is a key assumption in many statistical techniques. Let's explore how to test for MVN using the MVN package in R. #rstats #DataScience cran.r-project.org/web/packages/M…
🧵2/10 Installing the MVN Package 📦
First, let's install and load the MVN package in R:
install.packages("MVN")
library(MVN)
This package offers a range of functions to assess and visualize multivariate normality. #rstats #DataScience
🧵3/10 Testing for MVN 🔎
To test your dataset for MVN, use the mvn() function. It takes a data frame or a matrix as input and lets you choose among several multivariate normality tests, including Mardia's, Henze-Zirkler's, and Royston's.
# setosa subset of the Iris data
setosa <- iris[1:50, 1:4]
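To see what one of these tests computes, here is a hand-rolled sketch of Mardia's skewness and kurtosis statistics in base R, applied to the same setosa subset. It follows the textbook asymptotic formulas and omits small-sample corrections, so its numbers may differ from what mvn() reports. It is not the MVN package's implementation.

```r
# Hand-rolled Mardia statistics (asymptotic versions, no small-sample correction)
mardia_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  centered <- scale(x, center = TRUE, scale = FALSE)
  S <- crossprod(centered) / n                  # ML covariance (divides by n)
  m <- centered %*% solve(S) %*% t(centered)    # m[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  b1 <- sum(m^3) / n^2                          # multivariate skewness
  b2 <- sum(diag(m)^2) / n                      # multivariate kurtosis
  skew_stat <- n * b1 / 6                       # approx. chi-squared under MVN
  df_skew <- p * (p + 1) * (p + 2) / 6
  kurt_stat <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. N(0, 1) under MVN
  list(
    b1 = b1, b2 = b2, d2 = diag(m), df_skew = df_skew,
    p_skew = pchisq(skew_stat, df = df_skew, lower.tail = FALSE),
    p_kurt = 2 * pnorm(-abs(kurt_stat))
  )
}

setosa <- iris[1:50, 1:4]  # setosa subset of the Iris data
res <- mardia_sketch(setosa)
print(res[c("b1", "b2", "p_skew", "p_kurt")])
```

Small p-values from either statistic are evidence against multivariate normality.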
🧵1/16 🚀 Package Exploration! 🌌 We all know popular #RStats packages like ggplot2, dplyr, and shiny, but there are tons of hidden gems 💎 in the CRAN universe waiting to be discovered! Let's explore some lesser-known packages that can supercharge your #DataScience journey!
🧵2/16 🌈 colorfindr: This nifty package extracts the most common colors from your images! Whether you're working with visualizations, web design, or marketing materials, colorfindr has you covered. Check it out: cran.r-project.org/package=colorf… #RStats #DataScience
🧵3/16 📦 pacman: Tired of typing install.packages() for each new package you need? pacman is here to save the day! It's a Swiss Army knife 🛠️ for package management, making installing, loading, and updating packages a breeze! cran.r-project.org/package=pacman #RStats #DataScience
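The idea behind `pacman::p_load()` is to install a package if it is missing and then load it. A base-R sketch shows what that convenience saves you. The helper name `load_or_install` is hypothetical, and this is not pacman's implementation.

```r
# Hypothetical helper: install any missing packages, then attach them all
load_or_install <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)  # uses your configured CRAN mirror
  }
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

# Base packages are always present, so nothing is installed here
load_or_install(c("stats", "utils"))

# With pacman itself, the equivalent one-liner would be:
# pacman::p_load(dplyr, ggplot2)
```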
[1/8] 📚 Introducing #Quarto: A Versatile, New and Exciting Publishing Tool! 🌟
Quarto is a powerful, open-source, and user-friendly publishing framework that streamlines the process of creating beautiful books, documents, and websites. Let's explore it now! #RStats #DataScience
[2/8] 🤓 Language Agnostic: Quarto runs code in R, Python, Julia, and Observable JS, and it works with #Markdown, #LaTeX, #RMarkdown files, and #Jupyter notebooks. So, whether you're a researcher or a creative writer, Quarto has you covered! 🌍 #DataScience #RStats
[3/8] 🔁 Format Flexibility: With Quarto, you can render the same source into various formats, such as PDF, HTML, EPUB, and even slide presentations. That makes sharing your work with diverse audiences a breeze! 🌬️ #RStats #DataScience
🧵1/9 A deep dive into the history of #Backpropagation, the key technique for training multilayer neural networks. This powerful method revolutionized the way we train AI systems, leading to major breakthroughs in various domains. 🤖 #DataScience #DeepLearning #AI
🧵2/9 #Backpropagation rests on a simple idea: apply the chain rule for derivatives layer by layer, from the output back to the input, to efficiently compute the gradient of the loss with respect to every weight. Gradient descent then uses these gradients to update the weights in each layer of the network. #DataScience #AI
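Here is a minimal sketch of backpropagation in R for a tiny network: 2 inputs, 3 tanh hidden units, 1 linear output, squared-error loss, and a single training example. The architecture, random weights, and learning rate are arbitrary assumptions. The point is that gradients from the backward chain-rule pass match a brute-force finite-difference check, and one gradient-descent step using them lowers the loss.

```r
set.seed(123)

# Tiny network: x (2) -> tanh hidden layer (3) -> linear output (1)
x <- c(0.5, -1.2)
y <- 0.3
W1 <- matrix(rnorm(6, sd = 0.5), nrow = 3, ncol = 2); b1 <- rnorm(3, sd = 0.1)
W2 <- matrix(rnorm(3, sd = 0.5), nrow = 1, ncol = 3); b2 <- 0

forward <- function(W1, b1, W2, b2) {
  z1  <- W1 %*% x + b1               # hidden pre-activations (3 x 1)
  h   <- tanh(z1)                    # hidden activations
  out <- as.numeric(W2 %*% h + b2)   # scalar output
  list(h = h, out = out, loss = 0.5 * (out - y)^2)
}

# Backward pass: chain rule from the loss back to each parameter
f <- forward(W1, b1, W2, b2)
d_out   <- f$out - y              # dL/d(out)
grad_W2 <- d_out * t(f$h)         # dL/dW2 (1 x 3)
grad_b2 <- d_out                  # dL/db2
d_h     <- t(W2) * d_out          # dL/dh (3 x 1)
d_z1    <- d_h * (1 - f$h^2)      # tanh'(z) = 1 - tanh(z)^2
grad_W1 <- d_z1 %*% t(x)          # dL/dW1 (3 x 2)
grad_b1 <- as.numeric(d_z1)       # dL/db1

# Finite-difference check on W1: nudge each entry and measure the loss change
eps <- 1e-6
num_W1 <- matrix(0, 3, 2)
for (i in 1:3) for (j in 1:2) {
  Wp <- W1; Wp[i, j] <- Wp[i, j] + eps
  Wm <- W1; Wm[i, j] <- Wm[i, j] - eps
  num_W1[i, j] <- (forward(Wp, b1, W2, b2)$loss - forward(Wm, b1, W2, b2)$loss) / (2 * eps)
}
print(max(abs(num_W1 - grad_W1)))  # should be close to zero

# One gradient-descent step using the backpropagated gradients
lr <- 0.05
W1_new <- W1 - lr * grad_W1; b1_new <- b1 - lr * grad_b1
W2_new <- W2 - lr * grad_W2; b2_new <- b2 - lr * grad_b2
print(c(before = f$loss, after = forward(W1_new, b1_new, W2_new, b2_new)$loss))
```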
🧵3/9 The shift to Rectified Linear Units (ReLU), f(z) = max(z, 0), accelerated learning in deep networks and made it possible to train them without unsupervised pre-training. This non-linear activation function proved more effective than smoother predecessors such as tanh(z) and the logistic sigmoid 1/(1+exp(−z)). #ReLU #DataScience
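One commonly cited reason for ReLU's advantage is easy to check numerically. Backpropagation multiplies activation derivatives layer by layer. The logistic sigmoid's derivative never exceeds 0.25, and tanh's derivative reaches 1 only at z = 0. ReLU's derivative, by contrast, is exactly 1 for every positive input. The 10-layer product at the end is a deliberately simplified illustration that ignores the weights.

```r
z <- seq(-6, 6, by = 0.01)

sigmoid   <- function(z) 1 / (1 + exp(-z))
d_sigmoid <- function(z) sigmoid(z) * (1 - sigmoid(z))  # peaks at 0.25 when z = 0
d_tanh    <- function(z) 1 - tanh(z)^2                  # peaks at 1 when z = 0
d_relu    <- function(z) as.numeric(z > 0)              # 1 for z > 0, 0 otherwise

print(c(max_sigmoid = max(d_sigmoid(z)), max_tanh = max(d_tanh(z))))

# Best-case product of derivatives across 10 layers (weights ignored)
print(c(sigmoid = 0.25^10, relu = 1^10))
```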
🧵1/8 Loading data is the first step of any analysis. In this thread, we'll explore how to read datasets from different file formats and software into R! 📚 #RStats #DataScience
🧵2/8 CSV Files: The read.csv() function is the go-to for reading comma-separated values files. For improved performance and more flexibility, consider read_csv() from the readr package or fread() from the data.table package. 📃 #CSV #RStats
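Here is a self-contained sketch that writes a small CSV to a temporary file and reads it back with base R. The readr and data.table alternatives run only if those packages are installed.

```r
# Write a small CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), path, row.names = FALSE)

# Base R: returns a data.frame
df_base <- read.csv(path)

# Faster alternatives, used only if the packages are installed
if (requireNamespace("readr", quietly = TRUE)) {
  df_readr <- readr::read_csv(path)  # returns a tibble
}
if (requireNamespace("data.table", quietly = TRUE)) {
  df_dt <- data.table::fread(path)   # returns a data.table
}

str(df_base)
```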
🧵3/8 Excel Files: The readxl package provides read_excel() for reading data from Excel files (.xls and .xlsx). Alternatively, the openxlsx package offers more features, including both reading and writing .xlsx files. 📊 #Excel #RStats
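Here is a round-trip sketch, assuming both packages are installed. It writes the built-in `mtcars` data to a temporary workbook with openxlsx and reads it back with each package. The `if` guard keeps the script from failing when the packages are missing.

```r
path <- tempfile(fileext = ".xlsx")

if (requireNamespace("openxlsx", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  openxlsx::write.xlsx(mtcars, path)             # openxlsx can write .xlsx files
  df_xl <- readxl::read_excel(path, sheet = 1)   # readxl reads .xls/.xlsx (read-only)
  df_ox <- openxlsx::read.xlsx(path, sheet = 1)  # openxlsx can read them back too
  print(dim(df_xl))
} else {
  message("Install openxlsx and readxl to run this example")
}
```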
Thread: (1/9) You might have heard the term 'bootstrapping' thrown around in discussions about statistics, data analysis, or machine learning. But what does it mean, and why is it so powerful? Let's break it down in simple terms! #RStats #DataScience
(2/9) Bootstrapping is a resampling technique: you repeatedly draw samples from the original dataset, each the same size as the original and drawn with replacement. It's like drawing marbles from a bag and putting each one back after recording its color. Recomputing a statistic on every resample shows how much it would vary from sample to sample, which quantifies the uncertainty in our estimates.
(3/9) In real-life situations, it's not always feasible to collect more data. Bootstrapping lets us make the most of what we have by treating our sample as a stand-in for the population and resampling from it to create 'pseudo-replicates' of our dataset. The spread of an estimate across these replicates shows its variability. #RStats #DataScience
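Here is a minimal base-R sketch that uses the built-in `faithful` dataset as stand-in data. It resamples with replacement many times, recomputes the mean each time, and reads the uncertainty off the spread of those replicates.

```r
set.seed(2023)

# Example data: the built-in 'faithful' eruption durations (minutes)
x <- faithful$eruptions
B <- 2000  # number of bootstrap resamples

# Each resample has the same size as the data and is drawn WITH replacement
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

boot_se <- sd(boot_means)                                 # bootstrap standard error
ci_95   <- quantile(boot_means, probs = c(0.025, 0.975))  # percentile 95% interval

print(c(estimate = mean(x), se = boot_se))
print(ci_95)
```

For the mean, the bootstrap standard error should land close to the textbook `sd(x) / sqrt(length(x))`, which makes a useful sanity check. The real payoff comes with statistics that lack a simple formula, such as the median: swap `mean` for `median` in the `replicate()` call.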