Discover and read the best of Twitter Threads about #statistics

Most recents (24)

Hi #EconTwitter!

Interested in the #econometrics of causal inference? 📈

Here's a thread with some great lecture notes, slides, and books on this topic - freely available on the web - which I've happily tweeted about! 📚

Hope you'll find it useful!
Paul Goldsmith-Pinkham's (@paulgp) Applied Empirical Methods

PhD-level slides covering econometrics, from causal inference to applied machine learning.

github.com/paulgp/applied…
Peng Ding's (@pengding00) Causal inference

Undergrad notes about the #statistics and #econometrics of causal inference.

arxiv.org/abs/2305.18793
Read 11 tweets
Unveiling the shocking truth behind #Bihar's notorious nickname, #JungleRaaj. Brace yourself for a riveting thread 👇👇 that delves into the tumultuous history.
#JungleRaaj, the dark legacy of @laluprasadrjd's 15-year rule in Bihar, during which our beloved state plummeted to the bottom rung of economic and social rankings. Law and order crumbled, kidnappings soared, and gangsters turned politicians thrived.
August 5, 1997, the fateful day when, during a hearing, the #Patna High Court declared, '#Bihar me sarkar nahi, Bihar me jangal raaj hai.' ("There is no government in Bihar; there is jungle rule in Bihar.") A moment of truth that exposed a state plagued by lawlessness, where progress was devoured by chaos.
Read 11 tweets
1/ 📊📈 Let's dive into the fascinating world of #statistics and explore two key concepts: Odds Ratio and Relative Risk! Understanding the differences and applications of these two measures is crucial for interpreting study results and making informed decisions. #DataScience
2/ 🎲 Odds Ratio (OR): The Odds Ratio is a measure of association between an exposure and an outcome. It represents the odds of an event occurring in one group compared to the odds in another group. OR is particularly useful in case-control studies. #DataScience
3/ 🌡️ Relative Risk (RR): Also known as Risk Ratio, RR is the ratio of the probability of an event occurring in the exposed group to the probability of the event occurring in the non-exposed group. RR is often used in cohort studies to assess risk. #DataScience
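To make the two definitions above concrete, here is a minimal Python sketch that computes both measures from a 2x2 table. The function name and the counts are hypothetical, chosen only for illustration; they do not come from the thread.

```python
def risk_and_odds_ratio(a, b, c, d):
    """Return (relative risk, odds ratio) for a 2x2 table.

    a: exposed with the event      b: exposed without the event
    c: unexposed with the event    d: unexposed without the event
    """
    risk_exposed = a / (a + b)      # probability of the event if exposed
    risk_unexposed = c / (c + d)    # probability of the event if not exposed
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a / b) / (c / d)  # odds = events / non-events in each group
    return relative_risk, odds_ratio


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the event.
rr, odds_ratio = risk_and_odds_ratio(a=20, b=80, c=10, d=90)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 2.00, OR = 2.25
```

With these counts the OR (2.25) is further from 1 than the RR (2.00); the two measures only come close to each other when the event is rare in both groups.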
Read 8 tweets
1/ 📊 Let's dive into the world of #statistics & explore the Levels of Measurement! Understanding these levels is crucial for choosing the right statistical methods for data analysis. Today, we'll cover the 4 main levels: Nominal, Ordinal, Interval, and Ratio. #DataScience
2/ Nominal Level: At this level, data is purely qualitative and categorical. There's no inherent order or ranking involved. Examples include colors, genders, or nationalities. It's important to note that mathematical operations like addition or subtraction don't apply here.
3/ 🥇🥈🥉 Ordinal Level: This level involves data that has an inherent order or ranking, but the distances between categories are not necessarily equal or meaningful. Examples include survey responses (Strongly Disagree to Strongly Agree) or educational levels (elementary, high school, college).
Read 9 tweets
🧵1/10: Systematic vs. Random Error 🎯

Welcome to this thread where we'll explore the differences between systematic and random error, two types of error that can impact the accuracy and precision of your data. Let's dive in! #Statistics #DataScience
🧵2/10: Error in Measurements

In any measurement process, there's a possibility of errors occurring. Understanding the types of errors that can arise helps us to design experiments that minimize their impact and improve the quality of our results. #DataScience
🧵3/10: Systematic Error

Systematic errors, or biases, are consistent and reproducible inaccuracies that occur in the same direction every time. These errors can be due to faulty equipment, incorrect calibration, or even observer bias. #DataScience
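The contrast between the two error types can be sketched with a small simulation. This is an illustration under assumed numbers, not part of the thread: the true value, the bias of 1.5, and the noise level of 2.0 are all hypothetical.

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 100.0  # hypothetical quantity being measured


def measure(n, bias, noise_sd):
    """Simulate n readings: true value + constant bias + Gaussian random noise."""
    return [TRUE_VALUE + bias + random.gauss(0.0, noise_sd) for _ in range(n)]


random_only = measure(10_000, bias=0.0, noise_sd=2.0)  # random error only
with_bias = measure(10_000, bias=1.5, noise_sd=2.0)    # adds a systematic offset

mean_random_only = statistics.fmean(random_only)
mean_with_bias = statistics.fmean(with_bias)

# Averaging many readings cancels random error but leaves the bias untouched.
print(f"random only: mean = {mean_random_only:.2f}, sd = {statistics.stdev(random_only):.2f}")
print(f"with bias:   mean = {mean_with_bias:.2f}, sd = {statistics.stdev(with_bias):.2f}")
```

The spread of the two samples is about the same, while the mean of the biased sample stays shifted by roughly the offset no matter how many readings are averaged.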
Read 10 tweets
🧵1/8 🎲 Ever heard of the Monte Carlo Simulation?

It's a powerful mathematical technique used to model complex systems, make predictions, and optimize decision-making. Let's dive into this fascinating world! #MonteCarloSimulation #Statistics #DataScience
🧵2/8 How does it work? 🤔

Monte Carlo Simulation uses random sampling and statistical models to estimate unknown values. It simulates a system multiple times with different random inputs and aggregates the results to produce predictions. #RandomSampling #DataScience
🧵3/8 Applications 💼

From finance to engineering, Monte Carlo Simulation is used across many fields. It helps with risk analysis, portfolio optimization, and even predicting the weather. The versatility of this method is truly remarkable. #DataScience
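The "random inputs, then aggregate" recipe described in this thread can be sketched with a classic toy problem: estimating pi from random points in the unit square. The example and the function name `estimate_pi` are assumptions chosen for illustration, not something the thread itself shows.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi as 4 * (fraction of random points falling inside the quarter circle)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # one random input in the unit square
        if x * x + y * y <= 1.0:           # does it land inside the quarter circle?
            inside += 1
    return 4.0 * inside / n_samples        # aggregate the simulated outcomes


pi_estimate = estimate_pi(200_000)
print(f"Estimated pi: {pi_estimate:.4f}")
```

The estimate gets more precise as `n_samples` grows, with the error shrinking roughly in proportion to one over the square root of the number of samples.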
Read 8 tweets
[1/9] 🎲 Let's talk about the difference between probability and likelihood in #statistics. These two terms are often confused, but understanding their distinction is key for making sense of data analysis! #Rstats #DataScience
[2/9] 💡 Probability is a measure of how likely a specific outcome is in a random process. It quantifies the degree of certainty we have about the occurrence of an event. It ranges from 0 (impossible) to 1 (certain). The sum of probabilities for all possible outcomes is always 1.
[3/9] 📊 Likelihood, on the other hand, is a measure of how probable a particular set of observed data is, given a specific set of parameters for a statistical model, viewed as a function of those parameters. Likelihood is not a probability distribution over the parameters: it is always non-negative, but it need not sum or integrate to 1.
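The distinction can be checked numerically with a binomial model. This is a minimal sketch with hypothetical numbers (10 coin flips, 7 observed heads): the same formula is read once as a probability over outcomes and once as a likelihood over the parameter p.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10

# Probability: fix the parameter (p = 0.5) and vary the outcome k. Sums to 1.
total_probability = sum(binomial_pmf(k, n, 0.5) for k in range(n + 1))

# Likelihood: fix the data (7 heads) and vary the parameter p on a grid.
observed_heads = 7
grid = [i / 100 for i in range(101)]
likelihood = [binomial_pmf(observed_heads, n, p) for p in grid]
best_p = grid[likelihood.index(max(likelihood))]  # grid maximum-likelihood estimate
area = sum(likelihood) / 100                      # crude integral over p in [0, 1]

print(f"sum of probabilities over outcomes: {total_probability:.3f}")
print(f"p with the highest likelihood: {best_p:.2f}")
print(f"area under the likelihood curve: {area:.3f}")
```

The probabilities over outcomes add up to 1, while the area under the likelihood curve is about 1/11, which is why a likelihood is used to compare parameter values rather than read as a probability.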
Read 10 tweets
1/🧵✨ Occam's razor is a principle that states that the simplest explanation is often the best one. But did you know that it can also be applied to statistics? Let's dive into how Occam's razor helps us make better decisions in data analysis. #OccamsRazor #Statistics #DataScience
2/ Occam's razor is based on the idea of "parsimony" - the preference for simpler solutions. In statistics, this means choosing models that are less complex but still accurate in predicting outcomes. #Simplicity #DataScience
3/ 📊 Overfitting is a common problem in statistics, where a model becomes too complex and captures noise rather than the underlying trend. Occam's razor helps us avoid overfitting by prioritizing simpler models with fewer parameters. #Overfitting #ModelSelection #DataScience
Read 6 tweets
🧵1/10 - Law of Large Numbers (LLN) in R 📈

Hello #Rstats community! Today, we're going to explore the Law of Large Numbers (LLN), a fundamental concept in probability theory, and how to demonstrate it using R. Get ready for some code! 🚀

#Probability #Statistics #DataScience
🧵2/10 - What is LLN?

LLN states that as the number of trials (n) in a random experiment increases, the average of the outcomes converges to the expected value. In other words, the more we repeat an experiment, the closer the observed average tends to be to the true expected value.

#RStats #DataScience
🧵3/10 - Coin Flip Example 🪙

Imagine flipping a fair coin. The probability of getting heads (H) is 0.5. As we increase the number of flips, the proportion of H should approach 0.5. Let's see this in action with R!

#RStats #DataScience
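The R code from the thread is not included in this excerpt. As a stand-in, here is a minimal sketch of the same coin-flip experiment in Python (an assumption made purely for illustration; the checkpoints and seed are arbitrary choices).

```python
import random

rng = random.Random(42)
checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}  # number of flips -> running proportion of heads

heads = 0
for flip in range(1, 100_001):
    heads += rng.random() < 0.5  # True counts as 1: one fair coin flip
    if flip in checkpoints:
        proportions[flip] = heads / flip

for n_flips, proportion in proportions.items():
    print(f"{n_flips:>7} flips: proportion of heads = {proportion:.4f}")
```

The early proportions can sit noticeably away from 0.5, while the later ones should settle close to it, which is the behaviour the LLN describes.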
Read 11 tweets
1/🧵 Welcome to this thread on the Central Limit Theorem (CLT), a key concept in statistics! We'll cover what the CLT is, why it's essential, and how to demonstrate it using R. Grab a cup of coffee and let's dive in! ☕️ #statistics #datascience #rstats Source: https://www.digital...
2/📚 The Central Limit Theorem states that the distribution of the means of independent samples approaches a normal distribution as the sample size (n) increases, given that the population has a finite mean and variance. It's a cornerstone of inferential statistics! #CLT #DataScience #RStats
3/🔑 Why is the CLT important? It allows us to make inferences about population parameters using sample data. Since many statistical tests assume normality, the CLT gives us the foundation to apply those tests even when the underlying population is not normally distributed. #RStats
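The thread demonstrates the CLT in R, but that code is not part of this excerpt. The following Python sketch is an assumed illustration of the same idea: it draws sample means from a skewed exponential population (mean 1, variance 1) and compares them with what the CLT predicts. The sample size of 50 and the 5,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)
n = 50            # size of each sample
n_samples = 5_000  # number of sample means to collect


def sample_mean(size):
    """Mean of `size` draws from an Exponential(1) population (skewed, not normal)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(size)])


means = [sample_mean(n) for _ in range(n_samples)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = 1.0 / n**0.5  # CLT: sd of the sample mean = population sd / sqrt(n)

# For a normal distribution, about 68% of values lie within one sd of the mean.
within_one_sd = sum(abs(m - 1.0) <= predicted_sd for m in means) / n_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean: 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (CLT prediction: {predicted_sd:.3f})")
print(f"share within one predicted sd: {within_one_sd:.3f}")
```

Even though the individual draws are strongly skewed, the sample means cluster around 1 with a spread close to the predicted value.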
Read 12 tweets
1/10: 🧵 Welcome to this thread on #regression modeling strategies in #R! We'll discuss key techniques and packages to help you build effective models. Ready to dive in? Let's go! 🚀 #RStats #DataScience #Statistics Source: https://www.imsl.co...
2/10: Linear Regression: Start with simple & multiple linear regression using the 'lm()' function. Check out the 'broom' package for easy-to-use regression output! #RStats cran.r-project.org/web/packages/b…
3/10: Polynomial Regression: When the relationship in the data is nonlinear, try polynomial regression! Use 'poly()' to create higher-order terms. Beware of overfitting! #RStats
Read 10 tweets
Roadmap to becoming a Data Analyst in three months, absolutely free. No need to pay a penny for this.

I have put together a roadmap with free resources.

A thread 🧵👇
1. First Month - Foundations of Data Analysis

A. Corey Schafer - Python Tutorials for Beginners:
B. StatQuest with Josh Starmer - Statistics Fundamentals:
C. Ken Jee - Data Analysis with Python
2. Second Month - Advanced Data Analysis Techniques

A. Sentdex - Machine Learning with Python
B. StatQuest with Josh Starmer - Machine Learning Fundamentals
C. Brandon Foltz - Business Analytics
Read 6 tweets
Python project ideas for beginners with source code

A thread 🧵👇
1. Calculator App
Source Code Link: github.com/programiz/Calc…
2. Expense Tracker
Source Code Link: github.com/prtm/Expense-T…
Read 7 tweets
Python for data science beginners roadmap

A thread 🧵👇
1. Python Basics
Codecademy's Python Course (codecademy.com/learn/learn-py…)
Python for Everybody Course (py4e.com)
2. Data Analysis Libraries
NumPy User Guide (numpy.org/doc/stable/use…)
Pandas User Guide (pandas.pydata.org/docs/user_guid…)
Matplotlib Tutorials (matplotlib.org/stable/tutoria…)
Read 7 tweets
Top 25 SQL Interview Questions and Answers

🧵
There are certain SQL concepts that you should be familiar with if you plan to attend an #SQL interview. No matter which RDBMS you use, whether it is MySQL, Oracle, Microsoft SQL Server, #PostgreSQL or any other, these SQL concepts are common to all of the popular RDBMSs.
Read 5 tweets
THE FOLLOWING IS QUOTED FROM BERGEN EVANS
ON NORBERT #WIENER, NUCLEAR PHYSICIST !!!

The #second concept Wiener has to establish is that of entropy. Probability is a #mathematical concept, coming from #statistics.
Entropy comes from physics. It is the assertion-- established logically and experimentally-- that the universe, by its nature, is "running down", moving toward a state of inert uniformity devoid of form, matter, hierarchy or differentiation.....
That is, in any given situation, less organization, more #chaos, is overwhelmingly more probable than tighter #organization or more order.

Clear Consequences [Y]

Apparent Order
Apparent Disorder
Read 7 tweets
💥 14 secret tools powered by #RStats to save time and effort on your data projects (don't miss it!): 👀
1️⃣ Edit your data interactively (and save the code)! 👀
📦 'editData' is an RStudio add-in for editing a data.frame or a tibble interactively
🔗 buff.ly/3U5Tgjy

#DataScience #DataVisualization #dataviz #stats #analytics #RStats #Analytics
2️⃣ Create #ggplot charts interactively! 🚀
📦 esquisse is another of my favorite #rstudio addins
✅ export the chart or retrieve the code to reproduce it
🔗 buff.ly/3mxLHSo
#DataScience #DataVisualization #dataviz #stats #analytics #RStats #Analytics
Read 17 tweets
Machine Learning is hard.

I felt like quitting whenever I got baffled by the amount of information to learn.

But I followed a learning path that starts with Math for ML.

Here are the 5 critical reasons why you should start with Math concepts for ML

THREAD🧵 ↓
🤔 Why Math for ML:
**********************
📌 Machine learning algorithms rely on mathematical calculations to make predictions and decisions
📌 Many machine learning techniques, such as neural networks and deep learning, are based on linear algebra and calculus
Read 12 tweets
Learn SQL projects for Data Analysis and add them to your portfolio for free 🥳

Portfolio projects are very important. These projects will improve your skills.

A thread 🧵👇
1.
2.
Read 9 tweets
🧬🧵 Here is a Twitter thread explaining #Bioinformatics in simple terms: 🧬🧵
#Bioinformatics is the field that uses computational tools and methods to analyze and interpret #biological #data.
This can include analyzing #DNA and #protein #sequences, predicting the structure and function of #molecules, and modeling biological systems.
Read 7 tweets
Is the #Bitcoin price a stationary time series? 🤔

Check 👇 how to find out!

#DataScience #MachineLearning
We know that stationary means that the mean and variance of the time series data do not vary across time.

To check that, we can perform the Augmented Dickey-Fuller test.
If you are familiar with #statistics, here are the hypotheses for this test:

H0️⃣ (Null hypothesis) = Time series is non-stationary

H1️⃣ (Alternative hypothesis) = Time series is stationary

After the test, we will pay attention to the "p-value".
Read 11 tweets
A translation of my first thread for the general public out there. I will talk about how to correctly, yet efficiently model the uncertainty on predictions (for example in machine learning). (1/n)

#statistics #DataScience #machinelearning #conformalprediction
When I started my PhD, I was convinced of two things:
1) Modelling uncertainty is hard, and
2) The only viable approach is the Bayesian one.

This idea is so strongly ingrained in the statistical literature and data science community that it must be true, right? (2/n)
The answer is no and luckily I quickly learned of a great alternative. The idea behind "Conformal prediction" is as simple as possible: You calculate the errors on a holdout dataset and choose, for example, the 90% quantile. (3/n)
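The recipe in the last tweet (compute errors on held-out data, take a quantile) can be sketched as split conformal prediction. Everything in this sketch is an assumption for illustration: the synthetic data, the `predict` function standing in for an already fitted model, and the use of absolute errors. The `(n + 1)` term is the usual finite-sample correction for split conformal prediction, which the tweet does not mention.

```python
import math
import random

rng = random.Random(0)


def make_data(n):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def predict(x):
    """Stand-in for any fitted model (hypothetical and deliberately imperfect)."""
    return 1.9 * x + 1.2


# 1) Absolute errors on a holdout (calibration) set the model was not trained on.
cal_x, cal_y = make_data(1_000)
errors = sorted(abs(y - predict(x)) for x, y in zip(cal_x, cal_y))

# 2) Take the 90% quantile of those errors, with the finite-sample correction.
alpha = 0.1
rank = math.ceil((len(errors) + 1) * (1 - alpha))
q = errors[min(rank, len(errors)) - 1]

# 3) The prediction interval for a new x is [predict(x) - q, predict(x) + q].
test_x, test_y = make_data(2_000)
covered = sum(abs(y - predict(x)) <= q for x, y in zip(test_x, test_y))
coverage = covered / len(test_x)

print(f"interval half-width q = {q:.2f}, empirical coverage = {coverage:.3f}")
```

On fresh data from the same distribution, the intervals should cover the true value about 90% of the time, regardless of how good or bad the underlying model is; a worse model simply yields wider intervals.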
Read 7 tweets
My first thread (as these are apparently called) is about modelling the uncertainty of predictions correctly, yet simply. (1/n)

#Statistics #DataScience #machinelearning #conformalprediction
When I started my PhD, I was convinced of two things:
1) Modelling the uncertainty of predictions is not easy.
2) It can only be done in a Bayesian way.

This idea is so widespread within statistics and data science that it must be true. Or is it? (2/n)
Fortunately, I fairly quickly came across an alternative. The idea behind "Conformal prediction" is as simple as can be: you compute the errors on some validation data and choose, for example, the 90% quantile. (3/n)
Read 7 tweets
Fit a polynomial of degree D to N points by minimising the mean squared error. Now, start increasing D and look at the generalisation error. The error blows up around D~N and starts improving *again* for D>>N 🙂

#doubleDescent #statistics #maths
"Learning to Generalize" by Manfred Opper
ki.tu-berlin.de/fileadmin/fg13…
Read 5 tweets
