🔥 Matt Dancho (Business Science) 🔥
May 10, 2023 • 8 tweets • 7 min read
Learning data science on your own is tough...

...(ahem, it took me 6 years)

So here's some help.

5 Free Books to Cut Your Time In HALF.

Let's go! 🧵

#datascience #rstats #R
1. Mastering #Spark with #R

This book solves an important problem: what happens when your data gets too big?

For example, analyzing 100,000,000 time series.

You can do it in R with the tools covered in this book.

Website: therinspark.com
2. Geocomputation with #R

Interested in #Geospatial Analysis?

This book is my go-to resource for all things geospatial.

This book covers:
-Making Maps
-Working with Spatial Data
-Applications (Transportation, Geomarketing)

Website: r.geocompx.org
3. Tidy Finance with #R

What tools exist in R for #Finance?
And how do I use them?

Answers to these questions are covered in this book!

P.S. This book uses my R package, #tidyquant.

Website: tidy-finance.org
4. Text Mining with R

This is a fantastic introduction to text analysis and text mining with the #tidytext R package.

This book singlehandedly made me MORE CONFIDENT with text analysis.

Website: tidytextmining.com
5. #Forecasting Principles and Practice

This is the best “theory” book on #timeseries analysis and forecasting.

Topics Covered:
- ARIMA
- Exponential Smoothing
- Time Series Decomposition
- A lot more!

Website: otexts.com/fpp3/
1-Dollar Bonus Book:

This is a massive value: it gives you a complete plan for EVERYTHING you need to know about learning data science.

It's only a buck.

And it will cut 2-3 years off your journey.

Website: learn.business-science.io/if-i-had-to-le…
Want even more help becoming a 6-figure data scientist?

I have a free workshop that will help you become a $100K+ earner as a #DataScientist even in a Recession.

👉 Register Here: us02web.zoom.us/webinar/regist…

More from @mdancho84

May 10
Bayesian data analysis is a fundamental concept in data science. But it took me 2 years to understand its importance.

In 2 minutes, I'll share my best findings over the last 2 years exploring Bayesian Modeling.

Let's go. 🧵
1. Why Bayesian Data Analysis?

Bayesian modeling is a powerful tool in statistics and data science, especially where traditional approaches fall short.

It avoids arbitrary assumptions and provides distributions of possible values instead of just point estimates.
2. Bayes' Theorem:

Bayesian modeling is based on Bayes' theorem.

Bayes' Theorem provides a mathematical formula to update the probability for a hypothesis as more evidence or information becomes available.

It describes how to revise existing predictions or theories in light of new evidence, a process known as Bayesian inference.
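As a rough illustration of that updating step, here is a minimal sketch of Bayes' theorem, P(H | E) = P(E | H) · P(H) / P(E), for a yes/no hypothesis. The function name `posterior` and all of the numbers are hypothetical and chosen only for the example; they do not come from the thread.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) for a binary hypothesis H after seeing evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, P(E), over both cases.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a condition with 1% prevalence and a test that
# flags 95% of true cases but also 5% of healthy cases.
after_one_positive = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

# Bayesian inference in action: yesterday's posterior becomes today's prior
# when a second, independent positive result arrives.
after_two_positives = posterior(
    prior=after_one_positive, likelihood=0.95, false_positive_rate=0.05
)

print(f"after one positive:  {after_one_positive:.3f}")
print(f"after two positives: {after_two_positives:.3f}")
```

Each new piece of evidence moves the probability further from the 1% starting point, which is the "revise in light of new evidence" process described above.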
May 8
Why data scientists should stop ignoring AI.

A thread 🧵
I get it. Yet another "hype cycle".

In 2016 it was Deep Learning.

Now it's Generative AI. Right?

Wrong. This is why.
1. Generative AI is a 10X complement to Data Science

In the past, deep learning had limited uses in Business Intelligence, Data Analytics, and in particular within Data Science for Business contexts like working with Tabular data.

Generative AI is the opposite. Instead of trying to improve on Machine Learning, generative AI adds a superpower of automation.
May 6
The concept that helped me go from bad models to good models: Bias and Variance. In 4 minutes, I'll share 4 years of experience in managing bias and variance in my machine learning models.

Let's go. 🧵
1. Generalization:

Bias and variance control your model's ability to generalize on new, unseen data, not just the data it was trained on. The goal in machine learning is to build models that generalize well. To do so, I manage bias and variance.
2. Low vs High Bias:

Models with low bias are usually complex and can capture the underlying patterns in data very well.

Models with high bias are overly simple and cannot capture the complexity in the data. They often underfit the training data.
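To make the contrast concrete, here is a small sketch using only the Python standard library. Everything in it is an assumption made for illustration: the quadratic `true_signal`, the noise level, and the two toy models (a constant prediction as the high-bias model and a 1-nearest-neighbor lookup as the low-bias one). It is not the author's workflow.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def true_signal(x):
    # Hypothetical underlying pattern the models should learn.
    return x * x


def make_data(n):
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_signal(x) + random.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(50)
test_x, test_y = make_data(200)

train_mean = sum(train_y) / len(train_y)


def high_bias_model(x):
    # Overly simple: ignores x and always predicts the training mean.
    return train_mean


def low_bias_model(x):
    # Very flexible: copies the label of the closest training point.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


results = {
    "high_bias": (mse(high_bias_model, train_x, train_y),
                  mse(high_bias_model, test_x, test_y)),
    "low_bias": (mse(low_bias_model, train_x, train_y),
                 mse(low_bias_model, test_x, test_y)),
}

for name, (train_error, test_error) in results.items():
    print(f"{name}: train MSE = {train_error:.4f}, test MSE = {test_error:.4f}")
```

The constant model misses the curve even on the data it was trained on (underfitting), while the nearest-neighbor model reproduces its training data exactly, so its error only shows up on unseen data.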
May 4
Principal Component Analysis (PCA) is the gold standard in dimensionality reduction with uses in business. In 5 minutes, I'll teach you what took me 5 weeks. Let's go! 🧵
1. What is PCA?

PCA is a statistical technique used in data analysis, mainly for dimensionality reduction.

It's beneficial when dealing with large datasets with many variables, and it helps simplify the data's complexity while retaining as much variability as possible.
2. How PCA Works:

PCA has 5 steps:

1. Standardization
2. Covariance Matrix Computation
3. Eigenvector Calculation
4. Choosing Principal Components
5. Transforming the data.

Let's break them down.
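The five steps listed above can be sketched in plain Python. This is a teaching sketch under stated assumptions, not a production implementation: the tiny two-column dataset is made up, the helper names are hypothetical, and power iteration is swapped in to find only the leading eigenvector (a real project would normally use a linear algebra library).

```python
import math


def standardize(rows):
    # Step 1: rescale every column to mean 0 and standard deviation 1.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def covariance_matrix(centered_rows):
    # Step 2: covariance of each pair of (already centered) columns.
    n, d = len(centered_rows), len(centered_rows[0])
    return [[sum(r[a] * r[b] for r in centered_rows) / (n - 1) for b in range(d)]
            for a in range(d)]


def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def leading_eigenpair(matrix, iterations=200):
    # Step 3: power iteration converges to the eigenvector with the
    # largest eigenvalue. The uneven start vector avoids starting exactly
    # orthogonal to it for symmetric-looking inputs.
    v = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


# Made-up data: two strongly correlated measurements per row.
rows = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]

scaled = standardize(rows)
cov = covariance_matrix(scaled)
eigenvalue, component = leading_eigenpair(cov)

# Step 4: keep the first component and check how much variance it explains.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# Step 5: project each row onto the chosen component (2 columns -> 1).
scores = [sum(x * c for x, c in zip(row, component)) for row in scaled]

print(f"first component: {[round(c, 3) for c in component]}")
print(f"share of variance explained: {explained:.3f}")
```

Because the two columns move together, a single component keeps most of the variability, which is the sense in which PCA simplifies the data while retaining as much variability as possible.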
Nov 30, 2023
90% of data scientists overlook how to design A/B Testing experiments.

4 tips for better experiments: 🧵

#DataScience #ABTesting
Tip 1: Include a pre-test

Pre-test data is data collected before the actual A/B test or time-based experiment, so it is unaffected by the treatment.

Pre-test data is a secret used by Booking.com in their CUPED A/B test method for reducing variance (and improving decision-making from A/B test results).
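Here is a minimal sketch of the CUPED adjustment as it is commonly described: each user's in-test metric is shifted by theta times the gap between that user's pre-test value and the pre-test average, with theta = cov(metric, pre-test) / var(pre-test). The simulated spend figures and the function name `cuped_adjust` are hypothetical, and this is not a description of any company's actual implementation.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def cuped_adjust(metric, pre_metric):
    """Remove the part of the metric that pre-test behavior already explains."""
    theta = covariance(metric, pre_metric) / variance(pre_metric)
    pre_mean = mean(pre_metric)
    return [y - theta * (x - pre_mean) for y, x in zip(metric, pre_metric)]


# Simulated users: spend during the test is correlated with pre-test spend.
pre_test_spend = [random.gauss(100.0, 20.0) for _ in range(1000)]
test_spend = [0.8 * x + random.gauss(5.0, 10.0) for x in pre_test_spend]

adjusted_spend = cuped_adjust(test_spend, pre_test_spend)

print(f"mean before:     {mean(test_spend):.2f}")
print(f"mean after:      {mean(adjusted_spend):.2f}")
print(f"variance before: {variance(test_spend):.2f}")
print(f"variance after:  {variance(adjusted_spend):.2f}")
```

The adjustment leaves the average metric unchanged but shrinks its variance, so the same difference between variants is easier to detect. In a real experiment theta would typically be estimated once on the pooled data and applied to every variant.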
Tip 2: Factor in time to effect

For online conversions, sales effects can take time. Your experiment should factor in this delay.

A different technique, called Causal Impact, can be more useful, especially if the conversion involves a longer sales cycle or process.
Nov 27, 2023
Both Bayesian and Frequentist approaches to A/B testing have strengths (and weaknesses).

Here's a quick selection guide with 4 Pros/Cons. 🧵

#Bayesian #Frequentist #MachineLearning #ABTesting
💡 4 Reasons for the #Frequentist Approach for A/B testing

1. Fixed Sample Size: Requires pre-determination of sample size. Ideal when sample size cannot change once the test begins.
2. P-values and Confidence Intervals: Provides p-values to infer statistical significance and confidence intervals for parameter estimates.

3. Simplicity: Generally easier to explain and understand for those without a strong statistical background.