Matt Dancho (Business Science)
May 10, 2023 · 8 tweets
Learning data science on your own is tough...

...(ahem, it took me 6 years)

So here's some help.

5 Free Books to Cut Your Time In HALF.

Let's go! 🧵

#datascience #rstats #R
1. Mastering #Spark with #R

This book solves an important problem: what happens when your data gets too big?

For example, analyzing 100,000,000 time series.

You can do it in R with the tools covered in this book.

Website: therinspark.com
2. Geocomputation with #R

Interested in #Geospatial Analysis?

This book is my go-to resource for all things geospatial.

This book covers:
- Making Maps
- Working with Spatial Data
- Applications (Transportation, Geomarketing)

Website: r.geocompx.org
3. Tidy Finance with #R

What tools exist in R for #Finance?
And how do I use them?

Answers to these questions are covered in this book!

P.S. This book uses my R package, #tidyquant

Website: tidy-finance.org
4. Text Mining with R

This is a fantastic introduction to text analysis and text mining with the #tidytext R package.

This book single-handedly made me MORE CONFIDENT with text analysis.

Website: tidytextmining.com
5. #Forecasting Principles and Practice

This is the best “theory” book on #timeseries analysis and forecasting.

Topics Covered:
- ARIMA
- Exponential Smoothing
- Time Series Decomposition
- A lot more!

Website: otexts.com/fpp3/
1-Dollar Bonus Book:

This is a massive value: it gives you a complete plan for EVERYTHING you need to know about learning data science.

It's only a buck.

And it will cut 2-3 years off your journey.

Website: learn.business-science.io/if-i-had-to-le…
Want even more help becoming a 6-figure data scientist?

I have a free workshop that will help you become a $100K+ earner as a #DataScientist, even in a recession.

👉 Register Here: us02web.zoom.us/webinar/regist…
