🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists ($200,000 /year career) 👇
Apr 13 • 9 tweets • 4 min read
🚨 Google published a 69-page prompt engineering masterclass.

This is what's inside:

Table of Contents:

- Prompt Engineering
- LLM Output Configuration
- Prompting Techniques
- Best Practices
Apr 13 • 7 tweets • 3 min read
❌ Move over PowerBI. There's a new AI analyst in town.

💡 Introducing ThoughtSpot.

1. AI Analyst

ThoughtSpot's Spotter is an AI analyst that uses generative AI to answer complex business questions in natural language, delivering visualizations and insights instantly.

It supports iterative querying (e.g., "What's next?") without predefined dashboards.
Apr 12 • 8 tweets • 3 min read
RIP Tableau.

Introducing PandasAI, a free alternative for fast Business Intelligence.

Let's dive in: 🧵

1. PandasAI

PandaAI transforms your natural language questions into actionable insights quickly, smartly, and effortlessly.
Apr 11 • 12 tweets • 3 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Apr 10 • 9 tweets • 3 min read
🚨 BREAKING: Google just open sourced Agent Development Kit (ADK) in Python

This is what you need to know: 🧵

1. What is Google ADK?

Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents.

ADK can be used with popular LLMs and open-source generative AI tools and is designed with a focus on tight integration with the Google ecosystem and Gemini models.
Apr 8 • 8 tweets • 3 min read
Stop Prompting LLMs.
Start Programming LLMs.

Introducing DSPy by Stanford NLP.

This is why you need to learn it:

1. Why DSPy?

DSPy is the open-source framework for programming, rather than prompting, language models.

It allows you to iterate fast on building modular AI systems.
Apr 8 • 11 tweets • 4 min read
RIP Tableau and PowerBI.

Enter Julius AI.

This is what Julius can do:

1. The $10 Billion problem with Tableau and PowerBI?

Dashboards are static.

But businesses are dynamic.

That's why I'm so excited about this new tool: Julius AI
Apr 7 • 11 tweets • 4 min read
Top 7 most important statistical analysis concepts that have helped me as a Data Scientist.

This is a complete 7-step beginner ROADMAP for learning stats for data science. Let's go:

Step 1: Learn These Descriptive Statistics

Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist to understand what's in front of them in their data sets.
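The five statistics named in Step 1 can be computed with Python's standard library. This is a minimal sketch; the `orders` sample is made up for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical sample: daily order counts (made-up numbers for illustration).
orders = [12, 15, 15, 18, 20, 22, 95]

mean = statistics.mean(orders)          # arithmetic average, pulled up by the outlier 95
median = statistics.median(orders)      # middle value, robust to the outlier
mode = statistics.mode(orders)          # most frequent value
variance = statistics.variance(orders)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(orders)      # sample standard deviation, same units as the data

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
```

Comparing the mean with the median is a quick way to spot skew: a single large value moves the mean but leaves the median where it was.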
Apr 6 • 8 tweets • 3 min read
🚨 BREAKING: IBM launches a free Python library that converts ANY document to data

Introducing Docling. Here's what you need to know: 🧵

1. What is Docling?

Docling is a Python library that simplifies document processing, parsing diverse formats, including advanced PDF understanding, and providing seamless integrations with the gen AI ecosystem.
Apr 5 • 9 tweets • 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Apr 2 • 9 tweets • 3 min read
Python has powerful time series libraries.

Case in point: skforecast

Let me explain:

Skforecast is a Python library for time series forecasting using machine learning models.

Skforecast works with any regressor compatible with the scikit-learn API, including popular options like LightGBM, XGBoost, CatBoost, Keras, and many others.
Mar 25 • 7 tweets • 3 min read
Becoming an AI data scientist has levels to it.

Here's a complete roadmap:

Becoming an AI data scientist is a journey with multiple levels, each requiring specific tools and skills.

I've outlined a progression of levels with relevant skills and tools:
Mar 24 • 13 tweets • 4 min read
Understanding P-Values is essential for improving regression models.

In 2 minutes, I'll crush your confusion.

Let's go:

1. The p-value:

A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
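To make the definition concrete, here is a small sketch that computes an exact two-sided p-value for a coin-flip experiment, where the null hypothesis is that the coin is fair. The function name and the flip counts are hypothetical, chosen only for illustration; the thread itself discusses p-values in the context of regression.

```python
from math import comb

def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""
    # Under H0, outcome k has probability C(flips, k) / 2**flips.
    observed_distance = abs(heads - flips / 2)
    # Count every outcome at least as far from flips / 2 as the observed one.
    extreme_count = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    return extreme_count / 2**flips

# Hypothetical experiments: 100 flips each.
p_typical = binomial_two_sided_p_value(heads=52, flips=100)
p_unusual = binomial_two_sided_p_value(heads=65, flips=100)
print(f"52/100 heads: p = {p_typical:.3f}")
print(f"65/100 heads: p = {p_unusual:.4f}")
```

A small p-value means a result this extreme would be rare if the null hypothesis were true, which counts as evidence against it. It is not the probability that the null hypothesis is true.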
Mar 23 • 5 tweets • 2 min read
Data scientists are out.

The Generative AI Data Scientist is in.

Let me explain:

Companies are sitting on mountains of unstructured data.

PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts

This is useful data. But it's unusable in its existing form.
Mar 23 • 11 tweets • 4 min read
Correlation is the skill that has singlehandedly benefitted me the most in my career.

In 3 minutes I'll demolish your confusion (and share strengths and weaknesses you might be missing).

Let's go:

1. Correlation:

Correlation is a statistical measure that describes the extent to which two variables change together. It can indicate whether and how strongly pairs of variables are related.
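As a rough illustration, the Pearson correlation coefficient can be computed by hand as the covariance of the two variables divided by the product of their spreads. The helper name `pearson_r` and the data are hypothetical. The last example hints at one known weakness: Pearson correlation only captures linear relationships.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations (numerator) and sums of squared deviations (denominator).
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / sqrt(ss_x * ss_y)

# Made-up data for illustration.
r_strong = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])       # mostly rising together
r_inverse = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])     # one falls as the other rises
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x**2: related, but not linearly

print(r_strong, r_inverse, r_curved)
```

In the third case the variables are perfectly related (y is x squared), yet the coefficient comes out at zero, so a low correlation does not by itself rule out a relationship.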
Mar 22 • 7 tweets • 2 min read
Data Science for Business.

The book that helped me connect the dots. Let's dive in:

1. CRISP Data Mining Process

The foundation for applying data science to business is the CRISP method.

This is a helpful framework for integrating data science with business understanding.
Mar 21 • 10 tweets • 4 min read
90% of data scientists can improve their SQL for business intelligence.

In 3 minutes, learn the 20% of SQL that gets 80% of results:

🔍 SELECT Basics:

Start with SELECT * FROM table to retrieve all rows & columns.

Remember, SQL keywords aren't case-sensitive, but capitalizing them (SELECT, FROM) makes your queries easier to read.
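A quick way to try this is Python's built-in `sqlite3` module with an in-memory database. The `customers` table and its rows are hypothetical, created only for this sketch. It runs the same query with uppercase and lowercase keywords to show that both return the same rows.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ben", "Austin")],
)

# Same query, different keyword casing: SQL keywords are case-insensitive.
rows_upper = conn.execute("SELECT * FROM customers").fetchall()
rows_lower = conn.execute("select * from customers").fetchall()

print(rows_upper)
print(rows_upper == rows_lower)
conn.close()
```

Whether table names, column names, and string comparisons are case-sensitive depends on the database engine and its settings, so only the keyword behavior is shown here.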
Mar 20 • 12 tweets • 4 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Mar 17 • 11 tweets • 4 min read
6 statistical methods that can be used for A/B Testing (and when to use them).

A/B Testing is a staple of data science and data analyst interviews.

And it's the Number 1 technique that companies benefit from in improving customer revenue.

So here are 6 of the most common stat methods used in A/B testing.
Mar 16 • 11 tweets • 4 min read
R-squared is one of the most commonly used metrics to measure performance.

But it took me 2 years to figure out mistakes that were killing my regression models.

In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go:

1. R-squared (R2):

R2 is a statistical measure used in regression models that indicates how well the model replicates the observed outcomes, based on the proportion of the total variation in outcomes that the model explains.
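The "proportion of total variation explained" can be written as R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. Here is a minimal sketch of that formula; the helper name `r_squared` and the numbers are made up for illustration.

```python
def r_squared(actual, predicted):
    """R2 = 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain (squared residuals).
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the outcomes around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up outcomes and model predictions.
actual = [3, 5, 7, 9]
predicted = [2.5, 5.5, 6.5, 9.5]

r2_model = r_squared(actual, predicted)
r2_mean_only = r_squared(actual, [6, 6, 6, 6])  # always predicting the mean explains nothing
print(round(r2_model, 2), r2_mean_only)
```

A model that always predicts the mean scores 0, and a perfect fit scores 1. On data the model was not fitted to, the value can drop below 0.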
Mar 15 • 10 tweets • 4 min read
Logistic Regression is the most important foundational algorithm in Classification Modeling.

In 2 minutes, I'll teach you what took me 2 months to learn.

Let's go: 🧵

1. Logistic regression:

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine a binary outcome (one with only two possible outcomes). This is commonly called a binary classification problem.
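To show how independent variables map to a binary outcome, here is a sketch of the scoring step only: a weighted sum of the features is passed through the sigmoid function to get a probability, which is then thresholded into a class. The weights, bias, and feature values are hypothetical and not fitted to any data; in practice they would be estimated by a library such as scikit-learn.

```python
from math import exp

def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    return 1 / (1 + exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Binary label: 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical coefficients (assumed for illustration, not fitted).
weights = [0.8, -0.5]
bias = -1.0

p_a = predict_probability([3.0, 1.0], weights, bias)
label_a = predict_class([3.0, 1.0], weights, bias)
label_b = predict_class([0.5, 2.0], weights, bias)
print(round(p_a, 3), label_a, label_b)
```

The 0.5 threshold is a common default, not a requirement; it can be moved when false positives and false negatives carry different costs.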