🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists & AI Engineers ($200,000+ career) 👇
10 subscribers
Sep 16 • 8 tweets • 2 min read
Tableau is about to die.

Introducing PandasAI, a free alternative for fast Business Intelligence.

Let's dive in:

1. PandasAI

PandasAI transforms your natural language questions into actionable insights: quickly, intelligently, and effortlessly.
Sep 15 • 11 tweets • 4 min read
RIP Tableau and PowerBI.

Enter Julius AI.

This is what Julius can do:

1. The $10 Billion problem with Tableau and PowerBI?

Dashboards are static.

But businesses are dynamic.

That's why I'm so excited about this new tool: Julius AI
Sep 14 • 11 tweets • 3 min read
R-squared is one of the most commonly used metrics to measure performance.

But it took me 2 years to figure out the mistakes that were killing my regression models.

In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go:

1. R-squared (R2):

R2 is a statistical measure used in regression models that provides a measure of how well the observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
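
To make "proportion of total variation explained" concrete, here's a minimal sketch in plain Python. The function name `r_squared` and the toy numbers are assumptions for illustration, not something from the thread.

```python
def r_squared(observed, predicted):
    """R2 = 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    # Total variation of the outcomes around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation the model fails to explain (squared residuals).
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Toy data (made up for this example).
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
r2 = r_squared(observed, predicted)
print(round(r2, 2))
```

A perfect model gives R2 = 1, and a model no better than predicting the mean gives R2 = 0.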
Sep 13 • 13 tweets • 4 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Sep 13 • 10 tweets • 4 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Sep 8 • 6 tweets • 2 min read
RIP Data Scientists.

The Generative AI Data Scientist is NOW what companies want.

This is actually good news. Let me explain:

Companies are sitting on mountains of unstructured data.

PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts

This is useful data. But it's unusable in its existing form.
Sep 4 • 13 tweets • 4 min read
K-means is an essential algorithm for Data Science.

But it's confusing for beginners.

Let me demolish your confusion:

1. K-Means

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
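
Here's a rough sketch of the standard K-means loop (Lloyd's algorithm) in plain Python, just to show the two alternating steps. The `kmeans` function, the fixed starting centroids, and the toy points are all assumptions for illustration. In practice you'd likely reach for a library implementation instead.

```python
def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Two obvious groups of toy points (made up for this example).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The starting centroids are passed in so the result is reproducible. Real implementations usually pick them randomly or with a smarter seeding scheme.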
Sep 3 • 11 tweets • 4 min read
These 7 statistical analysis concepts have helped me as an AI Data Scientist.

Let's go: 🧵

Step 1: Learn These Descriptive Statistics

Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist to understand what’s in front of them in their data sets.
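
A quick sketch of all five using Python's built-in `statistics` module. The sample `data` is made up for illustration.

```python
import statistics

# Toy sample (made up for this example).
data = [2, 4, 4, 5, 7, 9, 4]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(data),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that `variance` and `stdev` are the sample versions. Use `pvariance` and `pstdev` if your data is the whole population.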
Sep 3 • 11 tweets • 3 min read
The 3 types of machine learning (that every data scientist should know).

In 3 minutes I'll eviscerate your confusion. Let's go: 🧵

1. The 3 Fundamental Types of Machine Learning:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning.

Let's break them down:
Aug 31 • 8 tweets • 3 min read
🚨 Google published a 69-page prompt engineering masterclass.

This is what's inside:

Table of Contents:

- Prompt Engineering
- LLM Output Configuration
- Prompting Techniques
- Best Practices
Aug 30 • 12 tweets • 4 min read
Linear Regression is one of the most important tools in a Data Scientist's toolbox.

Yet it's super confusing for beginners.

Let's fix that: 🧵

1. Ordinary Least Squares (OLS) Regression

Most common form of Linear Regression. OLS regression aims to find the best-fitting linear equation that describes the relationship between the dependent variable (often denoted as Y) and independent variables (denoted as X1, X2, ..., Xn).
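
For the simplest case (one X), the best-fitting line has a closed form. Here's a minimal sketch, assuming a single predictor. The `ols_fit` name and the toy data are illustrative, not from the thread.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data that lies exactly on y = 1 + 2x (made up for this example).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

With several predictors (X1, X2, ..., Xn) the same idea applies, but the solution is usually computed with matrix algebra by a library.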
Aug 28 • 7 tweets • 3 min read
Came across this new library for LLM Prompt Management in Python.

This is what it does:

The Python library is called Promptify.

It combines a prompter, LLMs, and a pipeline to solve NLP problems with LLMs.

You can easily generate different NLP Task prompts for popular generative models like GPT, PaLM, and more with Promptify.
Aug 28 • 8 tweets • 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines
Aug 27 • 15 tweets • 3 min read
Understanding P-Values is essential for improving regression models.

In 2 minutes, I'll crush your confusion.

1. The p-value:

A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
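
To see the idea in numbers, here's a small sketch that turns a test statistic into a two-sided p-value. It assumes a standard normal (z) test statistic, which is a simplification: regression software typically uses a t-distribution for coefficients. The `two_sided_p_value` helper is a made-up name for this example.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a result at least this extreme if the null hypothesis is true."""
    # Tail area beyond |z| under the standard normal, doubled for both tails.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_strong = two_sided_p_value(3.0)  # far from the null: small p-value
p_weak = two_sided_p_value(0.5)    # close to the null: large p-value
print(round(p_strong, 4), round(p_weak, 4))
```

The smaller the p-value, the stronger the evidence against the null hypothesis.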
Aug 27 • 9 tweets • 3 min read
Logistic Regression is the most important foundational algorithm in Classification Modeling.

In 2 minutes, I'll crush your confusion.

Let's dive in:

1. Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine a binary outcome (one with only two possible values). This is commonly called a binary classification problem.
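
Here's a minimal sketch of how a fitted logistic regression turns inputs into a binary prediction. The coefficients, the 0.5 cutoff, and the function names are assumptions for illustration. A real model would learn the coefficients from data.

```python
import math


def predict_proba(x, intercept, coef):
    """Probability of the positive class via the logistic (sigmoid) function."""
    # Linear score: intercept plus the weighted sum of the inputs.
    z = intercept + sum(c * xi for c, xi in zip(coef, x))
    return 1 / (1 + math.exp(-z))


def predict_class(x, intercept, coef, threshold=0.5):
    """Binary label: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, intercept, coef) >= threshold else 0


# Hypothetical coefficients (not fitted to any real data).
intercept, coef = -1.0, [2.0, 0.5]
p_low = predict_proba([0.0, 0.0], intercept, coef)
p_high = predict_proba([2.0, 1.0], intercept, coef)
print(round(p_low, 3), round(p_high, 3))
```

The sigmoid squeezes any linear score into the range 0 to 1, which is why the output can be read as a probability.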
Aug 25 • 8 tweets • 3 min read
This guy built an entire AI Data Science Team in Python.

100% Open Source

This is how to get it (for FREE) 🧵

1. What is it?

An AI-powered data science team of agents to help you perform common data science tasks 10X faster.
Aug 24 • 6 tweets • 3 min read
Data scientists are OUT.

The Generative AI Data Scientist is IN.

This is why (and how you can make the transition): 🧵

Companies are sitting on mountains of unstructured data.

PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts

This is useful data. But it's unusable in its existing form.
Aug 22 • 10 tweets • 4 min read
This 277-page PDF unlocks the secrets of Large Language Models.

Here's what's inside: 🧵

Chapter 1 introduces the basics of pre-training.

This is the foundation of large language models, and common pre-training methods and model architectures will be discussed here.
Aug 20 • 7 tweets • 3 min read
Stop Prompting LLMs.
Start Programming LLMs.

Introducing DSPy by Stanford NLP.

This is why you need to learn it:

1. Why DSPy?

DSPy is the open-source framework for programming—rather than prompting—language models.

It allows you to iterate fast on building modular AI systems.
Aug 19 • 6 tweets • 3 min read
Is data cleaning time-consuming?

This is how I went from 3 hours to 5 seconds:

Data cleaning is one of those parts of the data science process that can take 3+ hours.

So in December, I decided to make an AI agent that cleans data for me.

This is what I made:
Aug 19 • 5 tweets • 2 min read
STOP DOING CUSTOMER SEGMENTATION WITH MACHINE LEARNING.

Start using AI.

This is how:

ML is great for 1 thing: finding clusters.

That's only 33% of the problem.

The other 66% is identifying what those clusters mean (and figuring out how to market to them).