🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists & AI Engineers ($200,000+ career) 👇
11 subscribers
Dec 8 10 tweets 4 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
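
A minimal sketch of what that conversion can look like in code, assuming the `markitdown` package is installed (`pip install markitdown`). The helper name `to_markdown` and the file name `report.docx` are hypothetical, used only for illustration:

```python
from pathlib import Path


def to_markdown(path):
    """Convert a document to Markdown text using MarkItDown."""
    # Imported lazily so this sketch loads even if markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(path))
    # The converted Markdown is exposed as plain text on the result object.
    return result.text_content


# Hypothetical input file; the conversion only runs if the file exists.
sample = Path("report.docx")
if sample.exists():
    print(to_markdown(sample))
```

The returned string can then be passed into an LLM prompt or a text analysis pipeline.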
Dec 6 8 tweets 3 min read
Agentic AI: A comprehensive survey of architectures, applications, and future directions (A 37-page PDF)

Here are the best parts:

1. The Dual-Paradigm Framework

The survey argues that Agentic AI research must be categorized to avoid conceptual retrofitting (applying old symbolic models to new systems).
Dec 1 7 tweets 2 min read
RIP BI Dashboards.

Tools like Tableau and PowerBI are about to become extinct.

This is what's coming (and how to prepare):

I've never been a fan of Tableau and PowerBI.

Static dashboards don't answer dynamic business questions.

That's why a new breed of analytics is coming: AI Analytics.
Nov 30 8 tweets 3 min read
Stop Prompting LLMs.
Start Programming LLMs.

Introducing DSPy by Stanford NLP.

This is why you need to learn it:

1. Why DSPy?

DSPy is the open-source framework for programming—rather than prompting—language models.

It allows you to iterate fast on building modular AI systems.
Nov 29 10 tweets 4 min read
This 277-page PDF unlocks the secrets of Large Language Models.

Here's what's inside: 🧵

Chapter 1 introduces the basics of pre-training.

This is the foundation of large language models, and common pre-training methods and model architectures will be discussed here.
Nov 28 8 tweets 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines
Nov 27 9 tweets 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Nov 15 11 tweets 2 min read
The AI Agent Development Process.

How to go from idea to production.

(A thread) 🧵

1. What is the AI Agent Development Process?

A repeatable path to ship an agent from idea to production: define → design → build → train → validate → deploy → monitor → improve.
Nov 11 11 tweets 4 min read
K-means is one of the most powerful algorithms for data scientists.

But it's confusing for beginners. Let's fix that:

1. What is K-means?

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
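
To make the clustering idea concrete, here is a minimal pure-Python sketch of the standard K-means loop (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes. The toy data and the function names `kmeans` and `nearest` are illustrative assumptions, not taken from the thread. In practice you would more likely reach for a library implementation.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into `k` groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Toy data: two well-separated groups of points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three share the other. Which group gets label 0 depends on the random starting centroids.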
Nov 1 5 tweets 2 min read
🚨NEW Whitepaper on AI Agents by OpenAI

The maker of ChatGPT shares how it builds AI Agents.

Get the 34-page white paper here:

This Whitepaper covers:

1. Building, evaluating, and deploying AI agents
2. Architectures, tool integration, and scaling
3. Agent ops and evaluation frameworks

Get it here: cdn.openai.com/business-guide…

I have one more thing before you go.

If you want to become a generative AI data scientist in 2025 ($200,000 career), then I'd like to help:
Oct 26 9 tweets 4 min read
This is wild.

A new paper shows how you can predict real purchase intent without asking people.

~90% of human test–retest reliability.

Here's what's inside the 28-page paper:

1. Problem with direct Likert from LLMs:

When you ask LLMs to output 1–5 ratings directly, the distributions are too narrow/skewed and don’t look like human survey data, limiting usefulness for concept testing.
Oct 22 7 tweets 2 min read
How to build AI agents:

A great cheat sheet (bookmark for later).

Here's how to use it:

1️⃣ System Prompt: Define your agent’s role, capabilities, and boundaries. This gives your agent the necessary context.

2️⃣ LLM (Large Language Model): Choose the engine. GPT-5, Claude, Mistral, or an open-source model — pick based on reasoning needs, latency, and cost.
Oct 20 15 tweets 3 min read
Understanding P-Values is essential for improving regression models.

In 2 minutes, I'll crush your confusion.

1. The p-value:

A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
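
As a rough illustration of that definition, the sketch below computes a two-sided p-value for a single regression coefficient from its estimate and standard error. It assumes a normal approximation to the test statistic (regression software typically uses a t-distribution, which gives similar values for large samples). The function name and the example numbers are made up for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error, null_value=0.0):
    """P-value for H0: coefficient == null_value, using a normal approximation."""
    # Standardize: how many standard errors is the estimate from the null value?
    z = (estimate - null_value) / std_error
    # Probability of a statistic at least this extreme in either tail under H0.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical coefficient: estimate 2.0 with standard error 1.0 (z = 2).
p = two_sided_p_value(2.0, 1.0)
print(round(p, 4))  # → 0.0455
```

A small p-value means an estimate this far from the null value would be unlikely if the null hypothesis were true. It does not give the probability that the null hypothesis is true.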
Oct 20 13 tweets 4 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Oct 18 15 tweets 5 min read
Top 10 Python Libraries for Generative AI You Need to Master in 2025

(The tools behind document agents, intelligent assistants, and next-gen interfaces.)

Everything you need to know: 🧵

1. LangChain

The backbone of intelligent LLM apps.

Build agents that:
✅ Reason
✅ Use tools
✅ Remember conversations
✅ Access APIs

If you're building anything with GPTs, LangChain is your starting point.

langchain.com
Oct 17 5 tweets 2 min read
AI Engineering Toolkit

A curated list of 100+ LLM libraries and frameworks for training, fine-tuning, building, evaluating, deploying, RAG, and AI Agents.

100% Open Source

Get it here: github.com/Sumanth077/ai-…

I have one more thing before you go.

If you want to become a generative AI data scientist in 2025 ($200,000 career), then I'd like to help:
Oct 13 8 tweets 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines
Oct 12 8 tweets 4 min read
Stop doing Customer Segmentation with plain vanilla Scikit Learn.

Add these 7 Python libraries to your RFM, clustering, and customer segmentation projects:

1. Data preparation

- load data with pandas
- impute/mask with Feature-engine

Website: feature-engine.trainindata.com/en/latest/inde…
Oct 11 9 tweets 3 min read
🚨NEW: Python library for LLM Prompt Management

This is what it does:

The Python library is called Promptify.

It combines a prompter, LLMs, and a pipeline to solve NLP problems with LLMs.

You can easily generate different NLP task prompts for popular generative models like GPT, PaLM, and more with Promptify.
Oct 11 10 tweets 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Oct 8 12 tweets 4 min read
These 7 statistical analysis concepts have helped me as an AI Data Scientist.

Let's go: 🧵

Step 1: Learn These Descriptive Statistics

Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist to understand what’s in front of them in their data sets.
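
A quick sketch of these five statistics using Python's built-in `statistics` module. The sample values are made up for illustration, and the population versions of variance and standard deviation are used here (use `variance` and `stdev` instead for the sample versions):

```python
import statistics

# Illustrative data set (not from the thread).
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),           # arithmetic average
    "median": statistics.median(values),       # middle value of the sorted data
    "mode": statistics.mode(values),           # most frequent value
    "variance": statistics.pvariance(values),  # population variance
    "std_dev": statistics.pstdev(values),      # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this data the mean is 5, the median is 4.5, the mode is 4, the population variance is 4, and the population standard deviation is 2.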