🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists & AI Engineers ($200,000+ career) 👇
Dec 29 9 tweets 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
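
To make "converting a file to Markdown" concrete, here is a minimal standard-library sketch that turns CSV text into a Markdown table. It is not MarkItDown's implementation or API; the function name `csv_to_markdown` and the sample data are assumptions made for illustration only.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table (first row is the header).

    Hypothetical helper for illustration; not part of MarkItDown.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def fmt(row):
        # Pad or trim each row to the header width and escape pipe characters.
        cells = (row + [""] * width)[:width]
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


# Made-up sample data.
sample = "name,role\nAda,engineer\nGrace,admiral\n"
markdown_table = csv_to_markdown(sample)
print(markdown_table)
```

The resulting text can be pasted straight into an LLM prompt, which is the use case the thread describes.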
Dec 27 8 tweets 3 min read
🚨 BREAKING: IBM launches a free Python library that converts ANY document to data

Introducing Docling. Here's what you need to know: 🧵

1. What is Docling?

Docling is a Python library that simplifies document processing, parsing diverse formats, including advanced PDF understanding, and providing seamless integrations with the gen AI ecosystem.
Dec 20 7 tweets 2 min read
Google just dropped a masterclass on Agents.

Here's what's covered in the 54-page PDF:

Here's what they cover:

1. From models to agents

2. What an AI Agent is

3. Agentic problem-solving loop (5 steps)

4. Taxonomy of agentic systems (levels 0–4)

5. Core architecture decisions

6. Multi-agent patterns (design patterns)
Dec 17 10 tweets 4 min read
This 277-page PDF unlocks the secrets of Large Language Models.

Here's what's inside: 🧵

Chapter 1 introduces the basics of pre-training.

This is the foundation of large language models, and common pre-training methods and model architectures will be discussed here.
Dec 16 9 tweets 3 min read
Stanford just made fine-tuning irrelevant with a single paper.

It’s called Agentic Context Engineering (ACE) and it proves you can make models smarter without touching a single weight.

Key takeaways (and get the 23-page PDF):

Stanford just released a 23-page paper on Agentic Context Engineering to improve Agents. Key ideas:

1. ACE = Agentic Context Engineering: treat system prompts + agent memory as a living playbook.
Dec 8 8 tweets 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines
Dec 6 8 tweets 3 min read
Agentic AI: A comprehensive survey of architectures, applications, and future directions (A 37-page PDF)

Here are the best parts:

1. The Dual-Paradigm Framework

The survey argues that Agentic AI research must be categorized to avoid conceptual retrofitting (applying old symbolic models to new systems).
Dec 1 7 tweets 2 min read
RIP BI Dashboards.

Tools like Tableau and PowerBI are about to become extinct.

This is what's coming (and how to prepare):

I've never been a fan of Tableau and PowerBI.

Static dashboards don't answer dynamic business questions.

That's why a new breed of analytics is coming: AI Analytics.
Nov 30 8 tweets 3 min read
Stop Prompting LLMs.
Start Programming LLMs.

Introducing DSPy by Stanford NLP.

This is why you need to learn it:

1. Why DSPy?

DSPy is the open-source framework for programming—rather than prompting—language models.

It allows you to iterate fast on building modular AI systems.
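
As a rough illustration of "programming rather than prompting", the sketch below declares a task as a typed signature and lets a small module build the prompt. This is not DSPy's API: the `Signature` and `Predict` classes here are hypothetical stand-ins written with the standard library, and `fake_lm` is a placeholder for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Signature:
    """Hypothetical task declaration: instructions, input fields, one output field."""
    instructions: str
    inputs: List[str]
    output: str


class Predict:
    """Hypothetical module that turns a Signature into a language-model call."""

    def __init__(self, signature: Signature, lm: Callable[[str], str]):
        self.signature = signature
        self.lm = lm  # any callable mapping prompt text to completion text

    def render(self, **kwargs: str) -> str:
        # The prompt is generated from the declaration, not written by hand.
        missing = [name for name in self.signature.inputs if name not in kwargs]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        lines = [self.signature.instructions]
        lines += [f"{name}: {kwargs[name]}" for name in self.signature.inputs]
        lines.append(f"{self.signature.output}:")
        return "\n".join(lines)

    def __call__(self, **kwargs: str) -> Dict[str, str]:
        completion = self.lm(self.render(**kwargs))
        return {self.signature.output: completion.strip()}


def fake_lm(prompt: str) -> str:
    # Stand-in for a real model so the sketch runs offline.
    return " Paris "


qa = Predict(Signature("Answer the question briefly.", ["question"], "answer"), fake_lm)
result = qa(question="What is the capital of France?")
print(result)
```

Because the prompt text lives in one place, swapping the model or changing the instructions means editing the declaration rather than hunting through prompt strings, which is the kind of modularity the thread points to.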
Nov 15 11 tweets 2 min read
The AI Agent Development Process.

How to go from idea to production.

(A thread) 🧵

1. What is the AI Agent Development Process?

A repeatable path to ship an agent from idea to production: define → design → build → train → validate → deploy → monitor → improve.
Nov 11 11 tweets 4 min read
K-means is one of the most powerful algorithms for data scientists.

But it's confusing for beginners. Let's fix that:

1. What is K-means?

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
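
For readers who want to see the mechanics, here is a minimal pure-Python sketch of the standard K-means loop (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes. The toy data, the `seed` default, and the function names are assumptions for illustration; real projects would normally use a library implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 100, seed: int = 0):
    """Return (centroids, labels) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this pass
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Made-up toy data: two visibly separate groups of customers.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the starting centroids in general, which is why library implementations rerun the algorithm from several initializations.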
Nov 1 5 tweets 2 min read
🚨NEW Whitepaper on AI Agents by OpenAI

The maker of ChatGPT shares how it builds AI Agents.

Get the 34-page white paper here:

This Whitepaper covers:

1. Building, evaluating, and deploying AI agents
2. Architectures, tool integration, and scaling
3. Agent ops and evaluation frameworks

Get it here:

I have one more thing before you go.

If you want to become a generative AI data scientist in 2025 ($200,000 career), then I'd like to help: cdn.openai.com/business-guide…
Oct 26 9 tweets 4 min read
This is wild.

A new paper shows how you can predict real purchase intent without asking people.

~90% of human test–retest reliability.

Here's what's inside the 28-page paper:

1. Problem with direct Likert from LLMs:

When you ask LLMs to output 1–5 ratings directly, the distributions are too narrow/skewed and don’t look like human survey data, limiting usefulness for concept testing.
Oct 22 7 tweets 2 min read
How to build AI agents:

A great cheat sheet (bookmark for later).

Here's how to use it:

1️⃣ System Prompt: Define your agent’s role, capabilities, and boundaries. This gives your agent the necessary context.

2️⃣ LLM (Large Language Model): Choose the engine. GPT-5, Claude, Mistral, or an open-source model — pick based on reasoning needs, latency, and cost.
Oct 20 15 tweets 3 min read
Understanding P-Values is essential for improving regression models.

In 2 minutes, I'll crush your confusion.

1. The p-value:

A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
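
A small worked example may help here. The sketch below computes a two-sided p-value for a regression coefficient under a large-sample normal approximation, where the test statistic is the estimate divided by its standard error. The coefficient and standard error are made-up numbers, and the normal approximation is an assumption; regression software typically uses a t-distribution, which gives slightly larger p-values for small samples.

```python
import math


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical regression output: coefficient estimate and its standard error.
coefficient = 0.42
standard_error = 0.15

z = coefficient / standard_error  # test statistic under H0: coefficient = 0
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value means data this extreme would be unlikely if the null hypothesis (a true coefficient of zero) held, which is what "strength of the evidence against a null hypothesis" refers to.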
Oct 20 13 tweets 4 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Oct 18 15 tweets 5 min read
Top 10 Python Libraries for Generative AI You Need to Master in 2025

(The tools behind document agents, intelligent assistants, and next-gen interfaces.)

Everything you need to know: 🧵

1. LangChain

The backbone of intelligent LLM apps.

Build agents that:
✅ Reason
✅ Use tools
✅ Remember conversations
✅ Access APIs

If you're building anything with GPTs, LangChain is your starting point.

langchain.com