🔥 Matt Dancho (Business Science) 🔥's Threads

Nov 15 • 11 tweets • 2 min read

The AI Agent Development Process.

How to go from idea to production.

(A thread) 🧵

1. What is the AI Agent Development Process?

A repeatable path to ship an agent from idea to production: define → design → build → train → validate → deploy → monitor → improve.

Nov 11 • 11 tweets • 4 min read

K-means is one of the most powerful algorithms for data scientists.

But it's confusing for beginners. Let's fix that:

1. What is K-means?

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
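To make the clustering idea concrete, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python. It is an illustration rather than code from the thread: the `kmeans` function, the random initialization from data points, and the six sample points are all assumptions made for the example.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

The first three points end up sharing one label and the last three the other. In practice a library implementation such as scikit-learn's `KMeans` is the usual choice, since it adds better initialization and multiple restarts.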

Nov 1 • 5 tweets • 2 min read

🚨NEW Whitepaper on AI Agents by OpenAI

The maker of ChatGPT shares how it builds AI Agents.

Get the 34-page white paper here:

This Whitepaper covers:

1. Building, evaluating, and deploying AI agents
2. Architectures, tool integration, and scaling
3. Agent ops and evaluation frameworks

Get it here: cdn.openai.com/business-guide…

I have one more thing before you go.

If you want to become a generative AI data scientist in 2025 ($200,000 career), then I'd like to help:

Oct 26 • 9 tweets • 4 min read

This is wild.

A new paper shows how you can predict real purchase intent without asking people.

~90% of human test–retest reliability.

Here's what's inside the 28-page paper:

1. Problem with direct Likert from LLMs:

When you ask LLMs to output 1–5 ratings directly, the distributions are too narrow/skewed and don’t look like human survey data, limiting usefulness for concept testing.

Oct 22 • 7 tweets • 2 min read

How to build AI agents:

A great cheat sheet (bookmark for later).

Here's how to use it:

1️⃣ System Prompt: Define your agent’s role, capabilities, and boundaries. This gives your agent the necessary context.

2️⃣ LLM (Large Language Model): Choose the engine. GPT-5, Claude, Mistral, or an open-source model — pick based on reasoning needs, latency, and cost.

Oct 20 • 15 tweets • 3 min read

Understanding P-Values is essential for improving regression models.

In 2 minutes, I'll crush your confusion.

1. The p-value:

A p-value in statistics is a measure used to assess the strength of the evidence against a null hypothesis.
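As a small worked example of that definition, the sketch below computes a one-sided p-value for a coin-flip test. This is a simpler setting than the regression models the thread discusses, and the function name `binomial_p_value` and the numbers (16 heads in 20 flips) are assumptions chosen for illustration.

```python
from math import comb

def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Null hypothesis: the coin is fair. Hypothetical observation: 16 heads in 20 flips.
p_value = binomial_p_value(heads=16, flips=20)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0059
```

A result this small says that 16 or more heads would be rare if the coin were fair, which counts as strong evidence against the null hypothesis.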

Oct 20 • 13 tweets • 4 min read

Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.

Oct 18 • 15 tweets • 5 min read

Top 10 Python Libraries for Generative AI You Need to Master in 2025

(The tools behind document agents, intelligent assistants, and next-gen interfaces.)

Everything you need to know: 🧵

1. LangChain

The backbone of intelligent LLM apps.

Build agents that:
✅ Reason
✅ Use tools
✅ Remember conversations
✅ Access APIs

If you're building anything with GPTs, LangChain is your starting point.

langchain.com

Oct 17 • 5 tweets • 2 min read

AI Engineering Toolkit

A curated list of 100+ LLM libraries and frameworks for training, fine-tuning, building, evaluating, deploying, RAG, and AI Agents.

100% Open Source

Get it here: github.com/Sumanth077/ai-…

I have one more thing before you go.

If you want to become a generative AI data scientist in 2025 ($200,000 career), then I'd like to help:

Oct 13 • 8 tweets • 3 min read

🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines

Oct 12 • 8 tweets • 4 min read

Stop doing Customer Segmentation with plain vanilla Scikit Learn.

Add these 7 Python libraries to your RFM, clustering, and customer segmentation projects:

1. Data preparation

- load data with pandas
- impute/mask with Feature-engine

Website: feature-engine.trainindata.com/en/latest/inde…

Oct 11 • 9 tweets • 3 min read

🚨NEW: Python library for LLM Prompt Management

This is what it does:

The Python library is called Promptify.

It combines a prompter, LLMs, and a pipeline to solve NLP problems with LLMs.

You can easily generate different NLP Task prompts for popular generative models like GPT, PaLM, and more with Promptify.

Oct 11 • 10 tweets • 3 min read

🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.

Oct 8 • 12 tweets • 4 min read

These 7 statistical analysis concepts have helped me as an AI Data Scientist.

Let's go: 🧵

Step 1: Learn These Descriptive Statistics

Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist to understand what’s in front of them in their data sets.
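A quick sketch of these five statistics using Python's built-in `statistics` module. The `orders` list is made-up sample data, not something from the thread.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up numbers).
orders = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "variance": statistics.variance(orders),  # sample variance (divides by n - 1)
    "stdev": statistics.stdev(orders),        # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean (21) sits above the median (18) because the single large value, 45, pulls it upward. That gap between mean and median is one quick way to spot skew or outliers.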

Oct 6 • 7 tweets • 3 min read

🚨Introducing Agent Development Kit (ADK) by Google

A simple framework from Google for building, evaluating, and deploying AI agents.

Here's what you need to know (a thread): 🧵

1. What is ADK?

Agent Development Kit (ADK) is a framework from Google for building, evaluating, and deploying AI agents.

It is “model-agnostic” and “deployment-agnostic”: although it’s optimized to work well with Google’s models and infrastructure (like Gemini, Vertex AI), you can use it with other models and deploy it in different environments.

Oct 5 • 10 tweets • 4 min read

🚨Introducing Nanobot

Build MCP AI Agents with reasoning, system prompts, and tool orchestration.

This is what it does (and how to get started): (thread)

1. Nanobot enables building agents with MCP and MCP-UI by providing a flexible MCP host.

Nanobot is designed to be a standalone, open-source MCP host that can be easily deployed or integrated into your applications. You can use Nanobot to create your own dedicated MCP and MCP-UI powered chatbot.

Oct 4 • 9 tweets • 3 min read

🚨 Introducing Microsoft Agent Framework.

A new open-source Python library for making agents.

This is what you need to know:

1. The Microsoft Agent Framework is an open-source development kit for building AI agents and multi-agent workflows for Python.

It brings together and extends ideas from the Semantic Kernel and AutoGen projects, combining their strengths while adding new capabilities.

Oct 3 • 9 tweets • 3 min read

Google just dropped a new Generative AI Python library for SQL Databases.

Introducing Google GenAI Toolbox.

This is what you need to know:

1. Meet the Google GenAI Toolbox

An open-source server designed to simplify building Gen AI tools for your databases. It streamlines development, letting you integrate powerful data tools with just a few lines of code.

Oct 2 • 9 tweets • 4 min read

Stop doing Customer Segmentation with plain vanilla Scikit Learn.

Add these 7 Python libraries to your RFM, clustering, and customer segmentation projects:

1. Data preparation

- load data with pandas
- impute/mask with Feature-engine

Website: feature-engine.trainindata.com/en/latest/inde…

Sep 29 • 10 tweets • 3 min read

🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.

Sep 28 • 9 tweets • 3 min read

Tableau is about to die.

Introducing PandasAI, a free alternative for fast Business Intelligence.

Let's dive in:

1. PandasAI

PandaAI transforms your natural language questions into actionable insights — fast, smartly, and effortlessly.
