🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists & AI Engineers ($200,000+ career) 👇
11 subscribers
Oct 13 • 8 tweets • 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI

Introducing DocETL.

Here's what you need to know:

1. What is DocETL?

It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.

It offers:

- An interactive UI playground
- A Python package for running production pipelines
Oct 12 • 8 tweets • 4 min read
Stop doing Customer Segmentation with plain vanilla Scikit Learn.

Add these 7 Python libraries to your RFM, clustering, and
customer segmentation projects:

1. Data preparation

- load data with pandas
- impute/mask with Feature-engine

Website: feature-engine.trainindata.com/en/latest/inde…
Oct 11 • 9 tweets • 3 min read
🚨NEW: Python library for LLM Prompt Management

This is what it does:

The Python library is called Promptify.

It combines a prompter, LLMs, and a pipeline to solve NLP problems with LLMs.

With Promptify, you can easily generate prompts for different NLP tasks for popular generative models like GPT, PaLM, and more.
Oct 11 • 10 tweets • 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Oct 8 • 12 tweets • 4 min read
These 7 statistical analysis concepts have helped me as an AI Data Scientist.

Let's go: 🧵

Step 1: Learn These Descriptive Statistics

Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist who needs to understand what's in their data sets.
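
A quick sketch of those five summaries, using only Python's built-in `statistics` module. The `orders` values are made up for illustration and are not from the thread.

```python
import statistics

# Hypothetical sample: daily order counts (illustrative values only)
orders = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "variance": statistics.variance(orders),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(orders),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean (21) sits above the median (18) because the single large value 45 pulls it up, which is the kind of variability these summaries help you spot.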
Oct 6 • 7 tweets • 3 min read
🚨Introducing Agent Development Kit (ADK) by Google

A simple framework from Google for building, evaluating, and deploying AI agents.

Here's what you need to know (a thread): 🧵

1. What is ADK?

Agent Development Kit (ADK) is a framework from Google for building, evaluating, and deploying AI agents.

It is “model-agnostic” and “deployment-agnostic”: although it’s optimized to work well with Google’s models and infrastructure (like Gemini, Vertex AI), you can use it with other models and deploy it in different environments.
Oct 5 • 10 tweets • 4 min read
🚨Introducing Nanobot

Build MCP AI Agents with reasoning, system prompts, and tool orchestration.

This is what it does (and how to get started): (thread)

1. Nanobot enables building agents with MCP and MCP-UI by providing a flexible MCP host.

Nanobot is designed to be a standalone, open-source MCP host that can be easily deployed or integrated into your applications. You can use Nanobot to create your own dedicated MCP and MCP-UI powered chatbot.
Oct 4 • 9 tweets • 3 min read
🚨 Introducing Microsoft Agent Framework.

A new open-source Python library for making agents.

This is what you need to know:

1. The Microsoft Agent Framework is an open-source development kit for building AI agents and multi-agent workflows for Python.

It brings together and extends ideas from the Semantic Kernel and AutoGen projects, combining their strengths while adding new capabilities.
Oct 3 • 9 tweets • 3 min read
Google just dropped a new Generative AI Python library for SQL Databases.

Introducing Google GenAI Toolbox.

This is what you need to know:

1. Meet the Google GenAI Toolbox

An open-source server designed to simplify building Gen AI tools for your databases. It streamlines development, letting you integrate powerful data tools with just a few lines of code.
Oct 2 • 9 tweets • 4 min read
Stop doing Customer Segmentation with plain vanilla Scikit Learn.

Add these 7 Python libraries to your RFM, clustering, and
customer segmentation projects:

1. Data preparation

- load data with pandas
- impute/mask with Feature-engine

Website: feature-engine.trainindata.com/en/latest/inde…
Sep 29 • 10 tweets • 3 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown

Introducing MarkItDown. Let me explain. 🧵

1. Document Parsing Pipelines

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Sep 28 • 9 tweets • 3 min read
Tableau is about to die.

Introducing PandasAI, a free alternative for fast Business Intelligence.

Let's dive in:

1. PandasAI

PandaAI transforms your natural language questions into actionable insights — fast, smartly, and effortlessly.
Sep 24 • 9 tweets • 3 min read
🚨McKinsey just dropped how to build agentic AI (that works)

Here's everything you need to know in 2 minutes:

1. Stop building agents; Start fixing workflows

The mistake every organization makes: falling in love with your new AI agent.

The solution: Identify the pain points in your process. Then use agents to connect analytics and gen AI into 1 seamless process.
Sep 24 • 6 tweets • 3 min read
🚨The AI agent handbook

Google just dropped a 46-page playbook on how to build and use agents.

This is what you need to know (and how to get it 100% free):

The AI agent handbook contains:

- 10 ways to use AI agents
- How to set them up with Google Agentspace
- 100+ prompts for Agentspace
Sep 22 • 12 tweets • 4 min read
Principal Component Analysis (PCA) is the gold standard in dimensionality reduction.

But PCA is hard to understand for beginners.

Let me destroy your confusion:

1. What is PCA?

PCA is a statistical technique used in data analysis, mainly for dimensionality reduction. It's beneficial when dealing with large datasets with many variables, and it helps simplify the data's complexity while retaining as much variability as possible.
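
To make "retaining as much variability as possible" concrete, here is a minimal sketch of PCA on two features in plain Python. It uses the closed-form eigenvalues of a 2x2 covariance matrix, which only works for two variables; real projects would typically use a library routine instead. The `data` values are hypothetical.

```python
import math

# Hypothetical 2-feature dataset (illustrative values only)
data = [(2.0, 1.9), (4.0, 4.1), (6.0, 6.2), (8.0, 7.8), (10.0, 10.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Step 1: center each feature on its mean
centered = [(x - mean_x, y - mean_y) for x, y in data]

# Step 2: sample covariance matrix [[a, b], [b, c]]
a = sum(x * x for x, _ in centered) / (n - 1)
b = sum(x * y for x, y in centered) / (n - 1)
c = sum(y * y for _, y in centered) / (n - 1)

# Step 3: eigenvalues of a symmetric 2x2 matrix (closed form)
mid = (a + c) / 2
radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lambda1, lambda2 = mid + radius, mid - radius

# Step 4: unit eigenvector for the largest eigenvalue (valid when b != 0)
vx, vy = b, lambda1 - a
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Step 5: share of total variance kept by the first component,
# and the 1-D scores after projecting onto it
explained = lambda1 / (lambda1 + lambda2)
scores = [x * pc1[0] + y * pc1[1] for x, y in centered]

print(f"PC1 direction: ({pc1[0]:.3f}, {pc1[1]:.3f})")
print(f"Variance explained by PC1: {explained:.4f}")
```

Because the two made-up features move almost in lockstep, one component keeps nearly all of the variance, so two columns can be reduced to one with little loss.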
Sep 21 • 9 tweets • 4 min read
🚨 Google published a 69-page prompt engineering masterclass.

Here are the biggest takeaways every AI user must know:

Table of Contents:

- Prompt Engineering
- LLM Output Configuration
- Prompting Techniques
- Best Practices
Sep 18 • 6 tweets • 2 min read
ROC and AUC are important concepts for evaluating classification models in business (e.g. lead scoring).

In 3 minutes, I'll demystify AUC.

1. ROC Curve:

The ROC (Receiver Operating Characteristic) curve is a graphical representation used to evaluate the performance of a binary classifier as its discrimination threshold is varied.
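
As a rough sketch of "varying the threshold", the snippet below sweeps a threshold over a handful of made-up lead scores, records the false positive rate and true positive rate at each step, and integrates the resulting curve with the trapezoid rule to get AUC. The function names and data are illustrative assumptions, not from the thread.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points, one per distinct score used as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical lead-scoring example: 1 = converted, 0 = did not convert
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, y_score)
print(points)
print(f"AUC: {auc(points):.4f}")
```

For this toy data the AUC is 8/9 (about 0.889): of the 9 possible converted/non-converted pairs, the scores rank 8 in the right order.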
Sep 18 • 9 tweets • 3 min read
Stop Prompting LLMs.
Start Programming LLMs.

Introducing DSPy by Stanford NLP.

This is why you need to learn it:

1. Why DSPy?

DSPy is the open-source framework for programming—rather than prompting—language models.

It allows you to iterate fast on building modular AI systems.
Sep 16 • 8 tweets • 2 min read
Tableau is about to die.

Introducing PandasAI, a free alternative for fast Business Intelligence.

Let's dive in:

1. PandasAI

PandaAI transforms your natural language questions into actionable insights — fast, smartly, and effortlessly.
Sep 15 • 11 tweets • 4 min read
RIP Tableau and PowerBI.

Enter Julius AI.

This is what Julius can do:

1. The $10 Billion problem with Tableau and PowerBI?

Dashboards are static.

But businesses are dynamic.

That's why I'm so excited about this new tool: Julius AI
Sep 14 • 11 tweets • 3 min read
R-squared is one of the most commonly used metrics to measure performance.

But it took me 2 years to figure out the mistakes that were killing my regression models.

In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go:

1. R-squared (R2):

R2 is a statistical measure used in regression models that indicates how well the model replicates the observed outcomes, based on the proportion of the total variation in outcomes that the model explains.
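
A small sketch of that definition, using the standard formula R2 = 1 - SS_res / SS_tot (residual sum of squares over total sum of squares). The `actual` and `predicted` values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Proportion of the variation in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Hypothetical observed outcomes and model predictions
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

r2 = r_squared(actual, predicted)
print(f"R2: {r2:.4f}")

# Baseline check: always predicting the mean explains none of the variation
baseline = r_squared(actual, [6.0] * len(actual))
print(f"Mean-only baseline R2: {baseline:.4f}")
```

For this toy data, total variation is 20 and unexplained variation is 0.75, so R2 comes out to 0.9625, while the mean-only baseline scores 0.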