Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists & AI Engineers ($200,000+ career) 👇
10 subscribers
Sep 16 • 8 tweets • 2 min read
Tableau is about to die.
Introducing PandasAI, a free alternative for fast Business Intelligence.
Let's dive in: 1. PandasAI
PandasAI transforms your natural language questions into actionable insights: fast, smart, and effortless.
Sep 15 • 11 tweets • 4 min read
RIP Tableau and PowerBI.
Enter Julius AI.
This is what Julius can do: 1. The $10 Billion problem with Tableau and PowerBI?
Dashboards are static.
But businesses are dynamic.
That's why I'm so excited about this new tool: Julius AI
Sep 14 • 11 tweets • 3 min read
R-squared is one of the most commonly used metrics to measure performance.
But it took me 2 years to figure out the mistakes that were killing my regression models.
In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go: 1. R-squared (R2):
R2 is a statistical measure for regression models that indicates how well the model replicates the observed outcomes, based on the proportion of the total variation in outcomes that the model explains.
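The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the thread: the function name `r_squared` and the sample numbers are assumptions made up for the example. It computes R2 as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    # Variation the model failed to explain (squared residuals).
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation of the observed outcomes around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed outcomes and model predictions.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

r2 = r_squared(observed, predicted)
print(f"R2 = {r2:.3f}")
```

A model that always predicts the mean of the observed values scores 0 under this formula, which is why R2 is often read as the improvement over that baseline.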
Sep 13 • 13 tweets • 4 min read
Understanding probability is essential in data science.
In 4 minutes, I'll demolish your confusion.
Let's go! 1. Statistical Distributions:
There are 100s of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
Sep 13 • 10 tweets • 4 min read
🚨 BREAKING: Microsoft launches a free Python library that converts ANY document to Markdown
Introducing MarkItDown. Let me explain. 🧵 1. Document Parsing Pipelines
MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Sep 8 • 6 tweets • 2 min read
RIP Data Scientists.
The Generative AI Data Scientist is NOW what companies want.
This is actually good news. Let me explain:
Companies are sitting on mountains of unstructured data.
PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts
This is useful data. But it's unusable in its existing form.
Sep 4 • 13 tweets • 4 min read
K-means is an essential algorithm for Data Science.
But it's confusing for beginners.
Let me demolish your confusion: 1. K-Means
K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
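To make the clustering idea concrete, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python. It is an illustration under assumptions, not the thread's code: the `kmeans` function, the hand-picked starting centroids, and the six 2-D points are all made up for the example. In practice a library implementation with smarter initialization is the usual choice.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The starting centroids are passed in explicitly so the run is deterministic; random or k-means++ initialization would replace that argument in a fuller version.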
Sep 3 • 11 tweets • 4 min read
These 7 statistical analysis concepts have helped me as an AI Data Scientist.
Let's go: 🧵
Step 1: Learn These Descriptive Statistics
Mean, median, mode, variance, standard deviation. Used to summarize data and spot variability. These are key for any data scientist to understand what's in front of them in their data sets.
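All five of these summaries are available in Python's standard `statistics` module. The sketch below uses a small made-up data set (the numbers are an assumption for illustration) and the sample versions of variance and standard deviation, which divide by n - 1.

```python
import statistics

# Hypothetical sample of eight measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

print(mean, median, mode)
print(round(variance, 3), round(std_dev, 3))
```

For a full population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.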
Sep 3 • 11 tweets • 3 min read
The 3 types of machine learning (that every data scientist should know).
In 3 minutes I'll eviscerate your confusion. Let's go: 🧵 1. The 3 Fundamental Types of Machine Learning:
Linear Regression is one of the most important tools in a Data Scientist's toolbox.
Yet it's super confusing for beginners.
Let's fix that: 🧵 1. Ordinary Least Squares (OLS) Regression
Most common form of Linear Regression. OLS regression aims to find the best-fitting linear equation that describes the relationship between the dependent variable (often denoted as Y) and independent variables (denoted as X1, X2, ..., Xn).
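For the simplest case of one independent variable, the best-fitting line has a closed-form solution. The sketch below is an illustration under assumptions: the function name `fit_ols` and the data are made up, and it handles a single X only. The multi-variable case (X1, X2, ..., Xn) is normally solved with matrix methods in a statistics library.

```python
def fit_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data that roughly follows y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

intercept, slope = fit_ols(x, y)
print(f"Y = {intercept:.2f} + {slope:.2f} * X")
```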
Aug 28 • 7 tweets • 3 min read
Came across this new library for LLM Prompt Management in Python.
This is what it does:
The Python library is called Promptify.
It combines a prompter, LLMs, and a pipeline to solve NLP problems with LLMs.
You can easily generate different NLP Task prompts for popular generative models like GPT, PaLM, and more with Promptify.
Aug 28 • 8 tweets • 3 min read
🚨BREAKING: New Python library for agentic data processing and ETL with AI
Introducing DocETL.
Here's what you need to know: 1. What is DocETL?
It's a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks.
It offers:
- An interactive UI playground
- A Python package for running production pipelines
Aug 27 • 15 tweets • 3 min read
Understanding P-Values is essential for improving regression models.
In 2 minutes, I'll crush your confusion. 1. The p-value:
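A p-value in statistics is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in your data. It is used to assess the strength of the evidence against the null hypothesis.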
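A small worked example can make the idea tangible. The sketch below is an assumption-laden illustration, not the thread's regression setting: it uses a coin-flip scenario and a hypothetical helper `binomial_p_value` to compute a one-sided p-value, the probability of seeing at least the observed number of heads if the coin were fair.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment: 9 heads in 10 flips. Null hypothesis: the coin is fair.
p_value = binomial_p_value(successes=9, trials=10)
print(f"p-value = {p_value:.4f}")
```

A small p-value here means that 9 or more heads would be rare under a fair coin, which counts as evidence against the null hypothesis. The p-values reported for regression coefficients follow the same logic with a different test statistic.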
Aug 27 • 9 tweets • 3 min read
Logistic Regression is the most important foundational algorithm in Classification Modeling.
In 2 minutes, I'll crush your confusion.
Let's dive in: 1. Logistic regression is a statistical method for modeling a binary outcome (one with only two possible values) from one or more independent variables. This is commonly called a binary classification problem.
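The prediction side of logistic regression fits in a few lines. This sketch is an illustration under assumptions: the weights, bias, and 0.5 threshold are hypothetical values chosen for the example, not fitted from data. It shows how a linear combination of the independent variables is squashed into a probability and then turned into one of two classes.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Binary class label: 1 if the probability reaches the threshold, else 0."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical model with a single independent variable.
weights = [1.5]
bias = -1.0

print(predict_proba([2.0], weights, bias), predict([2.0], weights, bias))
print(predict_proba([0.0], weights, bias), predict([0.0], weights, bias))
```

Fitting the weights themselves is usually done by maximizing the likelihood with an optimizer, which a statistics or machine learning library handles.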
Aug 25 • 8 tweets • 3 min read
This guy built an entire AI Data Science Team in Python.
100% Open Source
This is how to get it (for FREE) 🧵 1. What is it?
An AI-powered data science team of agents to help you perform common data science tasks 10X faster.
Aug 24 • 6 tweets • 3 min read
Data scientists are OUT.
The Generative AI Data Scientist is IN.
This is why (and how you can make the transition): 🧵
Companies are sitting on mountains of unstructured data.
PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts
This is useful data. But it's unusable in its existing form.
Aug 22 • 10 tweets • 4 min read
This 277-page PDF unlocks the secrets of Large Language Models.
Here's what's inside: 🧵
Chapter 1 introduces the basics of pre-training.
This is the foundation of large language models, and common pre-training methods and model architectures will be discussed here.
Aug 20 • 7 tweets • 3 min read
Stop Prompting LLMs.
Start Programming LLMs.
Introducing DSPy by Stanford NLP.
This is why you need to learn it: 1. Why DSPy?
DSPy is the open-source framework for programming—rather than prompting—language models.
It allows you to iterate fast on building modular AI systems.
Aug 19 • 6 tweets • 3 min read
Is data cleaning time-consuming?
This is how I went from 3 hours to 5 seconds:
Data cleaning is one of those parts of the data science process that can take 3+ hours.
So in December, I decided to make an AI agent that cleans data for me.
This is what I made:
Aug 19 • 5 tweets • 2 min read
STOP DOING CUSTOMER SEGMENTATION WITH MACHINE LEARNING.
Start using AI.
This is how:
ML is great for 1 thing: finding clusters.
That's only 33% of the problem.
The other 66% is identifying what those clusters mean (and figuring out how to market to them).