Tools like #ChatGPT & GitHub #Copilot can help debug complex code and replace Googling + searching Stack Overflow for common scripting tasks.
Key skill: ChatGPT prompting (more on this in my free ChatGPT for Data Scientists)
2. Code Quality & Documentation
Great products have great documentation. AI can draft docs, comment code, and take over much of the time-consuming manual documentation work.
These 7 statistical analysis concepts have helped me as an AI Data Scientist.
Let's go: 🧵
1. Learn Descriptive Statistics
Mean, median, mode, variance, standard deviation. Use them to summarize data and spot variability. They’re key for any data scientist who wants to understand what’s actually in a data set.
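Quick sketch of those five stats using only Python's built-in `statistics` module. The `orders` list is made-up illustrative data, not from a real project:

```python
import statistics

# Hypothetical sample: daily order counts for one week (illustrative data only).
orders = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "variance": statistics.variance(orders),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(orders),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Notice the mean (21) sits above the median (18): the single 45 pulls the mean up, which is exactly the kind of variability these numbers help you spot.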
2. Learn Probability
Know your distributions (Normal, Binomial) & Bayes’ Theorem. They’re the backbone of modeling and reasoning under uncertainty. The Central Limit Theorem is a must too.
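A minimal sketch of both ideas in plain Python. The test numbers (1% prior, 95% sensitivity, 5% false positive rate) and the exponential population are assumptions picked for illustration, and `bayes_posterior` is a hypothetical helper name:

```python
import random
import statistics


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 1% base rate, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")

# Central Limit Theorem: means of samples drawn from a skewed (exponential)
# population cluster around the population mean, with spread ~ sigma / sqrt(n).
random.seed(0)
sample_size, n_samples = 50, 2000
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_samples)
]
mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.stdev(sample_means)
print(f"mean of sample means:  {mean_of_means:.3f} (population mean is 1.0)")
print(f"stdev of sample means: {spread_of_means:.3f} (theory: {1 / sample_size ** 0.5:.3f})")
```

Even with a 95% accurate test, the posterior comes out around 16% because the condition is rare. That gap between intuition and the math is why Bayes’ Theorem is worth knowing cold.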
Google just released LangExtract: Open-source. Free. Better than $100K enterprise tools.
Here’s what it does: 🧵
What it does:
→ Extracts structured data from messy text
→ Grounds every field to the exact source location
→ Handles 100+ page docs
→ Generates interactive HTML for verification
→ Works with Gemini + local models
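To make "grounding" concrete, here's a plain-Python sketch of the idea. This is NOT LangExtract's API: `GroundedField` and `ground` are hypothetical names, and the `extracted` dict is a stand-in for whatever a model would return. It only shows what tying each field back to exact character offsets in the source looks like:

```python
from dataclasses import dataclass


@dataclass
class GroundedField:
    name: str
    value: str
    start: int  # character offset where the value begins in the source
    end: int    # offset one past the value's last character


def ground(source: str, fields: dict[str, str]) -> list[GroundedField]:
    """Attach source offsets to already-extracted values.

    Values that do not appear verbatim in the source are skipped,
    so every returned field can be checked against the original text.
    """
    grounded = []
    for name, value in fields.items():
        start = source.find(value)
        if start != -1:
            grounded.append(GroundedField(name, value, start, start + len(value)))
    return grounded


note = "Patient was given 250 mg of amoxicillin twice daily."
# Stand-in for model output (illustrative values only).
extracted = {"dosage": "250 mg", "medication": "amoxicillin", "route": "oral"}

grounded = ground(note, extracted)
for item in grounded:
    print(item.name, repr(note[item.start:item.end]), (item.start, item.end))
```

"route" gets dropped here because "oral" never appears in the note. That's the point of grounding: anything you can't point to in the source is easy to flag during verification.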
What it replaces:
→ Regex/fragile parsing
→ Custom NER pipelines
→ Expensive extraction APIs
→ Manual data entry