Avi Kumar Talaviya
Data science and AI | Content writer | ML/community @OmdenaAI, @streamlit and Analytics Vidhya | Sharing insights and ideas at the intersection of data and AI
Mar 16 13 tweets 2 min read
Data preprocessing is a crucial step in data-driven decision-making as it involves transforming raw data into a format that is suitable for analysis.

Without these methods, it is very hard to turn raw data into reliable, useful insights.

Learn more👇

1. Data Cleaning:

This involves handling missing or erroneous data. Techniques include imputation (replacing missing values with a sensible estimate), deletion of rows or columns with missing data, or using algorithms that can handle missing data directly.
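A minimal sketch of two of these techniques, mean imputation and row deletion, using only the Python standard library. The records and column names are hypothetical; in practice a library such as pandas is usually used for this.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": 35, "income": 58000},
]

def impute_mean(records, column):
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]

def drop_missing(records):
    """Delete every row that still contains a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

imputed = impute_mean(rows, "age")  # missing age filled with mean(25, 40, 35)
cleaned = drop_missing(imputed)     # the row with missing income is removed
print(cleaned)
```

Imputation keeps the row with the missing age, while deletion drops the row with the missing income; which choice is appropriate depends on how much data is missing and why.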
Jan 8 7 tweets 2 min read
Time series forecasting is crucial for predicting future trends based on historical data.

It is one of the most important topics to learn in data science👇

Here are 5 methods widely used in the field:

1️⃣ ARIMA (AutoRegressive Integrated Moving Average):

ARIMA is a powerful and widely used method that combines autoregression, differencing, and moving averages.

Differencing (the "integrated" part) turns a trending, non-stationary series into a stationary one, which lets ARIMA model trends; seasonal patterns call for its seasonal extension, SARIMA.
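A simplified sketch of two of the three ingredients, assuming an ARIMA(1,1,0) model (one autoregressive term, one round of differencing, no moving-average term) estimated by plain least squares rather than with a dedicated time-series library. The input series is made up so that its differences halve at each step, which makes the fitted coefficient easy to check.

```python
def difference(series):
    """The 'I' (integrated) step: first differences y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den

def forecast_arima_110(series, steps):
    """ARIMA(1,1,0): fit AR(1) on the differenced series, forecast, then undo differencing."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff     # AR(1) forecast of the next difference
        level = level + diff  # integrate back to the original scale
        forecasts.append(level)
    return phi, forecasts

phi, preds = forecast_arima_110([0, 8, 12, 14, 15], steps=3)
print(phi, preds)
```

Real series are noisy, and the orders (p, d, q) are normally chosen from diagnostics such as autocorrelation plots rather than fixed in advance as here.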
Dec 30, 2023 15 tweets 2 min read
New large language models are appearing rapidly, as many AI startups have launched models lately.

But do you know some basic facts about LLMs?

If not, then here's the thread for you👇

1/ Pre-training:

Large language models are pre-trained on vast amounts of diverse, unstructured text from the internet, usually by learning to predict the next token in a passage. During this pre-training phase, the model learns the complexities of language, including grammar, semantics, and context.
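As a toy analogue of self-supervised pre-training, this sketch learns next-word statistics from raw, unlabeled text: each word's training label is simply the word that follows it. Real LLMs learn such an objective with large neural networks over enormous corpora; this bigram counter and its made-up corpus only illustrate that no human-written labels are needed.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Self-supervised 'training': every token's label is simply the next token."""
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation seen during training, or None."""
    following = counts.get(token.lower())
    return following.most_common(1)[0][0] if following else None

corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # 'cat' follows 'the' twice; 'mat' and 'fish' once each
```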
Dec 25, 2023 14 tweets 2 min read
Natural language to SQL is one of the most exciting applications of large language models

Here's a step-by-step guide to building such an application👇

1. User Input:

• Users provide natural language queries or requests as input to the application.

• Example: "Retrieve all customers who purchased in the last month."
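A hedged sketch of the kind of SQL the example request could be translated into, executed against an in-memory SQLite database. The `customers` and `orders` tables, their columns, and the rows are hypothetical, and "last month" is interpreted here as the month leading up to a fixed reference date.

```python
import sqlite3

# Hypothetical schema and data standing in for the application's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, '2024-03-10'), (2, 2, '2024-01-05'), (3, 3, '2024-02-20');
""")

# One query an LLM might generate for the request; the reference date is a
# named parameter so the result is reproducible.
generated_sql = """
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.order_date >= date(:today, '-1 month')
    ORDER BY c.name
"""
rows = conn.execute(generated_sql, {"today": "2024-03-15"}).fetchall()
conn.close()
print(rows)
```

A real application has to resolve the ambiguity in "last month" (previous calendar month vs. the last 30 days) explicitly, and should validate generated SQL before running it.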
Dec 23, 2023 10 tweets 2 min read
CRISP-DM stands for Cross-Industry Standard Process for Data Mining.

Conceived in 1996 by leaders in the then-nascent field of data mining (DaimlerChrysler, SPSS, and NCR), it was born out of a need for a standardized data mining procedure that could grow into a business process.

The Lifecycle of a Data Mining Project

The framework is not a linear path but a dynamic, iterative process.

Constantly evolving business requirements and data insights mean that moving back and forth through the stages is common and necessary for success.
Dec 7, 2023 11 tweets 2 min read
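Java is one of the most popular programming languages among developers.

Here are some Java basics every programmer should learn👇

1/ Java environment: The Java environment is a platform that allows you to develop and run Java programs.

It consists of the Java Virtual Machine (JVM), the Java Runtime Environment (JRE), and the Java Development Kit (JDK): the JVM executes compiled bytecode, the JRE bundles the JVM with the standard libraries needed to run programs, and the JDK adds development tools such as the javac compiler.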
Dec 2, 2023 7 tweets 1 min read
Python's os module is essential for working with the directories and files in your project.

Let's learn about the os module and its functions👇

The os module in Python provides functions for interacting with the operating system. It offers a portable way of using operating-system-dependent functionality.
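A short sketch of common os functions for building paths, creating directories, listing, renaming, and deleting files. It runs inside a temporary directory from the tempfile module so nothing is left behind; the folder and file names are hypothetical.

```python
import os
import tempfile

# Work inside a throwaway temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    data_dir = os.path.join(base, "my_project", "data")  # portable path building
    os.makedirs(data_dir, exist_ok=True)                  # create nested directories

    file_path = os.path.join(data_dir, "notes.txt")
    with open(file_path, "w") as f:
        f.write("hello")

    cwd = os.getcwd()                      # current working directory
    listing_before = os.listdir(data_dir)  # entries in the directory
    exists = os.path.exists(file_path)     # does the path exist?
    size = os.path.getsize(file_path)      # file size in bytes

    renamed = os.path.join(data_dir, "renamed.txt")
    os.rename(file_path, renamed)          # rename (move) a file
    os.remove(renamed)                     # delete a file
    listing_after = os.listdir(data_dir)

print(listing_before, exists, size, listing_after)
```

Using os.path.join instead of hard-coded "/" or "\\" separators is what keeps such code portable across operating systems.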
Nov 17, 2023 10 tweets 2 min read
Data analytics spans many areas, such as web, mobile, and social media analytics.

Millions of users visit digital platforms every day, so it is important to monitor traffic from different sources to track the ROI and performance of your products.

Learn more about these below:

1. Web Analytics:

Web analytics refers to the measurement, collection, analysis, and reporting of web data to understand and optimize the usage of a website.
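A minimal sketch of the kind of measurements web analytics produces, computed from a hypothetical page-view log of (visitor, session, page) tuples: page views, unique visitors, bounce rate (defined here as the share of sessions with a single page view), and top pages. Real analytics tools collect these events automatically and may define the metrics differently.

```python
from collections import Counter

# Hypothetical page-view log: (visitor_id, session_id, page)
events = [
    ("u1", "s1", "/home"), ("u1", "s1", "/pricing"),
    ("u2", "s2", "/home"),
    ("u3", "s3", "/blog"), ("u3", "s3", "/home"), ("u3", "s3", "/signup"),
    ("u1", "s4", "/blog"),
]

page_views = len(events)
unique_visitors = len({visitor for visitor, _, _ in events})
views_per_session = Counter(session for _, session, _ in events)
sessions = len(views_per_session)
# A "bounce" is a session with exactly one page view.
bounce_rate = sum(1 for n in views_per_session.values() if n == 1) / sessions
top_pages = Counter(page for _, _, page in events).most_common(2)

print(page_views, unique_visitors, sessions, bounce_rate, top_pages)
```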
Nov 13, 2023 12 tweets 2 min read
Statistical tests are an integral part of research design, data analysis, and A/B testing.

Data scientists rely on these tests in many real-world scenarios.

Learn the most important statistical tests in this thread:

1/ Z-Test for a Population Mean:

• This test is used to determine whether a sample mean is significantly different from a known or hypothesized population mean when the population standard deviation is known.
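The test statistic is z = (x̄ − μ₀) / (σ / √n), where x̄ is the sample mean, μ₀ the hypothesized mean, σ the known population standard deviation, and n the sample size. A minimal two-sided sketch using only Python's math module; the numbers are illustrative, not from any real study.

```python
import math

def z_test_mean(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test when the population standard deviation is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi_abs_z = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi_abs_z)
    return z, p_value

z, p = z_test_mean(sample_mean=103, mu0=100, sigma=15, n=100)
print(z, p)
```

With these numbers z = 2.0 and p ≈ 0.046, so the difference would be significant at the 5% level.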
Nov 7, 2023 12 tweets 4 min read
🔥Excited to introduce YOLO-NAS Pose: A new benchmark in pose estimation for images and videos

Meet YOLO-NAS Pose, the next-gen pose estimation model from Deci. It delivers both speed and accuracy, reimagining use cases in sports, healthcare, and beyond.

Learn more below👇

Built on YOLO-NAS with a novel pose estimation head, it's optimized via Deci's AutoNAC for peak performance.

Training enhancements and a streamlined post-processing pipeline set new standards for efficiency

Check out the model code and ⭐the repo👇
bit.ly/3MtxTqX
Nov 5, 2023 17 tweets 3 min read
CRISP-DM is a widely used framework for data mining that outlines a structured approach to planning, executing, and evaluating data mining projects.

Following this step-by-step process is key to achieving successful outcomes from a data mining project👇

CRISP-DM is divided into 6 phases to achieve the final objective of the project.

As we saw in the previous lesson, the life cycle of a data mining project consists of these six phases.
Oct 20, 2023 10 tweets 2 min read
Applied analytics is widely used in industry and business use cases.

There are many applications of applied analytics; let's look at each step of the process👇

1. Data Collection:

Applied analytics begins with the collection of relevant data, which can be structured or unstructured.

This data is typically gathered from various sources, such as databases, sensors, websites, and more.
Oct 13, 2023 15 tweets 3 min read
Data scientists should be proficient in creating various types of visualizations, such as line charts, bar charts, scatter plots, and heat maps, using libraries such as Matplotlib and Seaborn.📊

Drill down into the types of visualization for your next analytics project below👇

1. Bar Charts:

Bar charts are one of the simplest and most common ways to visualize data. They are ideal for showing comparisons between categories or displaying discrete data points.

For example, you can use a bar chart to compare sales figures for different products.
Sep 28, 2023 17 tweets 3 min read
Top 10 classification techniques to learn as a data scientist👇

1/ Binary Classification:

• Logistic Regression: A simple linear model that predicts binary outcomes.
• Support Vector Machines (SVM): Finds a hyperplane that best separates data into two classes.
• Decision Trees: A tree-like model that splits data based on feature values.
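A minimal sketch of the first technique listed, logistic regression, trained by batch gradient descent on the log loss with only the Python standard library. The single feature's values are hypothetical and already centered around zero (standardizing features generally helps gradient descent converge); SVMs and decision trees are not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Predict class 1 when the estimated probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)

# Hypothetical standardized feature and binary labels
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(predict(w, b, -1.5), predict(w, b, 1.5))
```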
Sep 4, 2023 19 tweets 3 min read
🤖📚 Beginner's Guide to Machine Learning 📚🤖

Machine learning can seem intimidating to novices.

Check out this thread, which breaks down the basics into simple, easy-to-understand explanations.

Image credit: FORE School of Management

1/ What is machine learning?

Machine learning is a field of study where computers learn from data without being explicitly programmed.

It's about creating algorithms that can make predictions or take actions based on patterns and relationships in the data they are trained on.
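To make "learning from data" concrete, here is a minimal sketch in which the program is never told the rule relating x and y: it estimates a straight line from hypothetical example pairs with ordinary least squares and then uses that line to predict an unseen input.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_value(slope, intercept, x):
    """Apply the learned pattern to a new input."""
    return slope * x + intercept

# Hypothetical training examples (the hidden rule is y = 2x + 1)
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, predict_value(slope, intercept, 5))
```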
Jul 30, 2023 13 tweets 5 min read
Think you can't learn data science on your own? 🤔

Then, think again!

Here are the top free resources to master the art of data science📊

Image credit: iStock

1. Python

⁕ Level of skills: Basics to intermediate

⁕ Time allocation: 21-25 Days

⁕ Project Idea: Countdown calculator

⁕ Learning outcome: Build Python programming skills

⁕ FREE Learning resources:
🔗 w3schools.com/python/default…
Jul 27, 2023 13 tweets 3 min read
Are you looking to add a real-world data science project to your resume?

Building a project is the best way to learn data science and ML skills. Here's an article I wrote, with a code repo, to help you build one for your resume.
Learn more👇

This article was originally published on Analytics Vidhya, India's largest data science blog platform.

Click the link below to learn more about this project in depth👇
analyticsvidhya.com/blog/2023/06/c…
May 21, 2023 4 tweets 1 min read
PySpark is an essential skill for becoming a big data engineer📊📈

Learn PySpark for FREE with the YouTube tutorials below

🧵👇

1/ PySpark tutorial playlist

youtube.com/playlist?list=…
May 20, 2023 8 tweets 2 min read
Here's an update from week 1 of MLOps Zoomcamp by @DataTalksClub

This week was about an introduction to MLOps, setting up an environment, and the MLOps maturity model.

A thread👇

1/ Course overview (life cycle of ML models)

- ML modeling and experiment tracking:
This includes training ML models and tracking every experiment to compare different models and deploy the best one.
Tracking covers model parameters, accuracy measures, and model versioning.
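A minimal, hypothetical sketch of what experiment tracking records: each run gets a version number, its parameters, and an accuracy score, and the best run is selected for deployment. The class names, models, parameter values, and accuracies are invented for illustration; dedicated tools such as MLflow store this information persistently and offer much more.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    version: int
    model: str
    params: dict
    accuracy: float

@dataclass
class ExperimentTracker:
    """Hypothetical in-memory tracker; real tools persist runs to a backend store."""
    runs: list = field(default_factory=list)

    def log_run(self, model, params, accuracy):
        # Each logged run receives the next version number.
        run = Run(version=len(self.runs) + 1, model=model, params=params, accuracy=accuracy)
        self.runs.append(run)
        return run

    def best_run(self):
        # Compare all experiments and pick the most accurate one to deploy.
        return max(self.runs, key=lambda r: r.accuracy)

tracker = ExperimentTracker()
tracker.log_run("logistic_regression", {"C": 1.0}, 0.82)
tracker.log_run("random_forest", {"n_estimators": 100}, 0.87)
tracker.log_run("gradient_boosting", {"learning_rate": 0.1}, 0.85)
best = tracker.best_run()
print(best.version, best.model, best.accuracy)
```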
Apr 26, 2023 12 tweets 3 min read
Generative AI is undoubtedly going to be the next tech revolution that will drive jobs in this decade and beyond🚀

Here are the 7 ways GenAI can help you in your data science job🧵👇

1/7 Generative AI tools have become increasingly popular in recent years and can be particularly useful for data science tasks.

With these tools, you can create new data points that can be used to train machine learning models, simulate different scenarios, and more.
Apr 25, 2023 9 tweets 4 min read
Become a superhuman by using these 7 tools to automate most of your data science blog writing workflow (and make your life easier)🤯🚀

A🧵👇

1) ScribeHow

@ScribeHow's Chrome extension

Record a screen to capture screenshots with their extension & autogenerate step-by-step guides⚡

Embed these guides into Pages to create intuitive docs, served up as webpages in just a few clicks.

🔗 getscribe.how/chrome