Pandas
- Pandas is probably one of the most powerful and flexible open source data analysis and manipulation tools available in any language.
- It provides a wide range of functions for data wrangling and cleaning.
resources:
1⃣ youtube.com/playlist?list=…
2⃣
NumPy (Numerical Python)
- NumPy is an open source project aiming to enable numerical computing with Python.
- It provides functions and methods for performing high-level mathematical operations on multi-dimensional arrays and matrices.
resources:
1⃣
Matplotlib (Data Visualization)
- Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Seaborn (Data Visualization)
- Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
resources:
The previous GitHub repository.
Scikit-Learn
- Scikit-Learn is an open source machine learning library.
- It is built upon SciPy, and it provides a wide range of machine learning algorithms such as regression, classification, clustering, etc.
resources:
1⃣
2⃣
TensorFlow
- An open source end-to-end machine learning library that allows you to develop and train models.
- TensorFlow focuses mainly on training and inference of deep neural networks.
resources:
1⃣
(part 1)
(part 2)
👇👇
2⃣
OpenCV
- An open source computer vision and machine learning library.
- It has a huge number of algorithms for computer vision applications such as object detection, face recognition, movement tracking, etc.
Other libraries include:
SciPy
PyTorch (Similar to TensorFlow)
statsmodels
plotly (for creating interactive plots and dashboards)
XGBoost
That's it for this thread.
If you find it useful, kindly consider retweeting the first tweet.
For more #DataScience and #MachineLearning content, follow me @ammaryh92.
✍️Introduction
- scikit-learn is one of the most famous Python libraries for machine learning.
- scikit-learn allows you to easily build and train machine learning models through its simple and well designed API.
- Even so, I will try to break the API down for beginners.
1⃣ Estimators
- The process of learning parameters from input data is called "Estimation", and therefore any object that learns some parameters from data is called an "Estimator".
- The estimation process itself is performed by calling the fit() method of any estimator object.
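To make the idea concrete, here is a minimal sketch of the estimator pattern in plain Python. `MeanEstimator` is a made-up toy class for illustration, not something that ships with scikit-learn; it only mimics two conventions scikit-learn follows: `fit()` returns the estimator itself, and learned parameters get a trailing underscore.

```python
class MeanEstimator:
    """Toy estimator (hypothetical, not part of scikit-learn)."""

    def fit(self, X):
        # "Estimation": learn a parameter (here, the mean) from the input data.
        self.mean_ = sum(X) / len(X)
        # Returning self mirrors the scikit-learn convention and allows chaining.
        return self


estimator = MeanEstimator().fit([2.0, 4.0, 6.0])
print(estimator.mean_)  # 4.0
```

Real scikit-learn estimators learn far richer parameters, but the calling pattern is the same: create the object, then call fit() on your data.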
The term "machine learning" sounds mysterious and confusing to a lot of people, especially beginners.
In this thread, I will try to explain how a machine learns and why we even need machine learning.
🧵👇
In the pre-machine-learning era, we had what are called "rule-based systems".
This basically means that we provide a machine with a bunch of instructions on how to perform a certain task.
For example, suppose we need to write a function that returns the square of a number.
With a rule-based system, this is very easy:
1. First, we define a function called Square, for example.
2. Square takes X as an input, where X can be any number.
3. Square multiplies X by itself (X ** 2).
4. Square returns the result to the user.
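The four steps above could be sketched in Python roughly like this (the lowercase name `square` is just my choice to follow Python naming style):

```python
def square(x):
    """Return the square of a number using an explicit, hand-written rule."""
    # The "rule" is spelled out by the programmer; nothing is learned from data.
    return x ** 2


print(square(4))    # 16
print(square(-3))   # 9
print(square(1.5))  # 2.25
```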
If you are planning to get into machine learning, then you are likely to use scikit-learn, one of Python's most famous libraries.
In this thread, I will try to break down scikit-learn's API, which can be intimidating at first.
🧵👇 #MachineLearning
Estimators
- An estimator is any scikit-learn object that learns some parameters from data.
- All estimators implement a "fit()" method to perform the estimation process.
- Estimators can also act as transformers or predictors.
Transformers
- They are estimators that use the parameters they have learned to transform data.
- All transformers implement a "transform()" method to perform the transformation process.
I know this sounds vague, but it will become clearer after the following example.
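Here is a minimal sketch of the transformer idea in plain Python. `MinMaxTransformer` is a hypothetical toy class written for this illustration, not scikit-learn code, and it assumes the training data contains at least two different values. scikit-learn's own scalers, such as StandardScaler and MinMaxScaler, expose the same fit()/transform() pair.

```python
class MinMaxTransformer:
    """Toy transformer (hypothetical) that rescales values to the range [0, 1]."""

    def fit(self, X):
        # Learn the parameters: the smallest and largest value seen in the data.
        self.min_ = min(X)
        self.max_ = max(X)
        return self

    def transform(self, X):
        # Use the learned parameters to rescale data (training data or new data).
        # Assumes max_ != min_, otherwise this would divide by zero.
        span = self.max_ - self.min_
        return [(x - self.min_) / span for x in X]


scaler = MinMaxTransformer().fit([10, 20, 30])
print(scaler.transform([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(scaler.transform([25]))          # [0.75]
```

Notice that fit() only learns (min and max), while transform() only applies what was learned. That split is what lets you fit on training data and then transform new data with the same parameters.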
Data Science learning roadmap
If you are looking to get into machine learning, I would like to make your life easier by listing some of my favorite resources in this thread 🧵👇 #MachineLearning
1⃣Python is the most common language for data science (although there are other great languages, such as R and C++).
This playlist by @JustinMitchel is a great way to start. youtube.com/playlist?list=…
Bias/Variance analysis is one of the most important concepts in machine learning as it allows you to measure and improve the performance of your model.
🧵In this thread, I will discuss the basic idea behind it, so let's dive right in. #MachineLearning #ArtificialIntelligence
1⃣Bias (Underfitting)
What is bias?
In simple terms, bias is the difference between the model predictions and the actual values.
So if a model has a high bias, it means it has underfitted the training data. In other words, the model has a high training set error.
What causes a high bias?
Bias, or underfitting, mainly occurs when the model is too simple to capture the underlying structure of the training data, and so it misses the relevant relations between the input features and target outputs.
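A small sketch of what underfitting looks like in numbers, using only plain Python. The data set (y = x squared on seven points) and the helper names `fit_line` and `mse` are made up for this illustration. A straight line is too simple for curved data, so even its training error stays high.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(predictions, targets):
    """Mean squared error between predictions and actual values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Toy training data with a curved (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# A straight line is too simple for this data: high bias.
slope, intercept = fit_line(xs, ys)
line_error = mse([slope * x + intercept for x in xs], ys)

# A model flexible enough to represent the curve has no training error here.
quadratic_error = mse([x ** 2 for x in xs], ys)

print(slope, intercept)  # 0.0 4.0
print(line_error)        # 12.0
print(quadratic_error)   # 0.0
```

The best possible line is flat (slope 0), which means it missed the relation between x and y entirely. That high training set error is the signature of high bias.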