The success of all Machine Learning algorithms depends on how you present your data. Every model takes input data and gives us an output. When your goal is to get the best possible output from that input,
1/
you need to present the best data to the model. This is the problem that Feature Engineering solves. Feature Engineering refers to the process of using domain knowledge to extract features from raw data.
2/
In other words, Feature Engineering selects the most useful features from our raw data and presents them to our model, thereby improving the model's performance.
1- Feature Creation:
It means creating a new feature from existing features. For instance, your training data contains the length of the house and the width of the house. You multiply these two features and make a new feature called the area of the house.
5/
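The house example above can be sketched in a few lines of pandas. This is a minimal illustration with made-up column names and values, not the thread author's code:

```python
import pandas as pd

# Hypothetical housing data: length and width of each house in meters.
houses = pd.DataFrame({
    "length": [10.0, 12.5, 8.0],
    "width": [6.0, 7.0, 5.5],
})

# Feature creation: combine two existing features into a new one.
houses["area"] = houses["length"] * houses["width"]
print(houses)
```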
2- Transformations:
I'm leaving this part blank and will cover it in a future thread.
6/
3- Feature Extraction:
Sometimes we have a huge volume of data, for instance, images, audio, etc. In this case, our model can't use the data directly, so we use Feature Extraction. Feature Extraction is a process of automatically reducing the dimensionality of this data.
7/
Feature extraction methods include cluster analysis, text analytics, edge detection algorithms, and principal components analysis (PCA).
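As a sketch of one of those methods, here is PCA with scikit-learn on random data, assuming scikit-learn is installed; the shapes and component count are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical high-dimensional data: 100 samples, 50 features each.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 50))

# Feature extraction with PCA: project onto the top 5 principal components,
# reducing 50 dimensions down to 5.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
```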
4- Feature Selection:
In every dataset, only a few features are useful for building a model; the other features are
8/
redundant or irrelevant. If we feed a dataset with all those features to our model, it will reduce the model's accuracy. In this case, we should identify the important features and drop the rest. We do this with the help of Feature Selection.
9/
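A minimal feature-selection sketch, assuming scikit-learn and using its built-in Iris dataset; `SelectKBest` with an ANOVA F-test is just one of many possible selection techniques:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Iris has 150 samples and 4 features.
X, y = load_iris(return_X_y=True)

# Feature selection: keep the 2 features with the strongest
# statistical relationship (ANOVA F-test) to the target.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)
```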
Let's assume you have three features (age, height, salary) in your example.
The first feature varies from 1 to 90, the second from 120 to 210, and the third from 1000 Euro to 4500 Euro. #Thread
1/
As you can see, the values of your features are in different ranges. In this case, if you want to use gradient descent to find the optimal parameters for your model (for instance, linear regression), your model will converge slowly. In this case,
2/
you can utilize Feature Scaling to bring the feature values into a range such as 0 to 1, depending on the scaling technique you use. This improves the speed of your model's convergence.
3/
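The age/height/salary example above can be sketched with min-max scaling, assuming scikit-learn; the sample rows are made up to match the ranges mentioned:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical samples: columns are age, height (cm), salary (EUR).
X = np.array([
    [1,  120, 1000],
    [45, 165, 2750],
    [90, 210, 4500],
], dtype=float)

# Min-max scaling maps each column independently to the range [0, 1],
# so no feature dominates gradient descent just because of its units.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```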
When I started to learn Data Science, I didn't know which skills I should learn or where to learn them. There was a ton of content, and I didn't know which to pick. I read more than a hundred articles, talked with some of my data scientist friends, and gathered experience
1
during my journey. I want to share a roadmap: the skills you need as a junior Data Scientist and the resources to learn them.
1- Start with a programming language, and the best of all is Python. You can learn Python from 3 resources.
Taking one of these courses is enough.
2
A- 2022 Complete Python Bootcamp From Zero to Hero in Python by Jose Portilla on Udemy.
Jose Portilla is my favorite instructor. This course has a GitHub repo where you can access the code.