1/ Indexing data frames
Indexing means selecting all or particular rows and columns of data from a DataFrame. In pandas it can be done using two constructs —
.loc[] : label based
It accepts a scalar label, a list of labels, a slice object, etc.
.iloc[] : integer-position based
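A minimal sketch of both accessors on a toy DataFrame (the column names and index labels are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Cara"], "score": [85, 92, 78]},
    index=["a", "b", "c"],
)

# .loc[] — label based: scalar label, list of labels, or label slice
print(df.loc["a"])                  # single row by label
print(df.loc[["a", "c"], "score"])  # list of row labels + a column label

# .iloc[] — integer-position based
print(df.iloc[0])        # first row
print(df.iloc[0:2, 1])   # position slice (end exclusive), second column
```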
2/ Slicing data frames
To slice by labels you can use the .loc[] attribute of the DataFrame.
Implementation —
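A minimal sketch with made-up row labels; note that label slices with .loc are inclusive of both endpoints:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Pune", "Delhi", "Goa", "Agra"], "temp": [31, 35, 29, 38]},
    index=["row1", "row2", "row3", "row4"],
)

# Label slicing: both "row2" and "row4" are included in the result
print(df.loc["row2":"row4", "city"])
```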
3/ Filtering data frames
Using filter() you can subset the rows or columns of a DataFrame according to labels in the specified index.
Implementation —
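A minimal sketch using filter() with invented column names:

```python
import pandas as pd

df = pd.DataFrame(
    {"math_score": [80, 90], "eng_score": [70, 60], "age": [15, 16]},
    index=["s1", "s2"],
)

# Keep only columns whose labels contain "score"
print(df.filter(like="score", axis=1))

# Keep columns whose labels match a regular expression
print(df.filter(regex="_score$"))
```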
4/ Transforming Data Frames
Pandas transform() creates a DataFrame of transformed values with the same axis length as the original.
Implementation —
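A minimal sketch with toy data; the group-wise case broadcasts the group mean back to every row, keeping the original length:

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B"], "value": [10, 20, 30]})

# Element-wise transform: output has the same shape as the input
print(df[["value"]].transform(lambda x: x * 2))

# Group-wise transform: each row gets its group's mean
df["group_mean"] = df.groupby("group")["value"].transform("mean")
print(df)
```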
5/ Adding Rows — append()
Implementation —
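A minimal sketch; note that DataFrame.append() was deprecated and removed in pandas 2.0, so pd.concat() is shown as the current equivalent, with the older call left as a comment:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha"], "score": [85]})
new_row = pd.DataFrame({"name": ["Ben"], "score": [92]})

# Current approach: concatenate the new row and renumber the index
df = pd.concat([df, new_row], ignore_index=True)

# On pandas < 2.0 the equivalent call was:
# df = df.append(new_row, ignore_index=True)
print(df)
```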
6/ Hierarchical indexing
Hierarchical indexing is a technique in which we set more than one column as the index. The set_index() function is used for hierarchical indexing.
Implementation —
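A minimal sketch with made-up state/city data:

```python
import pandas as pd

df = pd.DataFrame(
    {"state": ["MH", "MH", "KA"],
     "city": ["Pune", "Mumbai", "Bengaluru"],
     "population": [7.4, 20.4, 12.3]}
)

# Set more than one column as the index -> MultiIndex
hdf = df.set_index(["state", "city"])
print(hdf)

# Select all rows for one outer level, or a specific (state, city) pair
print(hdf.loc["MH"])
print(hdf.loc[("MH", "Pune")])
```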
7/ Merging data frames
The concat() function is used to merge DataFrames.
Implementation --
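A minimal sketch with two toy DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ben"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Cara", "Dev"]})

# Stack vertically (axis=0) and renumber the index
print(pd.concat([df1, df2], ignore_index=True))

# Place side by side (axis=1)
print(pd.concat([df1, df2], axis=1))
```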
8/ Joins —
It helps us merge DataFrames. Types of Joins —
Inner Join :- Returns records that have matching values in both tables.
Left Join :- Returns all the rows from the left table and the matched rows from the right table, not just the rows in which the columns match.
9/ Right Join :- Returns all records from the right table, and the matched records from the left table.
Full Join :- Returns all records when there is a match in either left or right table.
Cross Join :- Returns all possible combinations of rows from two tables.
Implementation-
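A minimal sketch of all five join types using pd.merge() on toy tables:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ben", "Cara"]})
right = pd.DataFrame({"id": [2, 3, 4], "dept": ["HR", "IT", "Sales"]})

print(pd.merge(left, right, on="id", how="inner"))  # matching rows only
print(pd.merge(left, right, on="id", how="left"))   # all left rows
print(pd.merge(left, right, on="id", how="right"))  # all right rows
print(pd.merge(left, right, on="id", how="outer"))  # full join
print(pd.merge(left, right, how="cross"))           # all combinations (pandas >= 1.2)
```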
10/ Pivot Tables
It creates a spreadsheet-style pivot table as a DataFrame.
Implementation -
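A minimal sketch with invented sales data:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "sales": [100, 150, 200, 120],
})

# Spreadsheet-style pivot: rows = region, columns = product, values = mean sales
print(pd.pivot_table(df, values="sales", index="region",
                     columns="product", aggfunc="mean"))
```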
11/ Aggregate Functions
Pandas has a number of aggregating functions that reduce the dimension of the grouped object.
count()
value_counts()
mean()
median()
sum()
min()
max()
std()
var()
describe()
sem()
Implementation -
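A minimal sketch applying several of these functions to a grouped toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B", "B"], "points": [10, 12, 8, 15]})

grouped = df.groupby("team")["points"]
print(grouped.count())
print(grouped.mean())
print(grouped.agg(["sum", "min", "max", "std", "var", "sem"]))
print(grouped.describe())

# value_counts() works on a Series: frequency of each value
print(df["team"].value_counts())
```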
12/ I write quality threads on Data Science, Python, Programming, Machine Learning and AI in my free time. If you like this thread, give me a follow.
✅Attention Mechanism in Transformers- Explained in Simple terms.
A quick thread 👇🏻🧵
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ Attention mechanism calculates attention scores between all pairs of tokens in a sequence. These scores are then used to compute weighted representations of each token based on its relationship with other tokens in the sequence.
2/ This process generates context-aware representations for each token, allowing the model to consider both the token's own information and information from other tokens.
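A minimal sketch of scaled dot-product self-attention in NumPy; the sequence length, embedding size, and random inputs are made up for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention scores between all pairs of tokens, then weighted sums of V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # context-aware token representations

# Toy example: 4 tokens, embedding size 8 (dimensions chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)              # self-attention: Q = K = V = x
print(out.shape)                                         # (4, 8)
```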
✅Regularization is a technique used in ML to prevent overfitting and improve the generalization of a model - Explained in Simple terms.
A quick thread 👇🏻🧵
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ Regularization is a technique in machine learning used to prevent overfitting by adding a penalty term to the model's loss function. The penalty discourages overly complex models and promotes simpler ones, improving generalization to new, unseen data.
2/ When to use regularization:
Use regularization when you suspect that your model is overfitting the training data.
Use it when dealing with high-dimensional datasets where the number of features is comparable to or greater than the number of samples.
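A minimal sketch using scikit-learn's Ridge (L2 penalty) and Lasso (L1 penalty) on synthetic data; the alpha values are arbitrary illustrations of the penalty strength:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic data with many features relative to the number of samples
X, y = make_regression(n_samples=100, n_features=50, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# alpha is the penalty strength: larger alpha pushes toward a simpler model
ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # L2 regularization
lasso = Lasso(alpha=0.1).fit(X_train, y_train)   # L1 regularization (can zero out features)

print("Ridge test R^2:", ridge.score(X_test, y_test))
print("Lasso test R^2:", lasso.score(X_test, y_test))
```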
✅XGBoost is a powerful and efficient gradient boosting library designed for ML tasks, specifically for supervised learning problems - Explained in Simple terms.
A quick thread 🧵👇🏻
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ XGBoost is an ensemble learning method that combines multiple decision trees into a strong predictive model. It builds decision trees sequentially, where each tree corrects the errors of the previous ones. XGBoost optimizes a differentiable loss function to minimize prediction errors.
2/ When to Use XGBoost:
Use XGBoost when you need a highly accurate predictive model, especially in situations where other algorithms may struggle with complex patterns and relationships in the data.
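A minimal sketch using the xgboost Python package (assumed installed) on a built-in scikit-learn dataset; the hyperparameters are arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # pip install xgboost

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Trees are built sequentially, each one correcting the errors of the previous ones
model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```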
✅Gradient Boosting is a powerful machine learning technique used for both regression and classification tasks - Explained in Simple terms.
A quick thread 🧵👇🏻
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ Gradient Boosting is an ensemble learning method that combines the predictions of multiple weak learners (often decision trees) to create a stronger and more accurate predictive model.
2/ How Gradient Boosting Works:
Gradient Boosting builds an ensemble of decision trees sequentially. It starts with a simple model (typically a single tree) and then iteratively adds more trees to correct the errors made by the previous ones.
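A minimal sketch with scikit-learn's GradientBoostingClassifier on synthetic data; the hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of shallow trees added one at a time to correct previous errors
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)
print("Test accuracy:", gb.score(X_test, y_test))
```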
✅Cross-validation in ML is particularly useful for estimating how well a model will perform on unseen data - Explained in Simple terms.
A quick thread 🧵👇🏻
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ Cross-validation involves splitting the dataset into multiple subsets and using different parts of the data for training and testing at each iteration. The primary goal of cross-validation is to obtain a more robust and unbiased estimate of a model's performance.
2/ Why use Cross Validation -
Performance Estimation: Cross-validation provides a more robust and unbiased estimate of a model's performance. It helps you to obtain a more accurate assessment of how well your model will perform on new, unseen data.
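A minimal sketch of 5-fold cross-validation with scikit-learn; the dataset and model are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5 folds: each iteration trains on 4 parts and tests on the held-out part
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```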
✅Feature selection and Feature scaling are crucial Feature Engineering steps - Explained in Simple terms.
A quick thread 👇🏻🧵
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC : Research Gate
1/ Feature selection is the process of choosing a subset of the most relevant features (variables or columns) from your dataset. It involves excluding less informative or redundant features to improve model performance and reduce computational complexity.
2/ When to Use It:
High-Dimensional Data: Feature selection is crucial when you have a high-dimensional dataset, meaning there are many features compared to the number of data points. High dimensionality can lead to overfitting and increased computational costs.
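A minimal sketch of filter-based feature selection with scikit-learn's SelectKBest; keeping k=10 features is an arbitrary choice for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
print("Original number of features:", X.shape[1])

# Keep the 10 features most associated with the target (ANOVA F-test scores)
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)
print("After selection:", X_selected.shape[1])
```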