1. Split data using pandas
Here we split the data by drawing a random sample of rows and then removing those rows from the original data by dropping their index values.
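A minimal sketch of this split, assuming a toy DataFrame with a unique index; the column names, the 30% fraction, and the `random_state` value are illustrative choices, not taken from the original code.

```python
import pandas as pd

# Hypothetical toy data; any DataFrame with a unique index works the same way.
df = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Draw a random 30% sample of the rows; random_state makes the draw repeatable.
sample = df.sample(frac=0.3, random_state=42)

# Remove the sampled rows from the original by dropping their index labels.
remainder = df.drop(sample.index)

print(len(sample), len(remainder))
```

The two pieces can then serve as, for example, a test set and a training set.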
2. Binning Data
Binning is a technique to group your data into multiple buckets, which is very helpful if you are dealing with continuous numeric data. In pandas you can bin the data using the functions cut and qcut. First check the shape of your data, i.e. the number of rows and columns.
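As a rough illustration, the sketch below bins a made-up `age` column two ways. The data, bin edges, and labels are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data: eight ages in a single column.
ages = pd.DataFrame({"age": [5, 17, 25, 38, 52, 67, 80, 91]})
print(ages.shape)  # (number of rows, number of columns)

# cut: you choose the bin edges; each interval includes its right edge by default.
ages["age_group"] = pd.cut(
    ages["age"], bins=[0, 18, 65, 100], labels=["child", "adult", "senior"]
)

# qcut: pandas chooses the edges so each bin holds roughly the same number of rows.
ages["age_quartile"] = pd.qcut(ages["age"], q=4, labels=["q1", "q2", "q3", "q4"])

print(ages)
```

Use cut when the bucket boundaries have meaning of their own, and qcut when you want evenly populated buckets.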
3. Slicing using loc and iloc indexers
You can do position-based and label-based slicing using the iloc and loc indexers, respectively.
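A small sketch of the difference, using a made-up DataFrame with string labels so that positions and labels cannot be confused:

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame({"score": [90, 85, 70, 60]}, index=["a", "b", "c", "d"])

# iloc slices by integer position; the end position is excluded.
by_position = df.iloc[0:2]

# loc slices by label; the end label is included.
by_label = df.loc["a":"c"]

print(by_position)
print(by_label)
```

Note the asymmetry: the positional slice returns two rows, while the label slice returns three.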
4. Mean Imputation and Interpolate method
Mean Imputation is a technique in which the missing value is replaced by the mean of available data in the chosen column.
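The sketch below compares the two approaches on a made-up Series. The values are illustrative, and `interpolate()` is used with its default linear method.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values.
s = pd.Series([10.0, np.nan, 30.0, np.nan, 80.0])

# Mean imputation: every gap gets the mean of the available values (here 40.0).
mean_filled = s.fillna(s.mean())

# Interpolation: each gap is estimated from its neighbours (linear by default).
interpolated = s.interpolate()

print(mean_filled.tolist())
print(interpolated.tolist())
```

Mean imputation gives every gap the same value, whereas interpolation follows the local trend, which often suits ordered data such as time series better.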
5. Combining Data using Concat and Join
Just like np.concatenate() in NumPy, the pd.concat() function concatenates Series or DataFrame objects in pandas. DataFrame.join() combines the columns of two DataFrames by matching their index.
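A minimal sketch with three made-up DataFrames; the names `left`, `right`, and `more` are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"b": [3, 4]}, index=["x", "y"])
more = pd.DataFrame({"a": [5]}, index=["z"])

# Stack rows on top of each other (axis=0 is the default).
stacked = pd.concat([left, more])

# Place columns side by side, aligning on the index.
side_by_side = pd.concat([left, right], axis=1)

# join: add the columns of `right` to `left` by index (a left join by default).
joined = left.join(right)

print(stacked)
print(joined)
```

Here `side_by_side` and `joined` hold the same table, because both frames share the same index.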
1/ Indexing data frames
Indexing means selecting all or particular rows and columns of data from a DataFrame. In pandas it can be done using two constructs:
.loc[] : label based
It accepts inputs such as a scalar label, a list of labels, a slice object, etc.
.iloc[] : integer position based
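The input kinds listed above can be sketched as follows, on a made-up DataFrame (the row labels `r1` to `r3` and the column names are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]},
    index=["r1", "r2", "r3"],
)

one_cell = df.loc["r2", "age"]          # scalar row label and column label
some_rows = df.loc[["r1", "r3"]]        # list of row labels
row_slice = df.loc["r1":"r2", "name"]   # label slice (end label included)

first_cell = df.iloc[0, 0]              # row position 0, column position 0
second_col = df.iloc[:, 1]              # all rows, column at position 1

print(one_cell, first_cell)
```

Both indexers take a row selector first and an optional column selector second.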
2/ Slicing data frames
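To slice by labels, use the loc attribute of the DataFrame with square brackets, for example df.loc["a":"c"]. Unlike positional slices, a label slice includes its end label.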
1/ defaultdict
In Python, a dictionary is a container that holds key-value pairs. Keys must be unique, hashable (typically immutable) objects. If you try to read a key that doesn't exist in the dictionary, or update it in place, it raises a KeyError and halts your code.
2/ To tackle this issue, Python provides defaultdict, a dictionary-like class in the collections module. If you try to access or modify a missing key with d[key], defaultdict automatically creates the key and generates a default value for it.
As long as a default factory is set, d[key] on a missing key will not raise a KeyError.
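A short sketch of the contrast, grouping a made-up list of words by first letter. The word list and variable names are illustrative.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# A plain dict raises KeyError when a missing key is read.
plain = {}
try:
    plain["a"].append("apple")
    missing_key_failed = False
except KeyError:
    missing_key_failed = True

# defaultdict(list) creates an empty list the first time a key is accessed.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(missing_key_failed)
print(dict(by_letter))
```

The argument passed to defaultdict is the factory that builds the default value, so `int` gives 0 (handy for counting) and `list` gives an empty list (handy for grouping).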