So, to the people who called me names: here is a test for you. You need to use Python.
- You have 100k CSVs in a folder.
- Read all files in the folder
- Combine them in a single CSV
- Save the combined file for feature engineering using pandas
- All files share the same header
1/4
Where would you find 100k CSVs in a folder? In many scenarios and real-life situations. I have made it easy for you: github.com/abhishekkrthak…
Those who called me names must use pandas.
Those who are willing to learn, scroll below.
2/4
The simple pandas approach took 120 seconds for me. Pure Python took 5.5 seconds, and PyPy took 3.8 seconds. That's why it's important to learn the basics too.
After that, I'll use pandas for feature engineering. You don't need a bazooka to kill a fly.
3/4
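The pure-Python solution itself isn't in this thread, so here is a minimal sketch of one way to do it, assuming every file shares the same header: copy the header once, then append the remaining lines of each file as raw text without parsing them into a DataFrame. The function name `combine_csvs` is hypothetical, this is not necessarily the author's code, and the timings above were not measured with it.

```python
import glob
import os


def combine_csvs(folder, output_path):
    """Merge all *.csv files in `folder` into `output_path` with a single header.

    Assumes every file has the same header. Write output_path outside
    `folder`; otherwise a rerun would pick up the previous combined file.
    """
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    with open(output_path, "w", encoding="utf-8", newline="") as out:
        for i, path in enumerate(paths):
            with open(path, encoding="utf-8", newline="") as f:
                header = f.readline()
                if i == 0:
                    # write the shared header exactly once
                    out.write(header if header.endswith("\n") else header + "\n")
                body = f.read()  # remaining rows as raw text, no parsing
            if body and not body.endswith("\n"):
                body += "\n"  # avoid gluing the next file's first row onto this one
            out.write(body)
    return len(paths)
```

Copying raw text skips per-file type inference and DataFrame construction, which is a plausible reason a per-file `pd.read_csv` + `pd.concat` loop is slower for many small files. The combined file can then be loaded once with pandas for feature engineering.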
The solution comes this evening if the people who called me a gatekeeper can't post it first.
P.S. I don't care if you hate my ways, but beginners learn something new all the time and that's what matters to me.
4/4
Not surprised that none of the naysayers were able to respond. That's what happens when you start accusing and abusing someone without understanding the context.
- Divide the training data into folds.
- Train a bunch of models: M1, M2.....Mn.
- Create full training predictions (using out-of-fold training) and test predictions with all these models. 2/4
- Up to this point, it is Level 1 (L1).
- Use the out-of-fold predictions from these models as features for another model (and their test predictions as its test features). This is now a Level 2 (L2) model.
- Use the same folds as before to train this L2 model. 3/4
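The steps above can be sketched with NumPy alone. As hypothetical stand-ins for M1..Mn, this uses closed-form ridge regression with different penalties (`ridge_model`); in practice the L1 models would usually be more diverse. Each L1 model yields one out-of-fold column for the training rows and one test column (averaged over folds here, which is one common choice), and the L2 model is trained on those columns with the same `folds` array. All names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def make_folds(n_samples, n_folds=5, seed=42):
    """Assign each training row to one of n_folds folds (balanced, shuffled)."""
    rng = np.random.default_rng(seed)
    folds = np.arange(n_samples) % n_folds
    rng.shuffle(folds)
    return folds


def ridge_model(alpha):
    """Hypothetical stand-in model: closed-form ridge regression with intercept."""
    def fit(X, y):
        Xb = np.column_stack([X, np.ones(len(X))])
        penalty = alpha * np.eye(Xb.shape[1])
        penalty[-1, -1] = 0.0  # leave the intercept unpenalized
        w = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
        return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ w
    return fit


def oof_and_test(fit, X, y, X_test, folds):
    """Return out-of-fold predictions on X and fold-averaged predictions on X_test."""
    n_folds = int(folds.max()) + 1
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for k in range(n_folds):
        train, valid = folds != k, folds == k
        predict = fit(X[train], y[train])  # fit on the other folds only
        oof[valid] = predict(X[valid])     # predict the held-out fold
        test_pred += predict(X_test) / n_folds
    return oof, test_pred


def stack(level1_fits, level2_fit, X, y, X_test, folds):
    # Level 1: each model M_i contributes one OOF column and one test column
    preds = [oof_and_test(fit, X, y, X_test, folds) for fit in level1_fits]
    l2_train = np.column_stack([oof for oof, _ in preds])
    l2_test = np.column_stack([test for _, test in preds])
    # Level 2: the L1 predictions become features; reuse the same folds
    return oof_and_test(level2_fit, l2_train, y, l2_test, folds)


# Synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 3))

folds = make_folds(len(X), n_folds=5)
level1 = [ridge_model(0.1), ridge_model(10.0), ridge_model(100.0)]
l2_oof, l2_test = stack(level1, ridge_model(1.0), X, y, X_test, folds)
```

Reusing the same folds at L2 matters because each L2 out-of-fold prediction is then made for rows whose L1 features were also produced by models that never saw those rows.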
Have you had, or are you having, trouble organizing your machine learning projects? This thread should give you some ideas on how to structure machine learning / deep learning projects. See the folder structure: 1/6 🔽
input/: This folder contains all the input files and data for your machine learning project. If you are working on NLP projects, you can keep your embeddings here. If you are working on image projects, all images go into a subfolder inside this folder. 2/6
src/: We will keep all the Python scripts associated with the project here. Whenever I talk about a Python script, i.e. any *.py file, it is stored in the src folder. 3/6
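A minimal sketch of the two folders described so far, using `pathlib`. Only input/ and src/ are covered, since the rest of the structure isn't shown here. `scaffold_project` and `input_dir` are hypothetical helpers; the second shows one way a script stored in src/ could locate input/ relative to its own file rather than the current working directory.

```python
from pathlib import Path


def scaffold_project(root):
    """Create the input/ and src/ folders under `root` (idempotent)."""
    root = Path(root)
    for name in ("input", "src"):
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def input_dir(script_path):
    """From a script stored in src/, return the sibling input/ folder.

    Hypothetical helper: inside a script you would call input_dir(__file__).
    """
    return Path(script_path).resolve().parent.parent / "input"
```

Inside a hypothetical src/train.py, `input_dir(__file__) / "train.csv"` would point at input/train.csv no matter which directory the script is launched from.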
In this thread, I will show you how to train a deep learning based sentiment classification model using BERT. First, you need the IMDb dataset; you can grab it from kaggle.com/lakshmi25npath…. Save it as imdb.csv. 👇
1/5
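Before any model code, the sentiment labels need to be numeric. A minimal standard-library sketch, assuming (not confirmed by the thread) that imdb.csv has a `review` text column and a `sentiment` column with `positive` / `negative` values; `load_imdb` and `LABELS` are hypothetical names, so check the header of your downloaded file first.

```python
import csv

# Assumed label values; adjust if your copy of imdb.csv differs
LABELS = {"negative": 0, "positive": 1}


def load_imdb(path="imdb.csv"):
    """Read reviews and convert string sentiments into 0/1 integer targets."""
    texts, targets = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # uses the first row as column names
            texts.append(row["review"])
            targets.append(LABELS[row["sentiment"]])
    return texts, targets
```

`csv.DictReader` handles quoted reviews that contain commas or line breaks, which plain `str.split(",")` would not.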