1/ 🧵 Getting started with data science and machine learning.
- the first step is to know what data science and machine learning mean, and whether this field is worth getting into.
- if you ask me, I would say yes. the world is data-centric.
2/ in every industry, data is useful and will be for a long long time.
- if you are a developer, it will be easier for you to get into data science
- but it also means you have to work twice as hard since you are already working
3/ - if you are a fresher or a student, it's much easier for you to spend at least 4 hours a day learning something new
- now, a lot of people ask: where do I start?
- step 0 will be learning to code.
4/ I prefer Python, and so does the majority of industry and research when it comes to machine learning/data science
- then, you can start by googling what is machine learning and reading the Wikipedia page.
5/ then you can try Andrew Ng's Coursera course on machine learning and create an account on Kaggle. Go through the course, work out the exercises and simultaneously, start taking part in a Kaggle competition: for example, the Titanic one.
6/ Also, there is no need to spend more than a few weeks (max 1 month) on this competition.
- a lot of people will tell you that you don't need to know the basics, just learn very high-level libraries, but do not trust them.
7/ start from the course mentioned above but do solve a problem simultaneously.
- These are a few courses you can take a look at:
15/ to be honest, there are many free courses available online that you can go through. it doesn't matter which courses you take. if you plan on just starting a course and solving problems without understanding anything, you will be in big trouble.
16/ if you do not plan on finishing a course, it's better not to start.
If you plan to do an MSc, many universities in both Europe and the USA offer degrees in computer science, data science, AI, etc. P.S. in Europe, many public universities charge little or no tuition fee.
17/ Unfortunately, I don't advise on which universities to go for. All universities are good. It's all up to the student.
18/ In short, start from courses, join Kaggle and do 101 competitions in the beginning, come up with nice projects for your portfolio, write good quality blogs and articles, do not copy-paste; rather, understand and implement on your own, and jump into Kaggle competitions with…
19/ …prize money for advanced learning. if you don't understand something, there are many channels where you can ask questions: StackOverflow, Kaggle discussions, and even on Twitter!
💥 Did you know that there are problems other than MNIST and iris that you can solve (or try to solve) to learn deep learning and computer vision? Here is a list of my favourite Kaggle competitions to learn deep learning and computer vision from ⬇️ 1/13
🚀 If you are starting with machine learning / deep learning and get a new dataset to work on, whether on Kaggle, in the real world, or just for fun, there are a few things you must always take care of to squeeze the most out of your model and make it awesome: ⬇️⬇️⬇️
1/6
🔹 Look at the data carefully. Do EDA.
🔹 Look at the targets. See how they are distributed and what kind of problem this is.
🔹 Choose the right metric to evaluate your models
2/6
🔹 Split the data into folds. You can use these for cross-validation or for hold-out validation
🔹 Build a first basic model. This is going to be your baseline.
🔹 Now try to improve on the baseline by adding new features
3/6
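A rough sketch of the "split into folds, then build a baseline" steps, in plain Python so nothing is hidden behind a library. The toy targets, the helper names `assign_folds` and `majority_baseline_accuracy`, and the choice of a majority-class baseline with accuracy as the metric are all assumptions for illustration, not something the thread prescribes.

```python
import random
from collections import Counter


def assign_folds(n_rows, n_folds=5, seed=42):
    """Return a list where position i holds the fold id of row i."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle so folds are not ordered
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_folds
    return folds


def majority_baseline_accuracy(targets, folds, n_folds=5):
    """Cross-validated accuracy of always predicting the most common training label."""
    scores = []
    for fold in range(n_folds):
        train = [t for t, f in zip(targets, folds) if f != fold]
        valid = [t for t, f in zip(targets, folds) if f == fold]
        majority = Counter(train).most_common(1)[0][0]
        scores.append(sum(t == majority for t in valid) / len(valid))
    return sum(scores) / len(scores)


# Toy, made-up binary targets: 70 negatives and 30 positives.
targets = [0] * 70 + [1] * 30
folds = assign_folds(len(targets))
baseline = majority_baseline_accuracy(targets, folds)
```

Any real model you build afterwards should beat `baseline` on the same folds; if it does not, something is off with the features or the validation setup.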
Not surprised that none of the naysayers were able to respond. That's what happens when you start accusing and abusing someone without understanding the context.
So, people who called me names, here is a test for you. You need to use Python.
- You have 100k CSVs in a folder.
- Read all files in the folder
- Combine them in a single CSV
- Save the combined file for feature engineering using pandas
- All files share the same header
1/4
where do I find 100k CSVs in a folder? Well, in many scenarios and real-life situations. I have made it easy for you: github.com/abhishekkrthak…
Those who called me names must use pandas.
Those who are willing to learn, scroll below.
2/4
Doing this the simple way with pandas took me 120 seconds. Pure Python took 5.5 seconds, and PyPy took 3.8 seconds. That's why it's important to learn the basics too.
After that, I'll use pandas for feature engineering. You don't need a bazooka to kill a fly
3/4
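For those willing to learn, here is one possible pure-Python way to do the combining step. It is only a sketch and not necessarily the code behind the timings above. The function name `combine_csvs` and its arguments are made up for illustration, and it assumes the output file lives outside the input folder.

```python
import glob
import os


def combine_csvs(input_dir, output_path):
    """Concatenate every *.csv in input_dir into output_path, keeping one header."""
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    expected_header = None
    with open(output_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as src:
                header = src.readline().rstrip("\r\n")
                if expected_header is None:
                    # First file: write the shared header exactly once.
                    expected_header = header
                    out.write(header + "\n")
                elif header != expected_header:
                    raise ValueError(f"unexpected header in {path}")
                # Copy data rows as raw text; no parsing is needed.
                for line in src:
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(paths)
```

The rows are copied as raw lines and never parsed into tables, which is all this task needs. The combined file can then be loaded once with pandas for feature engineering.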
- Divide the training data into folds.
- Train a bunch of models: M1, M2, ..., Mn.
- Create full training predictions (using out-of-fold training) and test predictions using all these models. 2/4
- Up to here, this is Level 1 (L1).
- Use the fold predictions from these models as features to another model.
This is now a Level – 2 (L2) model.
- Use the same folds as before to train this L2 model. 3/4
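The L1/L2 recipe above can be sketched in plain Python. Everything concrete here is an assumption for illustration: the two toy L1 models (a tiny gradient-descent linear regression and a 1-nearest-neighbour regressor), the synthetic data, the helper names, and the choice to average each fold model's test predictions.

```python
import random


def fit_linear(X, y, lr=0.1, epochs=500):
    """Tiny gradient-descent linear regression; returns a predict function."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, row)) + b - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda rows: [sum(wi * xi for wi, xi in zip(w, r)) + b for r in rows]


def fit_nearest(X, y):
    """1-nearest-neighbour regressor; returns a predict function."""
    def predict(rows):
        out = []
        for r in rows:
            best = min(range(len(X)),
                       key=lambda i: sum((a - c) ** 2 for a, c in zip(X[i], r)))
            out.append(y[best])
        return out
    return predict


def out_of_fold_and_test(fit, X, y, folds, n_folds, X_test):
    """Return out-of-fold train predictions and fold-averaged test predictions."""
    oof = [0.0] * len(X)
    test_pred = [0.0] * len(X_test)
    for fold in range(n_folds):
        tr = [i for i, f in enumerate(folds) if f != fold]
        va = [i for i, f in enumerate(folds) if f == fold]
        predict = fit([X[i] for i in tr], [y[i] for i in tr])
        for i, p in zip(va, predict([X[i] for i in va])):
            oof[i] = p  # each row is predicted by a model that never saw it
        for j, p in enumerate(predict(X_test)):
            test_pred[j] += p / n_folds
    return oof, test_pred


def stack(l1_fitters, fit_l2, X, y, folds, n_folds, X_test):
    """Level 1: OOF predictions per model. Level 2: a model trained on them."""
    l1 = [out_of_fold_and_test(fit, X, y, folds, n_folds, X_test)
          for fit in l1_fitters]
    train_feats = [list(col) for col in zip(*(oof for oof, _ in l1))]
    test_feats = [list(col) for col in zip(*(test for _, test in l1))]
    # The L2 model reuses the same folds as the L1 models.
    return out_of_fold_and_test(fit_l2, train_feats, y, folds, n_folds, test_feats)


# Synthetic, made-up regression data.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(80)]
y = [2 * a + b + rng.gauss(0, 0.05) for a, b in X]
X_test = [[rng.random(), rng.random()] for _ in range(20)]
folds = [i % 5 for i in range(len(X))]
l2_oof, l2_test = stack([fit_linear, fit_nearest], fit_linear, X, y, folds, 5, X_test)
```

Reusing the same folds at L2 means the L2 model is validated only on rows whose L1 features were themselves produced out of fold.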