You need to generate your training data at the beginning of every real-world ML project.
Typically, you access an SQL-type database and write a long query that pulls data from several tables, aggregates it, and merges it into the final training set.
The dataset contains a set of features and a target metric you want to predict.
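To make this concrete, here is a minimal sketch of such a query, run against an in-memory SQLite database. The tables (`customers`, `orders`), their columns, and the sample rows are all hypothetical and only there for illustration; your real schema and target will differ.

```python
import sqlite3

# Hypothetical schema and rows: names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER, churned INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 34, 0), (2, 51, 1), (3, 27, 0);
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
""")

# Pull from two tables, aggregate the orders per customer, and merge
# everything into one row per customer: features plus the target.
TRAINING_QUERY = """
    SELECT c.customer_id                AS customer_id,
           c.age                        AS age,
           COUNT(o.order_id)            AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent,
           c.churned                    AS target
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.age, c.churned
    ORDER BY c.customer_id
"""

cursor = conn.execute(TRAINING_QUERY)
columns = [description[0] for description in cursor.description]
training_rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()

for row in training_rows:
    print(row)
```

The `LEFT JOIN` keeps customers who have no orders, so they still show up in the training set with zero counts instead of silently disappearing.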
Once you have this data, you are very tempted to train your first ML model.
And this is a BIG mistake.
Instead, you should set aside a few minutes to run a #data exploration.
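What goes into that exploration depends on the project. As an assumption, a first pass could check missing values, value ranges, and how the target is distributed. Here is a rough sketch using only the Python standard library; the rows, the feature names, and the `explore` helper are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical training rows; None marks a missing value.
rows = [
    {"age": 34, "total_spent": 55.5, "target": 0},
    {"age": 51, "total_spent": 12.0, "target": 1},
    {"age": None, "total_spent": 0.0, "target": 0},
    {"age": 27, "total_spent": 980.0, "target": 0},
]


def explore(rows, target="target"):
    """Return per-feature summaries and the target distribution."""
    summary = {}
    for name in rows[0]:
        if name == target:
            continue
        # Ignore missing values when computing the statistics.
        values = [r[name] for r in rows if r[name] is not None]
        summary[name] = {
            "missing": len(rows) - len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    target_counts = Counter(r[target] for r in rows)
    return summary, target_counts


summary, target_counts = explore(rows)
for name, stats in summary.items():
    print(name, stats)
print("target distribution:", dict(target_counts))
```

Even on this toy data, the summary surfaces a missing `age`, a `total_spent` value far above the rest, and an imbalanced target, all things you would want to know before training anything.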