1. Business Understanding: What is the problem that we are trying to solve?
- We should be clear about exactly which problem we are going to solve.
- Asking the right questions as a Data Scientist starts with understanding the goal of the business.
2. Analytical Approach: How can we use data to answer the question?
We should decide which analytical approach to follow. There are 4 types:
- Descriptive
- Statistical
- Predictive
- Prescriptive
The chosen approach indicates the data content, formats & sources that need to be gathered.
3. Data requirements: What data do we need to answer the question?
- Identify the necessary data content, formats & sources for initial data collection.
- During the process, one should find the answers to questions like ‘what’, ‘where’, ‘when’, ‘why’, ‘how’ & ‘who’.
4. Data Collection: Where is the data coming from (identify sources) and how will we get it?
- In this stage, the data requirements are revisited and a decision is made on whether more data needs to be collected.
5. Data Understanding: Is the data that we collected representative of the problem to be solved?
- Data understanding comprises all the activities related to constructing the data set.
6. Data Preparation: What additional work is required to manipulate & work with the data?
This phase transforms the data into a state that is easier to work with. It involves data cleansing:
- Missing Data
- Invalid Data
- Remove Duplicates
- Formatting
- Feature engineering
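Here is a minimal sketch of the cleansing steps listed above, using only the Python standard library. The records, field names, the 0 to 120 age range and the median fill are all assumptions made up for illustration, not part of the original methodology.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records (invented for this example).
raw_rows = [
    {"id": 1, "age": "34", "signup": "2021-07-01", "city": " Delhi "},
    {"id": 2, "age": "",   "signup": "2021-07-03", "city": "mumbai"},
    {"id": 2, "age": "",   "signup": "2021-07-03", "city": "mumbai"},  # duplicate
    {"id": 3, "age": "-5", "signup": "2021-07-09", "city": "Pune"},    # invalid age
    {"id": 4, "age": "51", "signup": "2021-07-10", "city": "PUNE"},
]

def prepare(rows):
    seen, cleaned = set(), []
    for row in rows:
        # Remove duplicates: keep only the first row seen for each id.
        if row["id"] in seen:
            continue
        seen.add(row["id"])

        # Missing data: a blank age becomes None for now.
        age = int(row["age"]) if row["age"].strip() else None
        # Invalid data: ages outside an assumed plausible range are treated as missing.
        if age is not None and not 0 <= age <= 120:
            age = None

        # Formatting: parse the date string and normalise the city name.
        signup = datetime.strptime(row["signup"], "%Y-%m-%d").date()
        cleaned.append({
            "id": row["id"],
            "age": age,
            "city": row["city"].strip().title(),
            "signup": signup,
            # Feature engineering: derive a new column from an existing one.
            "signup_is_weekend": signup.weekday() >= 5,
        })

    # Fill the remaining missing ages with the median of the known ages.
    fill = median(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

cleaned = prepare(raw_rows)
```

In a real project the same steps would more likely be written with pandas, but the order of operations stays the same.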
7. Modeling: Answers two key questions:
- What is the purpose of data modeling?
- What are the characteristics of this process?
Modeling focuses on developing models that are either descriptive or predictive. The choice of model is based on the analytical approach chosen in step 2.
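To make the descriptive vs predictive distinction concrete, here is a small sketch in plain Python. The spend and sales numbers are invented, and a least-squares line is just one possible predictive model, chosen here because it fits in a few lines.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend vs sales (made-up units).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

# Descriptive: summarise what has already happened.
mean_sales = sum(sales) / len(sales)

# Predictive: learn a relationship and apply it to an unseen input.
slope, intercept = fit_line(spend, sales)
forecast = slope * 6.0 + intercept
```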
8. Evaluation: Does the model used answer the initial question, or does it need to be adjusted?
Evaluation has two parts:
- Diagnostic measures: check that the model works as intended and show where modifications are required
- Statistical significance: ensures that the data is properly handled and interpreted
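As an illustration of diagnostic measures, the sketch below computes accuracy, precision and recall for a binary classifier. The labels and predictions are invented, and these three metrics are only one common choice. The right measures depend on the problem defined in step 1.

```python
def evaluate(y_true, y_pred):
    """Basic diagnostic measures for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
```

If a metric falls short of what the business needs, that is the signal to go back and adjust the model or the data.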
9. Deployment: Can we put the model into practice?
Once the model has been evaluated satisfactorily, it is made ready for deployment in the business environment. The deployment phase checks how well the model holds up in the external environment and how it performs compared to the alternatives.
10. Feedback:
Feedback is necessary for refining the model and assessing its performance and impact. The steps involved in feedback are: define the review process, track the record, measure effectiveness, and review and refine.
That's it for this thread 👋
A retweet for the first one would really mean a lot 🙏
If you liked my content and want to get more threads on Data Science, Machine Learning & Python, do follow me @PiyalBanik
This book deals with manipulating, processing, cleaning, and crunching data in Python. It is about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems.
1) Supervised learning (SL) has a feedback mechanism.
Unsupervised learning (UL) has no feedback mechanism.
2) Supervised learning involves building a model for predicting or estimating an output.
In unsupervised learning, we learn relationships and structures from the data.
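A rough sketch of the difference, in plain Python with invented 1-D data. The supervised part uses the labels as its feedback (a nearest-centroid classifier), while the unsupervised part sees only the numbers and has to find the two groups by itself (a tiny 2-means clustering). Both algorithms are picked here for brevity and are only examples of each family.

```python
def fit_centroids(points, labels):
    """Supervised: the labels say which group each training point belongs to."""
    groups = {}
    for x, label in zip(points, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two clusters without any labels."""
    lo, hi = min(points), max(points)  # initial cluster centres
    left, right = list(points), []
    for _ in range(iterations):
        left = [x for x in points if abs(x - lo) <= abs(x - hi)]
        right = [x for x in points if abs(x - lo) > abs(x - hi)]
        if not right:  # all points identical, nothing to split
            break
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return sorted(left), sorted(right)

# Hypothetical measurements and labels.
points = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(points, labels)        # learns from labelled examples
guess_a = predict(centroids, 1.5)
guess_b = predict(centroids, 4.0)

cluster_a, cluster_b = two_means(points)         # structure found without labels
```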
- regularization
- simpler model architecture
- more training data
- reduce noise in the data
- reduce the number of input attributes
- shorter training cycles
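The list above reads like the usual ways to curb overfitting. Assuming that is the intent, here is a small sketch of the first item, regularization. It fits y ≈ w * x with an L2 (ridge) penalty on w, and the data and penalty values are made up for illustration.

```python
def ridge_weight(xs, ys, penalty):
    """Weight w minimising sum((y - w*x)^2) + penalty * w^2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Hypothetical training data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# A larger penalty pulls the weight towards zero, i.e. towards a simpler model.
weights = [ridge_weight(xs, ys, p) for p in (0.0, 1.0, 14.0)]
```

With no penalty the fit recovers w = 2 exactly, and the weight shrinks as the penalty grows. On noisy data that shrinkage is what keeps the model from chasing the noise.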
A few things to keep in mind before starting
- Learn By Doing, Practicing & Not Just Reading
- Code By Hand [very effective]
- Share, Teach, Discuss and Ask For Help
- Use Online Resources
- Be consistent
- Learn to Use a Debugger
I have covered all the concepts mentioned below as part of the #100DaysOfCode challenge and the code can be found on my @github profile.
[Projects & exercises not done. Let me know if you want the solutions]
Since we're currently in July, start from this month.
July -
Understanding Data Science and getting started with Python
- What is data science?
- What does a data scientist do?
- Find out various resources
- Set up the system
- Learn Python basics
- Introduction to Pandas & NumPy
August -
Mathematics, Statistics & SQL
- Linear Algebra
- Introduction to Probability
- Statistics - inferential & descriptive
- Exploratory Data Analysis
- SQL for Data Science
- Projects on EDA and SQL
Start engaging in the Data Science & Machine Learning community