Tbone Profile picture
Jun 8 12 tweets 4 min read Twitter logo Read on Twitter
A few days back, 2 Data Analyst newbies showed me mistakes they encountered during their analysis process. It dawned on me that lots of times, we don't diligently follow through the data analysis process, especially the cleaning phase.
In this short🧵, I break down this process. Image
PHASE 1: ASK QUESTIONS
The first step in the analysis process is to formulate a problem to solve or a question to answer, such as: are customers aged 21 to 30 more likely to churn than customers aged 41 to 50? As analysts, we must be very curious, we must ASK QUESTIONS!
PHASE 2: Collect & Store Data
Next, we need to collect and store the necessary data, which could require the use of a database or a spreadsheet. For large datasets, we store them in a database. For datasets shy of a million records, spreadsheet software like Excel can be used.
PHASE 3: Explore Data
Once collected, we need to explore our data to familiarize ourselves with its contents. At this stage, we may find biases in the collected data or discover missing or erroneous entries. This leads us to one of the most important steps
PHASE 4: DATA CLEANING
Once we understand the data, it must be cleaned and prepared for analysis. If our question requires the ages of customers, but we only collected dates of birth, we must convert them at this stage. Image
A well-cleaned and prepared dataset will make analyzing the data much easier. As data analyst, we have heard several times that this is your most important step. I have seen @malcom_okonkwo , @Rita_tyna, @phaibooboo , and a few other analysts tell us of how it took them hours,
days and even weeks to clean their datasets. The simple reason for this thorough process is to ensure your analysis is built on a very solid foundation. Trust me, you really do not want to cost your clients millions just because you didn't thoroughly clean your data.
PHASE 5: ANALYZE DATA
Now onto the actual analysis! Here's where we solve the problem we formulated. The analysis usually involves performing calculations and often a statistical analysis. This is where we employ all the tools required for our analysis.
As a BI analyst, you will work mostly with BI tools like Power BI, Tableau, Looker, Domo, Datapine, and Microstrategy amongst a few. As a Spreadsheet analyst, you will work mostly with Microsoft Excel, Google Sheets. As a programming analyst, you will mostly work with Python, R
or any other programming tool. As a Database analyst, you will work mostly with Relational Database Management Systems like PostgreSQL, MSQL, MySQl, Oracle.
PHASE 6: PRESENT FINDINGS
For the final phase , we present our results to key stakeholders, which often includes high-level findings from the analysis and supporting visualizations.
End of 🧵.
Thank you for sticking with me☺️
#DataAnalytics #DataScience #SQL #PowerBI #Dataanalyst #datafam

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Tbone

Tbone Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @Toby_Davids

Jun 8
#Day5 for Reporting "Crack SQL interview in 50Qs" is hear. Let's rumble

Find the links to the questions here
1. leetcode.com/problems/the-n…
2. leetcode.com/problems/prima…

#SQL #DATAANALYSIS #Interviews #lowcode Image
Let's break them down:
1. Intuition
The query aims to retrieve information about managers in the Employees table, including their IDs, names, the number of employees reporting to them, and the average age of their reports.
The solution involves using a self join. Since we were told that a manager is also an employee, we will join the table to itself, where the employee_id field is equal reports_to field. On this premise, we aggregate the number of employees who report directly to the manager,
Read 6 tweets
Jun 8
Let's break down each question
1. Logic: Show the classes, where the number of student is more than 5. The word "At least" tells us something, we have to include 5 as part of our filtering condition, hence >=5
2. This looks pretty straight forward. For each user, we want to....
..return the number followers, hence the need to use the Group BY clause. This clause allows us to aggregate data against the unique categorical/qualitative values in a field.
3. The given SQL query finds the maximum value (num) from the "mynumbers" table that appears only once.
Read 9 tweets
Jun 6
Let's break down each code
1. The ending part to the question is where the major conditional logic lies. We were told to return the needed columns, and price for the first year of every product sold. Since we do not have distinct column showing min years for each product,
We have to create that with a subquery in the Where clause. We are concerned about the first year of every product sold, so we aggregate the min years for each product. After doing that, we tell SQL to return the prices of products, the year, and qty when the product was first..
Read 10 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(