To do this in a business setting, you’ll typically talk with stakeholders, business partners, and other team members who are familiar with the subject of the analysis.
[4/42]
When you do a data analysis, you need to understand what’s *driving* the analysis.
– What was the overall performance, and how did it compare to the goal (assuming a goal existed)?
– Which teams or individuals performed the best?
– Where was performance the best (what region, city, work site)?
– When did performance change?
[9/42]
Once you understand the analytical objectives and you have a set of initial questions, you need to identify the relevant data that you’ll need.
If you want to understand sales, then you’ll obviously need “sales” data.
And you’ll need other variables that will help you filter and slice your data.
[11/42]
So, if you are curious about sales performance by *team*, you’ll need team-level data (which will probably include a “team” categorical variable).
[12/42]
If you want to know about performance by city or region, you’ll need variables for city and region.
At this stage, you’re just trying to figure out what you need.
[13/42]
The next step in a data analysis: you need to get the data.
Getting data can be complicated, because the various variables you need are often in different places and sometimes in different formats.
The next data analysis step: you need to clean and prepare the data.
This often involves:
– dealing with missing values
– recoding categories
– reshaping datasets from wide to long format, or from long to wide (i.e., melt and pivot)
– joining multiple datasets together
[18/42]
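To make those four cleaning tasks a bit more concrete, here is a minimal sketch in Python with pandas. The tables, column names, and the choice to fill missing values with 0 are all hypothetical, made up purely for illustration; in a real analysis you might drop or impute missing values instead.

```python
import pandas as pd

# Hypothetical wide-format sales table: one row per team, one column per quarter.
sales_wide = pd.DataFrame({
    "team": ["north", "south", "east"],
    "q1": [100.0, 80.0, None],
    "q2": [120.0, None, 95.0],
})

# Hypothetical lookup table that maps each team to a region code.
teams = pd.DataFrame({
    "team": ["north", "south", "east"],
    "region": ["N", "S", "E"],
})

# 1. Deal with missing values (assumption: a missing quarter means zero sales).
sales_filled = sales_wide.fillna({"q1": 0.0, "q2": 0.0})

# 2. Recode categories: replace terse region codes with readable labels.
region_names = {"N": "Northern", "S": "Southern", "E": "Eastern"}
teams["region"] = teams["region"].map(region_names)

# 3a. Reshape wide to long (melt): one row per team and quarter.
sales_long = sales_filled.melt(
    id_vars="team", var_name="quarter", value_name="sales"
)

# 3b. Reshape long back to wide (pivot): one column per quarter again.
sales_wide_again = sales_long.pivot(
    index="team", columns="quarter", values="sales"
)

# 4. Join the sales data to the team lookup table.
working = sales_long.merge(teams, on="team", how="left")
```

The `working` DataFrame at the end is the kind of single, tidy dataset you want to carry into the analysis itself.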
Data preparation is complicated, and I could write thousands of words on that topic alone. So, I’ll need to explain it in depth another time.
(Although I’ve written about it here on Twitter before.)
[19/42]
Once you get to this point, you should have a working dataset that has all of the variables and data that you need to accomplish the objectives of the analysis.
[20/42]
So next, you begin to actually analyze the data using charts, graphs, and data aggregations.
So first, you just want to get an overview of your data. That typically means plotting single-variable charts like histograms and density plots, or bivariate charts like bar plots and scatterplots.
Some of these initial charts may answer some questions.
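As a rough sketch of what that first overview can look like, here is one way to do it with pandas and matplotlib. The data and the column names (`team`, `sales`, `calls`) are invented for illustration, and the bin count is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical working dataset: one row per sales rep.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "c", "c"],
    "sales": [10, 12, 7, 9, 15, 14],
    "calls": [20, 25, 15, 18, 30, 28],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Single-variable view: how are sales distributed?
axes[0].hist(df["sales"], bins=5)
axes[0].set_title("Distribution of sales")

# Aggregation plus bar plot: total sales by team.
team_totals = df.groupby("team")["sales"].sum()
axes[1].bar(team_totals.index, team_totals.values)
axes[1].set_title("Total sales by team")

# Two-variable view: do more calls go with more sales?
axes[2].scatter(df["calls"], df["sales"])
axes[2].set_xlabel("calls")
axes[2].set_ylabel("sales")
axes[2].set_title("Sales vs. calls")

fig.tight_layout()
```

In an interactive session you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()`, or save the figure with `fig.savefig(...)`.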
So instead of just using simple histograms and scatterplots, you might create small-multiple versions of those that break the charts out by an additional variable.
Or you might filter down to a specific subset and *then* plot your data.
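Both ideas, small multiples and filter-then-plot, can be sketched in a few lines. This again assumes pandas and matplotlib, with a made-up dataset and hypothetical column names (`region`, `team`, `sales`).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical working dataset: one row per sales rep.
df = pd.DataFrame({
    "region": ["east", "east", "east", "west", "west", "west"],
    "team": ["a", "a", "b", "c", "c", "d"],
    "sales": [10, 12, 7, 15, 14, 9],
})

# Small multiples: one histogram panel per region, with shared axes
# so the panels can be compared directly.
regions = sorted(df["region"].unique())
fig, axes = plt.subplots(
    1, len(regions), sharex=True, sharey=True, figsize=(8, 3)
)
for ax, region in zip(axes, regions):
    subset = df[df["region"] == region]
    ax.hist(subset["sales"], bins=4)
    ax.set_title(region)

# Filter first, then plot: total sales by team within the east region only.
east = df[df["region"] == "east"]
east_totals = east.groupby("team")["sales"].sum()
fig2, ax2 = plt.subplots()
ax2.bar(east_totals.index, east_totals.values)
ax2.set_title("East region: total sales by team")
```

Sharing the x and y axes across panels is a deliberate choice here: it keeps the scales identical, which makes differences between regions easier to see.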
[27/42]
At every step of this phase of the analysis, you’re looking for:
– anything that might answer one of your initial questions
– anything else that’s strange or amiss or interesting
If you find something new that’s interesting, write down a note.
These things often provoke new questions that you can answer with your data. They may yield important insights or they may require broader investigation by your team.
Again though: as you visualize, slice, and filter your data, you’re looking for things that help answer important questions, or things that would be valuable to your partners.
Initially, you’ll do this informally with your immediate managers, team members, and business partners.
This may be as simple as a 10 minute meeting where you call a person over to your desk, or it might be a 30 minute meeting to review initial findings.
[33/42]
During these initial reviews, your partners might have new questions or raise new issues.
If that’s the case, you may need to do more work, which often requires you to go back to a previous step in the process (e.g., get more data, clean it, plot it).
[34/42]
At some point though, the analytical results begin to solidify and you can package them up into a more formalized format.
In a business setting, this is almost always a PowerPoint presentation (or Keynote, if you use Apple software).