A clear understanding of data and the various types of data is needed. The word “data,” particularly in the context of analytics, is often associated with quantitative data. Quantitative data, however, are just one type of data that decision makers use on a daily basis.
Along with quantified data such as box scores and draft-combine results, decision makers use a host of qualitative data. Qualitative data take a variety of
forms, including scouting reports, coach’s notes, and video.
Quantitative Data
Quantitative data are just data, the lowest input into the analytic process, and without being transformed
into information, they are at best useless and can often be misleading.
Just because data are presented in the form of an average, a percentage, or a ratio does not mean that they are useable information. Only after raw numerical data are given rich context do they become information that can be used in the decision-making process.
Qualitative Data
Scouting reports, medical reports, video, and other sources are all kept in discrete locations and not combined with quantitative data. In part, this is
because of the nature of qualitative data. Most qualitative data are what is known as unstructured data.
This means there are no distinct variable names and the data cannot be easily and logically put into a set of rows and columns in a spreadsheet. When data take the form of words or images, we tend to think about and process them differently than we do quantitative data.
Because qualitative data can be unstructured, the differences in handling and processing this kind of data are natural, but this does not mean that quantitative and qualitative data should be strictly segregated.
Raw qualitative data are no more meaningful than raw quantitative data, and they, too, need to be processed and transformed into useable information.
For example, a scouting report from one game may produce several pages of notes, which are raw data. Before these qualitative data can be useful, they need to be combined with other scouting reports, medical reports, video edits, and other kinds of data that the organization uses.
The general attitude toward qualitative data leads organizations to store them in a more careless manner. Medical data, for example, are rarely
organized and stored with the same care and structure as salary data.
Often the medical staff is the sole arbiter of where and how those data are stored and who may access them. This means most medical data are left unstructured
and are rarely turned into useable information.
The problems created by this kind of careless data management are evident in the general lack of understanding of the long-term effects of injuries on player performance.
Analysis of Unstructured Data
Unstructured data sets often require a significant investment of time in order to create useful information from them. It is possible, though,
to impose structure on these unstructured data in order to reduce the processing time.
For scouting reports, a more standardized report that asks for specific grades or ratings in particular areas, while still preserving a free-form comments section, can make the data quicker to summarize and easier to combine with other types of information.
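As a rough illustration of that idea, here is a minimal Python sketch of a semi-structured scouting report. The class name, field names, grading areas, and sample values are all hypothetical and not taken from any real scouting system; the point is only that fixed grades can be summarized automatically while the comments stay free-form.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ScoutingReport:
    """Hypothetical report: fixed graded areas plus free-form comments."""
    player: str
    game_date: str
    grades: dict = field(default_factory=dict)  # area name -> numeric grade
    comments: str = ""                          # unstructured notes are kept as-is


def average_grades(reports):
    """Average each graded area across reports; areas a report omits are skipped."""
    by_area = {}
    for report in reports:
        for area, grade in report.grades.items():
            by_area.setdefault(area, []).append(grade)
    return {area: mean(values) for area, values in by_area.items()}


# Made-up example data for a single player across two games.
reports = [
    ScoutingReport("Player A", "game 1", {"shooting": 6, "defense": 4},
                   "Slow to rotate on help defense."),
    ScoutingReport("Player A", "game 2", {"shooting": 8, "defense": 5},
                   "Much better shot selection."),
]
summary = average_grades(reports)
```

The graded fields roll up with one function call, and the comments remain available for anyone who wants the detail behind a grade.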
For video data, this can take the form of using play-by-play data or the motion-capture data to make finding, gathering, and organizing specific types of plays or situations more efficient.
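One way to picture this is a play-by-play log that records where each play sits in the game video, so that clips can be looked up by play type. The sketch below assumes such a log exists; the `Play` fields, play-type labels, and timestamps are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """Hypothetical play-by-play row with its position in the game video."""
    game_id: str
    play_type: str
    start_sec: float
    end_sec: float


def find_clips(plays, play_type):
    """Return (game_id, start_sec, end_sec) for every play of the given type."""
    return [(p.game_id, p.start_sec, p.end_sec)
            for p in plays if p.play_type == play_type]


# Made-up log entries.
plays = [
    Play("G1", "pick_and_roll", 120.0, 131.5),
    Play("G1", "fast_break", 300.0, 308.0),
    Play("G2", "pick_and_roll", 45.0, 57.0),
]
clips = find_clips(plays, "pick_and_roll")
```

With the index in place, gathering every example of a situation becomes a filter over rows rather than a manual search through footage.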
The potential downside to imposing structure is that some of the finer points may be squeezed out of the data. A scouting report that is too structured, for example, may not capture some important data from a
player’s performance for which there is no structured field.
These nuances can matter, so the data structure should be designed with some flexibility. Even if the data are completely unstructured and there is no apparent method for creating a structure, there is a growing set of statistical tools that can be used.
They can process massive amounts of text or other unstructured data and pull out useful information. These tools identify patterns within the text and can then use those patterns in combination with other data to create valuable information.
For example, if a series of scouting reports on a player seems contradictory, text analytics can identify the positive and negative reports and then compare them against information from the games, such as start time, weather conditions, and whether the game was home or away.
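A deliberately simple Python sketch of that workflow follows. It uses a tiny hand-made word list to label each report's tone, which is a stand-in for real text-analytics tools rather than a description of them, and then counts tones by game context. The word lists, report texts, and the `venue` field are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Toy word lists (assumed for illustration; real tools use far richer models).
POSITIVE = {"strong", "quick", "accurate", "confident"}
NEGATIVE = {"slow", "sloppy", "hesitant", "tired"}


def tone(text):
    """Label a report by counting positive versus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tone_by_context(reports, key):
    """Count report tones for each value of a game-context field."""
    counts = defaultdict(lambda: defaultdict(int))
    for report in reports:
        counts[report[key]][tone(report["text"])] += 1
    return {context: dict(tones) for context, tones in counts.items()}


# Made-up reports on one player, tagged with a game-context field.
reports = [
    {"text": "Quick first step, accurate passing.", "venue": "home"},
    {"text": "Looked slow and hesitant on defense.", "venue": "away"},
    {"text": "Strong finish, confident with the ball.", "venue": "home"},
    {"text": "Sloppy footwork, tired by the second half.", "venue": "away"},
]
result = tone_by_context(reports, "venue")
```

In this toy data the apparent contradiction lines up with venue: the positive reports come from home games and the negative ones from away games. The same grouping could be run on start time or weather.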
Three principles of data management: standardization, centralization, and integration
These three principles build on one another to create efficiencies and consistencies within the organization that allow for easier and more timely access to information.
Standardization
Standardizing data, and the way they are created and stored, within an organization requires knowing the sources of the data. Some data sources are consistent across all teams. For example, all teams use video, keep box-score data, and have scouting reports.