Boxplots are one of the most useful tools in my Data Science arsenal.
In 6 minutes, I'll eviscerate your confusion.
Let's dive in.
1. What is a boxplot?
A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
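Here's a minimal sketch of that five-number summary in plain Python. The helper name `five_number_summary` and the sample data are made up for illustration. Quartile conventions also differ between tools; this one assumes the "inclusive" interpolation from the standard library's `statistics.quantiles`, so other packages may give slightly different Q1 and Q3 on the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    data = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" treats the data as the full population (an assumption).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample, purely for illustration.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
summary = five_number_summary(sample)
print(summary)
```

These five values are what the plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.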
2. Invention:
The boxplot was introduced by John Tukey around 1970 as part of his pioneering work on exploratory data analysis (EDA), and it reached a wide audience through his 1977 book Exploratory Data Analysis. Tukey's EDA emphasized using simple graphical and numerical methods to understand the data before making any assumptions about its underlying distribution or applying complex statistical models. The boxplot emerged from this philosophy: it was designed as a quick and easy way to visualize the distribution of data.
K-means is an essential algorithm for Data Science.
But it's confusing for beginners.
Let me demolish your confusion:
1. K-Means
K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
2. Unsupervised:
"Unsupervised" means K-means works on data with no labels or predefined outcomes. The goal is not to predict a target output, but to explore the structure of the data by identifying patterns, clusters, or relationships within the dataset.
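To make "no labels" concrete, here's a rough sketch of the assign-and-update loop usually associated with K-means (Lloyd's algorithm), written in plain Python. The function name `kmeans`, the toy points, and the choice to seed the centroids by hand are all assumptions for illustration. Real libraries typically pick starting centroids with smarter strategies and handle many more edge cases.

```python
import math


def kmeans(points, centroids, max_iters=100):
    """Cluster points with Lloyd's algorithm, starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return assignments, centroids


# Hypothetical toy data: two visually obvious groups, and no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Notice the function never sees a "right answer." The cluster numbers it returns are arbitrary IDs discovered from the geometry of the data, not predictions of a known target.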