One concept I struggled to understand as a Data Analyst was database normalization.
I'm going to explain it so you don't have to struggle with it too.
/🧵/
What is DB Normalization?
Database normalization is a process that aims to organize data efficiently while maintaining data integrity and flexibility.
It is achieved through a series of progressive stages, known as Normal Forms (NF), each building upon the previous one.
Let's delve into the different levels:
1. First Normal Form (1NF):
At this initial level, a table is considered to be in 1NF if it has no repeating groups, and all the entries in each column are atomic (indivisible).
This ensures that each piece of data is stored in its most granular form.
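Here's a minimal sketch of 1NF in Python. The `raw_orders` table and its data are made up for illustration: the `items` column packs several values into one cell, and the function splits it so each row holds a single item.

```python
# Hypothetical un-normalized rows: "items" holds several values in one cell.
raw_orders = [
    {"order_id": 1, "customer": "Ada", "items": "pen, notebook"},
    {"order_id": 2, "customer": "Bob", "items": "stapler"},
]

def to_first_normal_form(rows):
    """Return one row per (order, item) so every cell holds a single value."""
    atomic_rows = []
    for row in rows:
        for item in row["items"].split(","):
            atomic_rows.append(
                {
                    "order_id": row["order_id"],
                    "customer": row["customer"],
                    "item": item.strip(),
                }
            )
    return atomic_rows

orders_1nf = to_first_normal_form(raw_orders)
for row in orders_1nf:
    print(row)
```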
2. Second Normal Form (2NF):
A table reaches 2NF when it meets 1NF criteria, and all non-key attributes are fully functionally dependent on the entire primary key.
This eliminates partial dependencies and enhances data consistency.
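A small sketch of removing a partial dependency, assuming a hypothetical `order_items` table whose primary key is (`order_id`, `product_id`). Here `product_name` depends only on `product_id`, which is just part of the key, so it moves to its own table.

```python
# Hypothetical table with composite key (order_id, product_id).
# product_name depends only on product_id: a partial dependency.
order_items = [
    {"order_id": 1, "product_id": "P1", "product_name": "pen", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "notebook", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "pen", "quantity": 5},
]

# Move the partially dependent attribute into a table keyed by product_id.
products = {row["product_id"]: row["product_name"] for row in order_items}

# What remains (quantity) depends on the whole key (order_id, product_id).
order_lines = [
    {
        "order_id": row["order_id"],
        "product_id": row["product_id"],
        "quantity": row["quantity"],
    }
    for row in order_items
]

print(products)
print(order_lines)
```

The name "pen" is now stored once instead of once per order line.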
3. Third Normal Form (3NF):
In 3NF, the table is further refined by removing transitive dependencies.
This means that non-key attributes should not depend on other non-key attributes.
Achieving 3NF minimizes data redundancy and increases data integrity.
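To make the transitive dependency concrete, here is a sketch with a hypothetical `employees` table: `emp_id` determines `dept_id`, and `dept_id` determines `dept_name`, so `dept_name` depends on the key only through another non-key attribute.

```python
# Hypothetical table: emp_id -> dept_id -> dept_name (a transitive dependency).
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": "D1", "dept_name": "Analytics"},
    {"emp_id": 2, "name": "Bob", "dept_id": "D1", "dept_name": "Analytics"},
    {"emp_id": 3, "name": "Cy", "dept_id": "D2", "dept_name": "Finance"},
]

# Department facts get their own table keyed by dept_id.
departments = {row["dept_id"]: row["dept_name"] for row in employees}

# The employee table keeps only attributes that depend directly on emp_id.
employees_3nf = [
    {"emp_id": row["emp_id"], "name": row["name"], "dept_id": row["dept_id"]}
    for row in employees
]

print(departments)
print(employees_3nf)
```

Renaming a department is now a single update in `departments` rather than one per employee.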
4. Boyce-Codd Normal Form (BCNF):
BCNF is a stricter version of 3NF: it removes the exception 3NF allows for dependencies whose right-hand side is part of a candidate key.
In BCNF, for any non-trivial functional dependency, the left-hand side must be a superkey.
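The superkey rule can be checked mechanically. Below is a rough sketch, not a full design tool: `closure` computes everything a set of attributes determines, and `bcnf_violations` flags non-trivial dependencies whose left-hand side does not determine the whole table. The student/course/instructor table and its dependencies are assumptions chosen for illustration.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attributes, dependencies):
    """List non-trivial dependencies whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in dependencies:
        non_trivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, dependencies) == set(all_attributes)
        if non_trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

# Hypothetical table: each instructor teaches exactly one course.
attributes = {"student", "course", "instructor"}
dependencies = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

print(bcnf_violations(attributes, dependencies))
```

Only `instructor -> course` is reported, because `instructor` alone does not determine `student`. That table satisfies 3NF (since `course` is part of a candidate key) but not BCNF.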
5. Fourth Normal Form (4NF):
4NF deals with multi-valued dependencies, which occur when one attribute determines a set of values of another attribute independently of the remaining attributes in the table.
Achieving 4NF ensures further elimination of data anomalies.
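A quick sketch of a multi-valued dependency, using a hypothetical table of (employee, skill, language) rows. Skills and languages are independent facts about an employee, so one table has to store every combination. Splitting it in two loses nothing, which the rejoin check confirms.

```python
# Hypothetical rows: skills and languages are independent facts per employee,
# so a single table must store every skill/language combination.
employee_facts = {
    ("Ada", "SQL", "English"),
    ("Ada", "SQL", "French"),
    ("Ada", "Python", "English"),
    ("Ada", "Python", "French"),
    ("Bob", "Excel", "English"),
}

# 4NF decomposition: one table per independent multi-valued fact.
employee_skills = {(emp, skill) for emp, skill, _ in employee_facts}
employee_languages = {(emp, lang) for emp, _, lang in employee_facts}

# Joining the two tables on employee rebuilds the original rows.
rejoined = {
    (emp, skill, lang)
    for emp, skill in employee_skills
    for other_emp, lang in employee_languages
    if emp == other_emp
}

print(rejoined == employee_facts)
```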
6. Fifth Normal Form (5NF) or Project-Join Normal Form (PJNF):
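5NF, the highest of the commonly cited normal forms, addresses join dependencies that are not implied by the candidate keys: cases where a table can only be split without loss into three or more smaller tables.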
This level is less common and is primarily applied in complex database scenarios.
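Join dependencies are easier to see with data. A rough sketch, assuming a hypothetical `supplies` table of (agent, company, product) rows: project it onto its three attribute pairs and check that joining them back returns exactly the original rows. Passing on one sample only shows the data is consistent with the join dependency; the dependency itself is a rule about every valid state of the table.

```python
def join_of_projections(rows):
    """Project 3-column rows onto each attribute pair, then join them back."""
    agent_company = {(a, c) for a, c, _ in rows}
    company_product = {(c, p) for _, c, p in rows}
    agent_product = {(a, p) for a, _, p in rows}
    return {
        (a, c, p)
        for a, c in agent_company
        for other_c, p in company_product
        if c == other_c and (a, p) in agent_product
    }

# Hypothetical rows: which agent sells which product for which company.
supplies = {
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"),
    ("Jones", "Ford", "car"),
}

print(join_of_projections(supplies) == supplies)
```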
It's important to note that not every database needs to achieve the highest level of normalization.
The choice of the appropriate level depends on the specific requirements and use cases of the database.
Striking the right balance between normalization and performance is a critical aspect of effective database design.
Want to learn more topics like this?
I pick a topic and write about it in an easy-to-understand way in my Newsletter every week!
Data Analytics can be divided based on the 5 types of questions it can answer
/🧵/
Descriptive analytics
What happened?
Descriptive analytics answers questions about what happened.
Descriptive analytics techniques summarize large datasets to present insights to stakeholders.
For example, tracking key performance indicators (KPIs) and presenting the data related to them is descriptive analytics.
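In code, descriptive analytics often starts as a handful of summary statistics. A minimal sketch, assuming a hypothetical week of daily revenue figures as the metric being tracked:

```python
from statistics import mean, median

# Hypothetical daily revenue for one week (made-up numbers).
daily_revenue = [1200, 1350, 1280, 1500, 900, 1420, 1310]

# Summarize the dataset into a few figures a stakeholder can read at a glance.
summary = {
    "total": sum(daily_revenue),
    "mean": mean(daily_revenue),
    "median": median(daily_revenue),
    "min": min(daily_revenue),
    "max": max(daily_revenue),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```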
Diagnostic analytics
Why did it happen?
Diagnostic analytics helps answer questions about why things happened and is the next step in data analytics after descriptive analytics.
Analysts take findings from descriptive analytics and dig deeper to find the cause.
Diagnostic analytics generally occurs in three steps:
1. Identify anomalies in the data.
2. Collect data that are related to these anomalies.
3. Use statistical techniques to discover relationships and trends that explain these anomalies.
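The three steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the revenue and outage numbers are made up, a z-score above 2 is one arbitrary way to define an anomaly, and Pearson correlation is just one of many statistical techniques you could use. A strong correlation also does not prove the cause by itself.

```python
from statistics import mean, pstdev

# Hypothetical data for one week (made-up numbers).
daily_revenue = [1200, 1350, 1280, 1500, 900, 1420, 1310]

# Step 1: identify anomalies, here as days more than 2 standard deviations
# away from the mean (the threshold is an arbitrary choice).
average = mean(daily_revenue)
spread = pstdev(daily_revenue)
anomalies = [
    day for day, value in enumerate(daily_revenue)
    if abs(value - average) / spread > 2
]

# Step 2: collect related data, here hypothetical site outage minutes per day.
site_outage_minutes = [0, 0, 5, 0, 180, 0, 10]

# Step 3: use a statistical technique (Pearson correlation) to look for a
# relationship that could explain the anomaly.
def pearson(xs, ys):
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (var_x * var_y) ** 0.5

correlation = pearson(daily_revenue, site_outage_minutes)
print("anomalous days (0-indexed):", anomalies)
print("revenue vs outage correlation:", round(correlation, 2))
```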