Victor Maciel

When I first started working with databases, I often heard about “normalization,” but it felt like one of those theoretical concepts that didn’t matter much in real-world applications. However, as I began designing more complex systems, I quickly realized that without proper normalization, databases could become a mess—full of redundant data, inconsistencies, and performance issues.

So, what exactly is database normalization? In simple terms, it’s the process of structuring a database to reduce redundancy and improve data integrity. It follows a set of rules called “normal forms,” which guide how data should be organized. Each normal form builds on the previous one, fixing problems and making the database more efficient.

In this article, I’ll walk you through each level of normalization with practical examples.

First Normal Form (1NF) – Eliminating Duplicates and Ensuring Atomicity

The First Normal Form is all about making sure each column and each row in a table is used correctly. In other words, 1NF requires that each cell in a table holds a single, indivisible value, and that each record (row) is unique. This means no repeating groups of columns and no nested lists in a single column.

For example, if a person has multiple phone numbers, we handle that by creating additional rows or a separate table rather than packing them into a single column. Ensuring 1NF is the foundation of a well-structured database because it fosters clarity and consistency right from the start.

Imagine we have a table to store customer information. Here’s a poor design that violates 1NF:

CustomerID	Name	PhoneNumbers	Address
1	Alice Lee	555-1234, 555-5678	123 Elm Street, Springfield
2	Bob King	555-7890	456 Oak Avenue, Maple Town

Notice how the “PhoneNumbers” column contains multiple numbers in the first row separated by a comma. This breaks 1NF because the phone numbers are not atomic values. To fix this, we could create a new table or additional rows. One simple method is to create another row for each additional phone number:

CustomerID	Name	PhoneNumber	Address
1	Alice Lee	555-1234	123 Elm Street, Springfield
1	Alice Lee	555-5678	123 Elm Street, Springfield
2	Bob King	555-7890	456 Oak Avenue, Maple Town

Now each field holds a single value, and each row is unique. This satisfies the conditions of 1NF.

Second Normal Form (2NF) – Eliminating Partial Dependencies

After ensuring your table meets 1NF, the next step is the Second Normal Form, but only for tables that have a composite primary key. A composite primary key means the table’s primary key is made up of two or more columns. 2NF requires that all non-key attributes in a table are fully functionally dependent on the entire primary key, not just part of it.

Think of it this way: If a table’s primary key is (A, B), then each column that isn’t part of the key should depend on both A and B for its value, rather than just A or just B. If you find that a non-key column depends on only one part of the composite key, you’ve got a partial dependency, which is what 2NF aims to eliminate. A classic symptom of a partial dependency is data repetition. For example, if you have a table combining “orders” and “products,” you might store product details in the same table. If that product detail depends only on the product ID (one part of the key) and not the order ID, you’re violating 2NF.

Let’s say we have a table called OrderDetails that tries to store all details about an order and the product in the same place. Its primary key might be a combination of OrderID and ProductID:

OrderID	ProductID	ProductName	Quantity	Price	Supplier
1001	5001	Blue Widget	5	3.99	WidgetCo
1001	5002	Red Widget	2	4.99	WidgetCo
1002	5001	Blue Widget	3	3.99	WidgetCo

Notice that ProductName and Supplier really only depend on ProductID, not the combination of (OrderID, ProductID). That means they’re partially dependent on the primary key. To satisfy 2NF, we remove ProductName and Supplier to a separate Products table where ProductID is the primary key. OrderDetails would then store only data specific to the combination of OrderID and ProductID, like Quantity and Price (if price can vary per order or date). The new design might look like this:

Products Table

ProductID	ProductName	Supplier
5001	Blue Widget	WidgetCo
5002	Red Widget	WidgetCo

OrderDetails Table

OrderID	ProductID	Quantity	Price
1001	5001	5	3.99
1001	5002	2	4.99
1002	5001	3	3.99

This separation eliminates the partial dependency and ensures that any information about the product is fully dependent on its own primary key in the Products table.

Third Normal Form (3NF) – Eliminating Transitive Dependencies

Once your tables meet 2NF (or 1NF if you don’t have composite keys), it’s time to aim for the Third Normal Form. 3NF states that every non-key column must depend directly on the primary key, and not on any other non-key attribute. This means no transitive dependencies, where a column depends on another column that isn’t the primary key.

I like to think of 3NF as the final push to make sure data is stored only once, in the right table. With 3NF, if an attribute can be derived from another attribute, or if it logically belongs to a different entity, that data should be moved. This helps avoid anomalies where updating one attribute might require updating multiple rows or tables. For example, in an “employees” table, if you store the department name along with the department ID, but the department name also depends on the department ID (not the employee), that’s a transitive dependency. You’d want to store department name in a separate “departments” table to satisfy 3NF.

Suppose we have an Employees table:

EmployeeID	EmployeeName	DepartmentID	DepartmentName	Location
1	Sarah Lopez	10	Accounting	Building A
2	John Smith	20	Marketing	Building C
3	Raj Verma	10	Accounting	Building A

Here, DepartmentName depends on DepartmentID, not on EmployeeID. This is a transitive dependency because the primary key is EmployeeID, and DepartmentName isn’t directly related to EmployeeID; it’s related to DepartmentID. In 3NF, we move that department information to its own table.

Departments Table

DepartmentID	DepartmentName	Location
10	Accounting	Building A
20	Marketing	Building C

Employees Table

EmployeeID	EmployeeName	DepartmentID
1	Sarah Lopez	10
2	John Smith	20
3	Raj Verma	10

Now Employees.DepartmentID references the department, and the name and location of the department are stored in Departments. This removes the transitive dependency and satisfies 3NF.

Boyce-Codd Normal Form (BCNF) – Handling More Advanced Anomalies

BCNF, or Boyce-Codd Normal Form, is often considered a stronger version of 3NF. While 3NF focuses on transitive dependencies, BCNF addresses any functional dependency issues that involve candidate keys. In BCNF, every determinant must be a candidate key. A determinant is a column or set of columns that determines another column. So if X → Y is a functional dependency, then X has to be a candidate key in BCNF.

When I learned about BCNF, I found it often comes into play with more complex data relationships, especially when a table has multiple candidate keys. Most tables that are in 3NF will also be in BCNF, but there can be tricky edge cases. This might happen if your table has multiple candidate keys and a functional dependency creeps in that’s not directly tied to the primary key. Ensuring BCNF means carefully analyzing all dependencies in your database schema and making sure they align with the keys.

Let’s consider a scenario with course scheduling at a university, where a table called CourseSchedule might look like this:

CourseID	Instructor	Room	TimeSlot
CS101	Dr. Brown	Room A	9:00 AM
CS101	Dr. Brown	Room A	10:00 AM
CS202	Dr. Green	Room B	9:00 AM

Assume we have the following functional dependencies:

(CourseID, TimeSlot) → Instructor
Instructor → CourseID (Let’s say each instructor can only teach one course at a time)
(Room, TimeSlot) → CourseID (Because a specific room and timeslot can only hold one course)

Here, we might have a situation where (Instructor → CourseID) makes Instructor a determinant, but Instructor may not be a candidate key if we also consider that multiple instructors could theoretically exist with different schedules. If the database is set up such that each instructor can only teach one course, Instructor becomes a candidate key for CourseID. But if it’s not strictly enforced, we could run into a BCNF violation.

To satisfy BCNF, we might reorganize the tables to ensure that every determinant is a candidate key. For example, create separate tables ensuring that instructor-to-course relationships are enforced in a dedicated relationship table, while the schedule table references those constraints. This could look like:

Instructors Table

InstructorID	InstructorName	CourseID
1	Dr. Brown	CS101
2	Dr. Green	CS202

CourseSchedule Table

CourseID	Room	TimeSlot
CS101	Room A	9:00 AM
CS101	Room A	10:00 AM
CS202	Room B	9:00 AM

By breaking out the relationships carefully, we ensure each determinant is indeed a candidate key in its own table, satisfying BCNF.

Fourth Normal Form (4NF) – Eliminating Multivalued Dependencies

4NF deals with multi-valued dependencies. Once you’ve reached BCNF, you likely have eliminated most common anomalies. However, 4NF ensures that there are no non-trivial multi-valued dependencies of a non-key attribute on the primary key. A multi-valued dependency means that for a given key, you can have multiple values of one attribute independent of multiple values of another attribute.

4NF becomes relevant when handling databases that store multiple independent sets of information about an entity. For example, you might have a table that stores a teacher’s multiple certifications and the multiple languages they speak. Each of these sets of values is independent (the languages they speak do not depend on the certifications they hold). When these multi-valued attributes coexist in the same table, you risk duplicating data in complicated ways.

By achieving 4NF, you ensure that these attributes are broken out into separate tables if they’re truly independent. This keeps your main entity table from blowing up with every combination of multi-valued attributes.

Let’s say we have a Teachers table that stores each teacher’s ID, the subjects they can teach, and the sports they can coach. Hypothetically, a teacher can teach multiple subjects and also coach multiple sports. If we put that all in one table:

TeacherID	Subjects	Sports
T1	Math, Science	Basketball, Soccer
T2	English	Baseball, Soccer

We can end up with complex combinations if we try to store them properly. A more normalized design would be to separate these multi-valued attributes into their own tables:

Teachers Table

TeacherID	TeacherName
T1	Mr. Davies
T2	Ms. Roberts

TeacherSubjects Table

TeacherID	Subject
T1	Math
T1	Science
T2	English

TeacherSports Table

TeacherID	Sport
T1	Basketball
T1	Soccer
T2	Baseball
T2	Soccer

Now, the multi-valued attributes (subjects and sports) are each in their own table, removing any multi-valued dependency issues and satisfying 4NF.

Fifth Normal Form (5NF) – Resolving Join Dependencies

5NF, also known as Project-Join Normal Form (PJNF), is the highest commonly recognized normal form. It states that every join dependency in a table is implied by the candidate keys. In simpler terms, 5NF ensures that if you break your data down into multiple tables, you can’t lose information when you recombine (join) them, and there’s no unnecessary duplication when you do.

This level of normalization is often about handling complex relationships between three or more tables. The typical scenario is when you have multiple many-to-many relationships that intersect. For example, a table might connect multiple entities (like teachers, students, and courses) all at once. If the relationships among these entities are truly separate, they should be broken into smaller tables so that you don’t end up with partial or redundant data. 5NF is quite advanced, and not every database needs to reach this level. But if you find yourself managing very large, interconnected datasets, 5NF can be a lifesaver in ensuring your data integrity stays intact.

Consider a scenario with three entities: Company, Product, and Distributor. You might have a single table that attempts to store which distributor handles which product for which company:

CompanyID	ProductID	DistributorID
C1	P1	D1
C1	P2	D2
C2	P1	D1

Let’s say:

A product can be offered by multiple distributors.
A distributor can handle products from multiple companies.
A company can contract multiple distributors for multiple products.

If these three sets of relationships (Company-Product, Company-Distributor, Product-Distributor) are truly independent, you can decompose the table into three separate tables: CompanyProduct, CompanyDistributor, and ProductDistributor. Each table would store only the relationships between its two entities. For example:

CompanyProduct Table

CompanyID	ProductID
C1	P1
C1	P2
C2	P1

CompanyDistributor Table

CompanyID	DistributorID
C1	D1
C1	D2
C2	D1

ProductDistributor Table

ProductID	DistributorID
P1	D1
P2	D2

When you join these three tables, you can still recover the original relationships. If this decomposition is valid (i.e., you don’t lose any information or create spurious tuples), and each of these relationships is truly independent, then your tables could be in 5NF. This means all join dependencies can be derived from the candidate keys without introducing redundant or contradictory data.