Database normalization is a technique for organizing database data. Normalization is a systematic approach to decomposing tables to eliminate data redundancy (repetition) and unwanted features such as insert, update, and delete anomalies. It is a multi-step process that puts data in tabular form, removing duplicate data from relationship tables.
Normalization is mainly used for two purposes
,
- to eliminate redundant (useless)
- dependencies make sense, that is, that data is stored logically
data. Ensure that data
.
The following video will give you a good overview of database normalization. If you want, you can skip the video, as the concept is covered in detail, below the video.
Problems without normalization
If a table is not normalized properly and has data redundancy, it will not only consume additional memory space, but also make it difficult to manage and update the database, without facing data loss. Inserting, updating, and deleting Anomalies are very common if the database is not normalized. To understand these anomalies let’s take an example from a Student’s table.
rollnonamebranchhodoffice_tel401AkonCSEMr. X53337402BkonCSEMr. X53337403CkonCSEMr. X53337404DkonCSEMr. X53337
In the table above, we have data for 4 Computer Science students. As we can see, the data for the field branch, hod (Head of Department) and office_tel are repeated for students who are in the same branch at the university, this is Data redundancy.
Insertion anomaly
Suppose that for a new admission, until and unless a student opts for a branch, the student’s data cannot be inserted, or else we will have to set the branch information to NULL.
Also, if we have to insert data from 100 students from the
same branch, then the branch information will be repeated for all those 100 students
.
These scenarios are nothing more than insertion anomalies.
Update anomaly
What if Mr. X leaves college? or is no longer the HOD of the computer science department? In that case, all student records will have to be updated, and if we mistakenly lose any records, it will lead to data inconsistency. This is an update anomaly.
Deletion anomaly
In our student table, two different pieces of information, student information and branch information, are kept together. Therefore, at the end of the academic year, if student records are deleted, we will also lose branch information. This is a deletion anomaly.
Normalization
rule The normalization rules are divided into
the following normal forms: First normal form Second normal form Third normal form
- BCNF
fourth normal form First normal form (1NF) For a table to be in the first normal
form
, you must follow the following 4 rules:
- It must only have single-valued (atomic) attributes/columns.
- Values stored in a column must be from the same domain
- All columns in a table must have unique names
- And the order in which the data is stored doesn’t matter.
.
In the next tutorial, we will discuss about the First
Normal shape in detail. Second Normal Form
(2NF)
For a table to
be in the Second Normal Form, it must
- be in the First Normal Form
- And, it should not have partial dependence.
.
To understand what partial dependency is and how to normalize a table to 2nd normal for, go to the Second
Normal Form tutorial. Third Normal Form
(3NF)
A table is said to be
in the Third Normal Form when, it
- is in the Second Normal Form
- And, it has no transitive dependence.
.
Here is the Third Normal Form tutorial. But we suggest that you first study about the second normal form and then turn to the third normal form.
Boyce and Codd normal form
(BCNF) The Boyce
and Codd normal form is a higher version of the third normal form. This form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does not have multiple overlapping candidate keys is said to be in BCNF. For a table
to be in BCNF, the following conditions must be met: R must be in
- 3rd Normal Form
- and, for each functional dependency (X → Y), X must be a super key
.
To learn about BCNF in detail with a very easy to understand example, head over to the Boye-Codd Normal Form tutorial.
Fourth normal form
(4NF)
A table is said to be
in the fourth normal form when it
- is in the normal Boyce-Codd form
- And it has no overvalued dependence.
.
Here is the tutorial of Fourth Normal Form. But we suggest you understand other normal ways before turning to the fourth normal form.