Database Normalization
Here we will discuss four types of Normalization and some important concepts like candidate keys, Prime & Non-Prime attributes
Database Normalization is a technique to remove or reduce redundancy from the tables in the database.
Types of Redundancy
There are two types of redundancies that can occur in database tables.
Row Level Redundancy
In this type of redundancy, there is a repetition of the same row in the same table. For example, a table name as 'STUDENT' in which data of students is entered, if we enter the same student's data multiple time then, in this case, Row Level redundancy occurs as shown below:
In the above case, there is a repetition of a student data whose name is 'Haider', this is what we called row-level redundancy.
The solution of Row Level Redundandancy
This Row-level Redundancy can be corrected by setting a primary column in the table. In this case, we set SID as the primary key column (such column must contain unique & not null values).
Now, this is the Redundancy free table.
Column Level Redundancy
This type of redundancy occurs when the data of the different entities assemble in a single table. For example in the case of a university's database, if we assemble the information of entities like 'Student', 'Course' and 'Faculty' in a single table then Column Level Redundancy occurs as shown below:
Delete From Student Where SID=1;
The query will be executed successfully but the problem is that along with student data all other important data related to 'Course' & 'Faculty' will also be deleted by the SQL server.
Insertion Anomaly
If we want to insert anything in the given table:
Let suppose we want to introduce a new course having CID=C4 & CName='Programing Fundamental' in the given table, we write the following query:
Insert Into Student Values (C4, 'Programing Fundamental');
But at the moment not a single student is enrolled in this course & we also don't know about the faculty related to that course. So, there is a redundancy in this table.
Update Anomaly
If we want to update anything in the given table:
Suppose we want to update the faculty name having FID= 'F1' from FName= 'Tanveer' to 'Sadiq' in the given example, we write the following query:
Update student set FName = 'Sadiq' where FID = 'F1';
The query will be executed successfully, But! as you notice the repetition of FID = 'F1', FName will be updated from 'Tanveer' to 'Sadiq' every time where FID = 'F1' and it will create a problem when data is very large i.e it is not time efficient.
The solution of Row Level Redundandancy
The solution of column-level redundancy is that we divide this single table into three separate tables as shown below. By doing so all the anomalies can be resolved.
Some Important Aspects
Candidate key
Keys that are able to determine the whole table it can be a single column or combination of more than one columns. From candidate columns, one is selected as a primary column and all others are known as alternative columns.
Prime Attribute
Attributes that help to determine the candidate keys.
Non-Prime Attribute
All the attributes other than Prime attributes are known as Prime Attributes.
Example:
Composite Key
When two or more columns are combined to uniquely determine the whole table, it forms a composite key.
Types of Normalization
There are four types of normalizations:
First Normal Form (1NF)
- The table should not contain any multivalued attribute
let's take an example to understand this point.
The problem in the table
It is possible that a student may have more than 1 phone no.
But here in this case representation is wrong so we can say that this table is not in First normal form (1NF).
Normalize to First normal form (1NF)
For normalizing the above table to 1NF here are some methods shown below:
- In 1st case, we create two different attributes or columns of 'Phone no' for the sake of Database Normalization as shown:
- In 2nd case, we divide a single table into two different tables & relate them by using a foreign key as shown: In this case, we create a separate table for student's phone nos and form a composite key by joining two columns 'Roll#' & 'Phone no', 'Roll#' as a foreign key gets its references from 'Student' table.
Second Normal Form (2NF)
- The table must be in First Normal Form.
- There should be no partial dependency in the relation.
In this case, 'CustomerID' & 'StoreID' depends on each other i.e both are Prime attributes or we can say that these two attributes are Candidate keys.
The problem in the table
The first condition of 2NF is satisfied that is 'Table should be in 1NF' but the problem arises in the second condition that is described as follow:
The column or attribute name as 'Location' is fully dependent on 'StoreID' & 'StoreID' is fully dependent on 'CustomerID' so 'Location' is partially dependent on 'CustomerID' and this the violation of Second normal form (2NF).
Let's understand this point diagrammatically:
This purely the violation of Second normal form.
Normalize to 2NF
Simply, we divide the single table into two separate tables to normalize the table to 2NF as shown below:
Now both tables are in Second normal form (2NF). In the case of the first table name as 'Store', 'Location' (act as a non-prime attribute) fully dependent on 'StoreID' (act as a prime attribute). So, we can say that it is in 2NF. In the case of the second table name as 'Customer', there is no non-prime attribute, both are prime-attributes and dependent on each other.
Third Normal Form (3NF)
- The table should be in 2NF.
- Any non-prime attribute should not be determined by non-prime attribute.
In this case, 'Roll#' is a candidate key & prime attribute and set as a primary key, on the other hand 'State' & 'City' are non-prime attributes. 'State' & 'City' are non-prime attribute because these attributes are not helpful in selecting a candidate key that is 'Roll#' in this example.
The problem in the table
Now come toward the point that is this table in 3NF?
In this case, Candidate Key = C.k = 'Roll#', Prime attribute = 'Roll#', Non-Prime attribute = 'State' & 'City'.
Functional dependencies: (Shown below)
This table is not in 3NF because 'state'(non-prime attribute) determines 'city'(non-prime attribute) & this is against the law of 3NF that is (Any non-prime attribute should not be determined by non-prime attribute). In the above fig. first functional dependency is a valid but second functional dependency is not valid according to Third normal form (3NF).
BCNF(Boyce Codd Normal Form)
- The table should be in 3NF.
- L.H.S of each Functional Dependency (FD) should be a candidate key or super key.
Description of table
In this case Candidate key ={Roll#, Reg#}
Functional Dependency(FD): (Shown below)
thank you. it was very helpful
ReplyDeleteInformative blog.. Keep it up
ReplyDeletelaborious work dear. its very helpful, keep up the good work.
ReplyDeleteNice
ReplyDelete