Database Normalization

Here we will discuss four types of Normalization and some important concepts like candidate keys, Prime & Non-Prime attributes

Database Normalization is a technique to remove or reduce redundancy from the tables in the database.

Types of Redundancy

There are two types of redundancies that can occur in database tables.

Row Level Redundancy

In this type of redundancy, there is a repetition of the same row in the same table. For example, a table name as 'STUDENT' in which data of students is entered, if we enter the same student's data multiple time then, in this case, Row Level redundancy occurs as shown below:

In the above case, there is a repetition of a student data whose name is 'Haider', this is what we called row-level redundancy.

The solution of Row Level Redundandancy

This Row-level Redundancy can be corrected by setting a primary column in the table. In this case, we set SID as the primary key column (such column must contain unique & not null values).

Corrected Row Level redundancy Database Normalization

Now, this is the Redundancy free table.

Column Level Redundancy

This type of redundancy occurs when the data of the different entities assemble in a single table. For example in the case of a university's database, if we assemble the information of entities like 'Student', 'Course' and 'Faculty' in a single table then Column Level Redundancy occurs as shown below:

In this case, there are 3 types of anomalies that occur when we start working on this database.

Deletion Anomaly

If we want to delete any row from the given table:

Let suppose we want to delete data of student having SID=1 from the given example, we write the following query:

Delete From Student Where SID=1;

The query will be executed successfully but the problem is that along with student data all other important data related to 'Course' & 'Faculty' will also be deleted by the SQL server.

Insertion Anomaly

If we want to insert anything in the given table:

Let suppose we want to introduce a new course having CID=C4 & CName='Programing Fundamental' in the given table, we write the following query:

Insert Into Student Values (C4, 'Programing Fundamental');

But at the moment not a single student is enrolled in this course & we also don't know about the faculty related to that course. So, there is a redundancy in this table.

Update Anomaly

If we want to update anything in the given table:

Suppose we want to update the faculty name having FID= 'F1' from FName= 'Tanveer' to 'Sadiq' in the given example, we write the following query:

Update student set FName = 'Sadiq' where FID = 'F1';

The query will be executed successfully, But! as you notice the repetition of FID = 'F1', FName will be updated from 'Tanveer' to 'Sadiq' every time where FID = 'F1' and it will create a problem when data is very large i.e it is not time efficient.

The solution of Row Level Redundandancy

The solution of column-level redundancy is that we divide this single table into three separate tables as shown below. By doing so all the anomalies can be resolved.

Database Normalized Column level redundancy

Some Important Aspects

Candidate key

Keys that are able to determine the whole table it can be a single column or combination of more than one columns. From candidate columns, one is selected as a primary column and all others are known as alternative columns.

Prime Attribute

Attributes that help to determine the candidate keys.

Non-Prime Attribute

All the attributes other than Prime attributes are known as Prime Attributes.

Example:

Candidate key, prime & Non-prime Attribute

Composite Key

When two or more columns are combined to uniquely determine the whole table, it forms a composite key.

Example:

Types of Normalization

There are four types of normalizations:

First Normal Form (1NF)

The table should not contain any multivalued attribute

let's take an example to understand this point.

The problem in the table

It is possible that a student may have more than 1 phone no.

But here in this case representation is wrong so we can say that this table is not in First normal form (1NF).

Normalize to First normal form (1NF)

For normalizing the above table to 1NF here are some methods shown below:

In 1st case, we create two different attributes or columns of 'Phone no' for the sake of Database Normalization as shown:
Now we can say that this table is in 1NF.
In 2nd case, we divide a single table into two different tables & relate them by using a foreign key as shown:
In this case, we create a separate table for student's phone nos and form a composite key by joining two columns 'Roll#' & 'Phone no', 'Roll#' as a foreign key gets its references from 'Student' table.
In the 3rd case, we Normalize the above example to 1NF without creating any extra table & columns as shown:
In this case, we just create extra rows and set 'Roll#' & 'Phone no' as a composite key.

Second Normal Form (2NF)

The table must be in First Normal Form.
There should be no partial dependency in the relation.

let's take an example to understand the points.

In this case, 'CustomerID' & 'StoreID' depends on each other i.e both are Prime attributes or we can say that these two attributes are Candidate keys.

The problem in the table

The first condition of 2NF is satisfied that is 'Table should be in 1NF' but the problem arises in the second condition that is described as follow:

The column or attribute name as 'Location' is fully dependent on 'StoreID' & 'StoreID' is fully dependent on 'CustomerID' so 'Location' is partially dependent on 'CustomerID' and this the violation of Second normal form (2NF).

Let's understand this point diagrammatically:

This purely the violation of Second normal form.

Normalize to 2NF

Simply, we divide the single table into two separate tables to normalize the table to 2NF as shown below:

Now both tables are in Second normal form (2NF). In the case of the first table name as 'Store', 'Location' (act as a non-prime attribute) fully dependent on 'StoreID' (act as a prime attribute). So, we can say that it is in 2NF. In the case of the second table name as 'Customer', there is no non-prime attribute, both are prime-attributes and dependent on each other.

Third Normal Form (3NF)

The table should be in 2NF.
Any non-prime attribute should not be determined by non-prime attribute.

Let's take an example to understand the points.

In this case, 'Roll#' is a candidate key & prime attribute and set as a primary key, on the other hand 'State' & 'City' are non-prime attributes. 'State' & 'City' are non-prime attribute because these attributes are not helpful in selecting a candidate key that is 'Roll#' in this example.

The problem in the table

Now come toward the point that is this table in 3NF?

In this case, Candidate Key = C.k = 'Roll#', Prime attribute = 'Roll#', Non-Prime attribute = 'State' & 'City'.

Functional dependencies: (Shown below)

This table is not in 3NF because 'state'(non-prime attribute) determines 'city'(non-prime attribute) & this is against the law of 3NF that is (Any non-prime attribute should not be determined by non-prime attribute). In the above fig. first functional dependency is a valid but second functional dependency is not valid according to Third normal form (3NF).

BCNF(Boyce Codd Normal Form)

The table should be in 3NF.
L.H.S of each Functional Dependency (FD) should be a candidate key or super key.

Let's take an example to understand the points.

Description of table

In this case Candidate key ={Roll#, Reg#}

Functional Dependency(FD): (Shown below)

So we can say that both the conditions of BCNF are fulfiled in the given example, i.e table is in 3NF & L.H.S of each functional dependencies are candidate keys as shown above.

Search This Blog

Sql development & DataBase Management