Normalisation
Normalising data: Normalisation is a technique used to reduce the duplication of data in a relational database.
It helps to:
» organise data in an efficient way
» remove redundant (duplicated) data
» make sure that only related data is stored in a table.
This technique is a multi-step process, where each step has a rule that improves the efficiency of the database. These rules are called the Normal Forms, numbered from 0 to 5.
Un-normalised Form (0NF or UNF): If a database is not normalised it is called an un-normalised database, often shortened to 0NF or sometimes UNF.
This is often a flat-file database (a single table) that contains duplicated data (which is called redundant data) and complex data structures (more than one item of data, otherwise known as non atomic data) stored within a single field.
First Normal Form (1NF)
The rules for a database normalised to 1NF are:
» All data is stored in a database table.
» A unique key must exist in each table (often a primary or compound key).
» Only atomic data is stored (which is data stored to the lowest level of data and cannot be broken down any further).
» Each field has a unique name. » Each record is unique (so there are no repeated rows in any table and a primary/compound key exists in each table).
» There are no repeating groups of columns. To convert the above data into 1NF we must first ensure that a unique key exists in each table. There are two students with the same name. Looking at this data it is possible (although unlikely) that two students with identical names, houses and courses could exist, so a new field is required as a primary key field. We will call this field St_ID. Each St_ID will be a unique number.
The data given contains non-atomic data. The student’s name can be split into their Forename and their Surname. The course details can be split into the Subject, Teacher and Room. Each course should also have its own unique ID (the data presented is not unique, for example Mr Brown may take two different Computing classes in the same classroom at different times). We will call this field Course_ID. For the key to be unique we have a compound key with both St_ID and Course_ID. This data would therefore satisfy First Normal Form as all data is in a table, there are no repeating groups of columns, each record is unique, each field has a unique name, and all data is atomic. There are two key fields, one for the students and one for the courses.
Why is this data in First Normal Form (1NF) and not in Second Normal Form (2NF)?
This data is not in 2NF because the:
» Subject, Teacher or Room fields are not dependent on the St_ID field
» Forename, Surname or House fields are not dependent on the Course_ID field.
Second Normal Form (2NF) The rule for a database normalised to 2NF is that the table must be in 1NF and any non-key attributes that only depend on part of the table key are placed in a new table. Looking at this data, it is grouped into two types of data, the data relating to each course and the data relating to each student. The data must be split into three tables: courses, students and a table to link the two to meet 2NF. The first table will relate to the courses.
You will notice that we have sorted the data to help ensure there are no duplicated records. The second table will consist of the students.
The third table will link each student to each course.
Why is this data in Second Normal Form (2NF) and not Third Normal Form (3NF)?
This data is not in 3NF, because the Teacher and Room fields are dependent on each other, as each teacher always teaches in the same room.
Third Normal Form (3NF) The rule for a database normalised to 3NF is to make sure that any non-key attributes that are more dependent on other non-key attributes than the table key, are removed to a new table. Looking at this data, the course table must be split into two tables:
one for the courses and one for the teachers to meet 3NF. The first table will relate to the courses.
Sometimes when you change the data into one type of form, it is also set at the next level. For example, if data was changed into 2NF, it may also set as 3NF without any other changes required, if it also meets the rules for 3NF.
Advantages and disadvantages of normalising data
Less money needs to be spent on storage as the file size is smaller due to there being no redundant data, but the resulting larger numbers of tables require more relationships to be designed. This takes more time, when designing a large database, as well as requiring workers to have greater knowledge. This, in turn, may lead to workers having to be brought in from other companies and adds to the cost.
It groups data logically as well as reducing inconsistent data in tables by enforcing referential integrity, but having more tables than an unnormalised database makes it difficult to monitor where particular data is.
With the larger number of tables, setting up complex queries can be more difficult, although searching on one table will be much faster as there is less data to go through. The processing of data can be slower with a greater number of tables and links to navigate.
Any changes which are needed in certain records can be made automatically to any related records. With no duplicated data there will be fewer errors in the data and making changes to a table is easier as there is less data to alter.
Removing non-atomic data may not always be a good idea if putting data into separate fields serves no useful purpose.





















