Since all
non-trivial websites
require a database, and since database skills are
different from coding skills, there is very good
reason for any programmer to master the principles
of database design and use.
The most basic skill that you need for successful
database projects is table design. A strong
table design can be the foundation of a wonderful
application, while a weak design brings nothing but
delays, expense, headaches and heartaches.
One of the cornerstones of good table design is
the process of normalization.
Normalization proceeds through
different stages. The result of each stage is called
a "Normal form." So if we have completed the first
stage we say the database is in "First Normal Form",
and after we have completed the second stage we
say the database is in "Second Normal Form" and
so on.
This essay is part of the Keys, Normalization and Denormalization series.
There is also a Complete Table of Contents and a >Skills-oriented Table Of Contents.
The Example
Our example is a school that has different kinds of
schedules on different days. A normal day has 4 periods
in the morning, lunch, and then 3 afternoon periods.
Sometimes the periods are 35 minutes and there
is an assembly at the end of the day that goes to the
normal end of day. On short assembly days the periods
are 40 minutes each. There is a also a variation where
the short assembly might be after lunch instead of at
the end of the day.
A First Draft Table
Taking a stab at some table design, we may say
something like, "Well I can see that there are never
more than nine periods, and there are four kinds of
schedules. I can make a table that lists the four
kinds of schedules and tells me what happens in
each period, something like this:"
DAY_TYPE | MINUTES | P0 P1 P2 P3 P4 P5 P6 P7 P8
-----------+---------+----+----+----+----+----+----+----+----+----
NORMAL | 45 | C | C | C | C | L | C | C | C | X
ALONG | 35 | C | C | C | C | L | C | C | C | A
ASHORT | 40 | C | C | C | C | L | C | C | C | A
ALUNCH | 40 | C | C | C | C | L | A | C | C | C
....where "C" means a class period, "L" means lunch,
"A" means assembly and "X" means nothing happens.
It so happens that this table is not in first normal form.
We will look at what this means
first in theory, and then we will see the practical problems
that come from it.
First Normal Form as Theory
A table is in first normal form if it has
no repeating groups.
The example above repeats the period column
nine times with columns P0, P1, P2.
This is a repeating group, and so it
violates first normal form. Theory tells us that
we must redesign the table, splitting the repeating
group out into a child table. This will leave
us with two tables, one that is for
day types and the other that tells us about periods.
The table of day types is now quite simpler:
DAY_TYPE | MINUTES | DESCRIPTION
-----------+---------+-----------------------------
NORMAL | 45 | Normal Day
ALONG | 35 | Long Assembly
ASHORT | 40 | Short Assembly
ALUNCH | 40 | Short Assembly After Lunch
The original table had ordering information
buried in the names of the columns. We need to
have that same information in our child table
so we will have the PERIOD column be numeric and
it will determine the order of the periods.
Here is the table with some of
its entries:
DAY_TYPE | PERIOD | ACTIVITY
-----------+-----------+---------------
NORMAL | 1 | Class
NORMAL | 2 | Class
NORMAL | 3 | Class
NORMAL | 4 | Class
NORMAL | 5 | Lunch
NORMAL | 6 | Class
NORMAL | 7 | Class
NORMAL | 8 | Class
ALONG | 1 | Class
ALONG | 2 | Class
ALONG | 3 | Class
ALONG | 4 | Class
ALONG | 5 | Lunch
ALONG | 6 | Class
ALONG | 7 | Class
ALONG | 8 | Class
ALONG | 9 | Assembly
We have now satisfied the theory, and put
our fledgling database into First Normal Form.
But the question arises, wasn't the single-table
version a lot simpler? This new "fixed" version seems more
complicated. A programmer might say something
like, "I have to look all over the place to find
what I need." To answer these objections we will
now examine the practical side of the example.
Problem 1: Structure Changes
The non-normalized example is probably fine if you have
just one school that never changes their schedule. But if
you have multiple schools you will need to change your
structure for them, adding or removing periods.
To this we might reply, "but sure, structures change
all of the time." Yes, it is true that a structure change
may be required
to improve a program or add functionality, but in the
example above you must change the structure
in cases where you are not improving the program or
adding functionality. This means the more people use
your program, the more work for you! By contrast,
in the normalized situation a different schedule just
means an end-user task to make entries in the tables.
You can argue that this particular
example is not so bad, because, after all, how many
periods can any school really have? Let's make a table
with 15 periods and be done with it, no? The problem is that
if you win that argument and make a habit out of
repeating groups then you saddle your development efforts with
the ongoing need to do structure changes where simple
data editing would have done the job.
Problem 2: Secondary Columns
I worked on a system one time that had a repeating group
of 24 columns in several tables. The columns were for the size of an
item of clothing. In one table there were 24 additional
columns for price. Then 24 more for cost. Then 24 more
for gross profit. There were more, should I continue?
The problem would appear in our school schedule if we
wanted to put the start and end times of the periods into
the tables so we did not have to figure them out on
every page load. We would now have START0, START1, END0, END1, and
so on, eighteen more columns. Such fat tables like this are
very clumsy to work with both in code and when doing ad-hoc
queries.
Problem 3: Constants Required
Imagine you have a simple routine called SQL_ALLROWS()
that gives you query results in an array (if you don' have
one, code it today!). If your database is well normalized
your code will display this pattern a lot:
$rows = SQL_AllRows(
"SELECT activity FROM period_info
WHERE day_type = 'NORMAL'
ORDER BY sequence"
);
foreach($rows as $row) {
// ...do something interesting...
}
This simple code is not possible when you violate first
normal form. You must put a constant into your
code somewhere so that your code knows how many columns
there are. This is an extra bit of maitenance to worry
about here:
$row = SQL_OneRow(
"SELECT * FROM period_info
WHERE day_type = 'NORMAL'
);
for($x = 0; $x < NUM_PERIODS; $x++ ) {
$period = $row['P'.$x];
// ... do some display...
}
Problem 4: Query Construction
For problem 3 above, I put a "Select *" into the
query. I really should not
have done that. I should have listed all 9 "P" columnns
(or all 27 columns if we put start and end times
in also). Something like this:
SELECT P0,P1,P2,....
But this violates good programming practice. If the structure
changes then I have to go to all of those queries and
modify them. A better programming practice would be
to use that NUM_PERIODs constant to generate column lists
on the fly:
$cols = array();
for ($x = 0; $x < NUM_PERIODS; $x++ ) {
$cols[] = 'P'.$x;
}
$query = "SELECT ".implode(',',$cols)." from...";
Most programmers will avoid a bit of code like this because
it seems at first glance to be really stretching a point.
Of course once the program is successful and the additions
and changes come in that
leaves you going through and changing every query
every time you modify the structure. None of this is
necessary with the normalized version.
It is starting to seem like a stretch to claim that
we have take the simpler path here. Perhaps the
"complicated" normalized tables really are simpler.
Problem 5: Hard-Coded Tools
As I write this in December of 2007 the web is
rediscovering a UI widget that has been in use for
decades in terminals and desktops: the grid. The grid
is the natural method of editing child tables in
admin interfaces.
Before 1995, when there were no web programmers,
a programmer who violated first normal form paid the
price by needing to code up things like grids by
hand because he could not use grid-like tools available
in his framework. The modern web programmer will
pay the price in not being able to use the grids
that are rapidly coming online.
Problem 6: Cannot Aggregate
Finally we will consider a simple question, tell me
how many periods there are in any particular DAY_TYPE.
In a normalized database the SQL is easy:
SELECT COUNT(*) FROM periods WHERE day_type='NORMAL'
-- or, to get them all:
SELECT COUNT(*),Day_type
FROM periods
GROUP BY day_type
Getting the same answer in the non-normalized database
requires code. It does not matter how simple that
code is, the fact is that if the tables were
normalized the code would not be required at all. Multiply
that by 10 tables, 100 tables, and you just have
complication after complication.
Conclusion
Violating first normal form puts us in a situation where we
need programmers to make changes, while normalized databases
can be modified by end users. Violating first normal
form also forces us to embed and maintaint constants we would
not otherwise need and to change queries that could
otherwise be left alone. These problems can be mitigated with
libraries of code that build queries for us, but still those
libraries have to be written.
In this light, we see can see first normal form as something that
creates efficient code and easier-to-maintain
databases.
Other Posts
This essay is part of the Keys, Normalization and Denormalization series.
There is also a Complete Table of Contents and a >Skills-oriented Table Of Contents.
Update Contact :
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email : Fajarudinsidik@gmail.com
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email: Fajarudinsidik@gmail.com
atau Kirimkan Private messanger melalui email dengan klik tombol order dibawah ini :