Denormalization Patterns

بِسْــــــــــــــــمِ اﷲِالرَّحْمَنِ اارَّحِيم

bismillaahirrahmaanirrahiim

السَّلاَمُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ

Assalamu'alaikum warahmatullahi wabarakatuh

Welcome to the Database Programmer! The main theme of these essays is that your applications will be leaner, faster, and easier to write and maintain if you understand how databases work.

This essay is part of the Keys, Normalization and Denormalization series.
There is also a Complete Table of Contents and a >Skills-oriented Table Of Contents.

The Non-normalized Database

A denormalized database is one that has been meticulously
normalized to eliminate redundancies, only to have
redundancies deliberately put back in to meet other
needs. That is the kind of database we are going to
talk about today.

By contrast, a database is "non-normalized" if nobody
ever bothered
to normalize it. This is what you get when the programmer
says, "I'm not concerned with the table structure yet,
I'm working on the code first." In my experience this
is something like saying, "I'm not worried about the plan
for this work bench I'm building, I'm just going to throw
the wood on the table saw and start cutting." We are not
interested in non-normalized databases.

The Fully Normalized Database

The fully normalized database will have few if any redundancies,
each fact will be stored in exactly one place. This database
has one big advantage, which is that write actions,
INSERT, UPDATE, and DELETE, will be very easy to code up
correctly. The application code for a fully normalized database
will be smooth, simple, easy to write and easy to maintain.

But decades of experience has shown that fully normalized
databases have a few drawbacks, which is why practical-minded
programmers always end up denormalizing their designs. The
problems in particular are:

No calculated values. If you have a
shopping cart with
the columns "price" and "qty", it may seem natural to put
in a column "extended_price" that holds price * qty. This
is actually forbidden by third normal form, though almost
everybody puts it in anyway.
Non-reproducible Calculations. Continuing from
the point above, most shopping carts or ordering systems have
many calculations that are far more complicated than simply
doing price * qty. Often a calculation may depend on a dozen
or more preceding calculations. In complex situations like this,
it is all too common that changes in the program mean an invoice
printed after an upgrade produces different numbers than
the same invoice printed before the upgrade.
You don't want that phone call!
JOIN Jungles. A fully normalized database will
have information scattered in many different tables. Though
this makes it easy to get it right going in, it can make it
terribly difficult to get seemingly simple combinations of
data back out.

The Denormalized Database

As I said above, the denormalized database is one that was
first normalized by having its redundancies removed, only
to have some redundancies deliberately put back in. This
can solve all three of the problems listed above. There is
a cost of course.
When we denormalize we have to keep two things in mind:

We must dream up a way to keep the redundant values
correct. If we can pull this off we get all of the
advantages of denormalization with no drawbacks.
Following up on the first point, we have a better chance
of getting it right if we can identify a set of
denormalization patterns. Once they are identified,
we can code up something in the framework that supports them,
and now we can see the gold at the end of the rainbow.

The Foreign Key Is Our Friend (Again)

I have said many times in these essays that the foreign key
is the only way to establish relationships between facts
stored in different tables. We will now see how this
relates to denormalizing our database.

Denormalization means introducing redundancies. In other words,
a fact that was stored in only one place is now stored in two
or more places. If we are going to copy a value from one table
to another table, it stands to reason that there must be some
logical relationship between those two tables. Since the only
kind of relationship we can have between two tables is a foreign
key, our denormalization patterns must in some way work with
foreign keys.

The First Pattern: FETCH

Consider a shopping cart that has a column "sku" and another
column "price." Most programmers lay out these tables and
write some code that copies the price from the ITEMS table
to the ORDER_LINES table. I call this pattern the "FETCH"
because the price is FETCHed from the ITEMS table and written
into the ORDER_LINES table.

Most programmers code up FETCH operations all over the place
and do not ever realize they are denormalizing. I think this
pattern is just so natural that most of us never think about
it. If you examine your database applications you will
likely see that you are doing this all over the place.

In order to get this pattern to operate correctly, your
framework must make sure at very least that the SKU is not
null when an INSERT is made to ORDER_LINES, and that the
price is copied during the INSERT. You
can maintain correctness by not allowing users to change the
SKU on this table, if they change their minds they must delete
a line and enter a new one. Or, you can make your framework
a little more flexible and execute the FETCH again if the
SKU changes on an UPDATE.

Sidebar: Is FETCH Really Denormalizing?

Die-hard relational theorists will tell you not to copy price
from the items table. You are supposed to leave it where it
belongs and use JOINs to pick up the price when it is needed.
There are three arguments against this sort of purity.

The first practical argument is that it is horribly difficult to
deal with complex calculations this way. It is far easier to
copy the price when the line goes in, so you never have to
"go looking" for it again.

The second practical argument is that performance tanks if you
follow the die-hard relational approach. If you
have to look in 6 tables every time somebody refreshes their
cart you will have a much slower program than one that only has
to look in one table.

But the third argument is more theoretical, and it is this:
the FETCH is not really denormalizing. The idea is that
when the customer makes an order your store has entered
in an agreement to sell something at a particular price. That
price is stored on the order and is now a fact about that
order. It is not redundant because you are not storing the
SKU's generic price, you are only storing the price that that
customer is going to pay on this order. If the price changes
5 minutes after the customer places the order, they will expect
to get the price as it was when they put it in the cart, and so
you are actually doing the right thing by writing it to the order.

The Second Pattern: Aggregations

The FETCH that was described above is all about copying
a value from a parent table to a child table. The opposite
pattern occurs when you roll up values from children to
parents. These are usually done as totals (SUMS), counts,
averages, minimums, and maximums.

Looking at the ORDER_LINES again, if a customer has 3 items in
their cart, it is perfectly natural to most programmers to
put a column "PRODUCT_TOTAL" onto their ORDERS table that holds
the sum of all of the lines. This is called an aggregation,
because the result in the parent table is always some operation
performed on the aggregation of all of the child rows.

Aggregrations are always denormalizing because they are
values that could be derived from other values. To be specific,
an aggregration violates third normal form because it
introduces a non-key dependency - a value that is dependent
not on the key but on values from a completely different table!

In order to make sure this value is always correct, the
framework must always update the total on the parent table
when any line in the child table changes. If your
framework can do that successfully, your aggregations will
always be correct.

The Third Pattern: EXTEND

The first two patterns we saw dealt with foreign keys.
The first pattern, the FETCH, involves values travelling
"downward" on a foreign key from parent to child. The
second pattern involves values travelling "upward" on a
foreign key from child to parent. The third and final
denormalizing pattern involves calculated values within
a row.

The example at the beginning of this essay was the column
EXTENDED_PRICE, which holds the value of PRICE * QTY.
This is an EXTEND operation, because it extends a row by
adding a new redundant value. This is denormalizing because
it violates third normal form, it introduces a value that
is not dependent on any candidate key.

If you want to makes sure your EXTENDs are always correct
then you need a framework that will always update the
calculation when either of its dependent values changes.

Dependency Tracking

In describing the three denormalizing patterns above, I have
explained what you need to make sure each one is performed
successfully. There is a final requirement to keeping all of
this correct, which is that the operations must be performed
in the proper order.

Considering the shopping cart again, in particular the
ORDER_LINES table, these three operations must occur in
this order:

The PRICE is FETCHed
The EXTENDED_PRICE is calculated as an EXTEND
The ORDERS table's PRODUCT_TOTAL value is adjusted.

Your framework must have a reasonable way to make sure that
the operations are performed in the correct order, or they
will not give the correct result. As a rule of thumb, in
most systems the FETCHes come first, followed by the EXTENDs,
and then the aggregations.

Meta-data can be a big help here. When I first contemplated
these patterns about four years ago, it occurred to me that
they could all be stored as formulas in the basic description
of the database, and that a
code generator would sequence them for me and generate the
code, so that the operations would always occur in the
correct order. I wrote the basic system in the fall of 2004
and have found it to work extremely well ever since. In my
personal opinion, this is the only way to reliably handle
these patterns.

Conclusion: Denormalization Also Follows Patterns

A fully normalized database makes it easy to get data in
correctly, but makes it difficult to get it out.
Denormalizing is the process of taking a normalized database
and deliberately introducing redundancies to improve
query writing, performance and correctness.
Not surprisingly, denormalization
has its own patterns. Two of these follow the foreign key,
and the third one works inside of a single row. If you follow
these patterns and fashion your framework to keep them correct,
you get all of the benefits of denormalization without the
concern for bad data.

Denormalization Patterns

The Non-normalized Database

The Fully Normalized Database

The Denormalized Database

The Foreign Key Is Our Friend (Again)

The First Pattern: FETCH

Sidebar: Is FETCH Really Denormalizing?

The Second Pattern: Aggregations

The Third Pattern: EXTEND

Dependency Tracking

Conclusion: Denormalization Also Follows Patterns

Other Posts

Terimakasih Telah mengunjungi website Kami..!!

Tawk.to

Pencarian

Blog

Populer

Populer

My Ads

Artikel

Artikel