A normalized database stores each fact in exactly one place.
This makes for very robust write operations, it is much easier
to get things right on the way in. But it becomes much harder
to get things out efficiently or easily, so very often we
denormalize, that is, we store facts in more than one place for
easier retrieval. This requires a very well thought out
strategy to make sure these repeated values are always correct.
There are links to related essays on normalization and denormalization at the bottom of this post.
This blog has two tables of contents, the
Topical Table of Contents and the list
of
Database Skills.
Review of Methods
For our example this week we will consider a shopping cart. The
orders are kept in the ORDERS table and the items purchased are
in the ORDER_LINES table. We have denormalized the database by
keeping the value ORDER_TOTAL in the ORDERS table. Every time a line
item is changed, the ORDER_TOTAL must be updated.
There are a handful of methods that are popular in the wild
for doing this. Some practices emphasize
programmer discipline, others seek to prevent actions that
will lead to inccorrect values. Strategies also fall into
application level or database level, while still others operate
at the architecture level.
Personally I chose triggers about four years ago, which I will
explain at the end of this essay.
The Weakest Approach: Discipline
The simplest approach is to require that programmers be made aware
of all denormalized values and that they must remember if they
modify a line item to update the ORDER_TOTAL. This works well enough
actually for small programming teams, where
there is only one or maybe two programmers, preferably sitting right
next to other. Mind-reading helps here as well.
Of course this approach falls apart like a rotten burlap bag as soon
as the team or the program exceeds the ability of the team to keep
it all straight in their heads.
But I did not bring up this example just to ridicule it. I have found
that even seasoned veteran programmers (including your humble author)
will fall into the trap of trying to enforce conventions at many
levels in their programming. To see why this is always a bad idea
and should always be avoided, consider this hypothetical case.
Imagine a new programming language is introduced known as Super-G,
which is fashionable and wonderful and everybody loves it. It has
a well-thought out typing system except for one odd behavior:
If you code a line that concatenates a string with a date, program
execution ends with no error. The compiler does not trap for this
and run-time does not tell you why it quits.
You can Google for it and find
out that everybody knows about it, and you just have to remember
not to do that! The language's authors have no plans to fix it
because nothing is wrong. The fully expec you to always remember
never to concatenate strings and dates.
The example is meant to be absurd, but to reinforce that any
strategy where you just have to remember is out of the running
from the start. Since we would not accept this in any tool we use,
we should certainly never build our own practices upon such sand,
and certainly we would not count on it to keep denormalized values
correct.
Limiting Access To The Database
The next simplest strategy is to prune down what agents (programs
or users) can get at the database. The idea is simple: just let one
program get at the database, make sure that program is correct,
and force everybody to go through the application.
This will work if you can get your programs right and there is no
chance that any of the check-signers will demand access except through
your application. Many programmers believe this is true for them.
Some of them are right, but many are not: their users would love to
get access to the database but the programmer has created a situation
where it is impossible.
Personally I try to avoid this approach completely, and my reasons
are both philosophical and technical.
On the technical side, successful programs always expand in
scope, and the demand for flexible database access always
increases. Limiting access to the database means that eventually
you have to recode the entire database interface. This means work
for you, cost to the customer, and work for the customer in plugging
into whatever interface you create. This may be doable, but the
overriding fact is that databases already have an interface,
and any time spent re-inventing it could better be spent
on just about anything.
On the philosophical side I simply do not like any architecture
where limitations are built in from the start. Call it a personal
prejudice, but I much prefer to find the flexible solution where
there is one (and personally I love to find it where it appears
it does not exist). Overall the flexible solution
always leads to more possibilities for
work, more features, and just plain more fun.
Application Framework Strategies
If you are committed mainting the ORDER_TOTAL in application code,
and you wish to avoid the "please remember to always...." blunder,
then it must not be possible for new programmers or prima donna
programmers to violate the requirement. This means your framework
cannot allow random SQL commands, and must somehow force all write
access to particular tables to route through particular objects or
functions. A good ORM system should at very least not only provide
a mechanism for updating related tables, but also prevent any access
except through that mechanism.
Personally I have no use for these kinds of systems, for reasons
explained in the previous section, and so I cannot really comment
on them beyond describing these basic minimum requirements.
Server-Side Strategy: Stored Procedure
A few years ago I was working in Manhattan and a fellow programmer
explained that at his wife's job all database write access had to go
through stored procedures. The idea was to ensure that business rules
were always enforced and to prevent any programmer from wittingly
or unwittingly violating the rules. In the interest of full disclosure,
I'll point out that I have never worked on such a system, and all of
my knowledge of such systems is second-hand or third-hand.
With that being said, the obvious up-side to this method is that you
avoid forcing database access through your application, making things
much more robust and extendable. Further, you make sure, by coding
up routines that handle UPDATES and INSERTS to ORDERS and ORDER_LINES
that the useful but troubesome ORDER_TOTAL value is always updated
when it needs to be. Further still, you can tie security to the
stored procedures and control who can modify orders, which is a
prime feature
mentioned by everybody who has ever explained such a system to me.
There is a significant down-side if you intend to code the stored
procedures manually. My own experience is that server-side code is
the most difficult to debug (please feel free to post a comment
trumpeting your favorite debugger for stored procedures, I'm all
ears).
I have never been tempted to use a system like this because I
believe it is still exactly one level more complicated than it needs
to be. What I really want is to be able to directly code an INESRT
to the ORDER_LINES table from any source and know the ORDER_TOTAL field
will always be correct. If that were possible, then all parties are
liberated from inventing and then using any API except SQL. Now of
course many of us prefer to build some layer on top of SQL (myself
inclued), but if the architecture supports direct SQL while enforcing
business rules then all parties are free to use abstraction layers
of their choosing, and nobody is forced to invent or accomodate
anything they do not wish to.
Server-Side Stragey: Triggers
It is a simple technical fact that the tightest possible encapsulation
of code and data occurs when you attach triggers to tables. In our
example of the ORDER_TOTAL value, any INSERT, UPDATE, or DELETE to
the ORDER_LINES table would update the ORDER_TOTAL in the ORDERS
table. This approach gives maximum flexibility: you can directly
access the database without violating rules, and any player can use
an abstraction layer of their choice, or none at all.
Since many programmers find it very tedious and error-prone to code
and debug server-side routines, this approach still faces a large
obstacle if you intend to code the triggers by hand. But this should
not be necessary when taking this approach, because all denormalization
will follow patterns. This is a theme that I tend to repeat over and
over in these essays: your tables will all follow predictable patterns
and your denormalizations will likewise follow patterns. Whenever
you have patterns you can have automation, and in this case that means
generating the triggers instead of coding them by hand.
Another concern with this approach is security. I have been stressing
the inevitable need for expanded database access as your application
matures, but if you let somebody in with full priveleges, they could
accidentally or maliciously cause huge damage if they can run willy-nilly
wherever they want in the database. The trigger-based approach
is the tightest possible way to enforce business rules, but it does
nothing to address security. And if you end up granting database
access based on confidence in triggers, then you are forced into
enforcing security as well inside of the database -- but that is an
essay for another day.
Conclusion
Once we decide to denormalize then we are required to dream up a
strategy to keep things correct going in. The weakest strategies
depend upon voluntary adherence to some set of conventions, and many
strategies accept limitations in overall flexibilty to reduce the
threat from unknown elements. The trigger option, not very popular
these days, provides the tightest encapsulation of code and data,
and lends itself well to code generation.
Related Essays
This blog has two tables of contents, the
Topical Table of Contents and the list
of
Database Skills.
The normalization essays on this blog are:
- >Revisiting Normalization and Denormalization.
- Pay Me Now Or Pay Me Later
- The Argument for Normalization
- First Normal Form
- Second Normal Form
- Third Normal Form and Calculated Values
- The Argument for Denormalization
- Denormalization Patterns
- Keeping Denormalized Values Correct (this essay)
- Triggers, Encapsulation and Composition
- The Data Dictionary and Calculations, Part 1
- The Data Dictionary and Calculations, Part 2
Update Contact :
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email : Fajarudinsidik@gmail.com
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email: Fajarudinsidik@gmail.com
atau Kirimkan Private messanger melalui email dengan klik tombol order dibawah ini :