Many database performance decisions come down to "pay me now or
pay me later." Some decisions will produce faster inserts and
updates at the cost of slower and more complex reads,
while other decisions will
slow down inserts and updates but provide faster and easier
reads.
There are links to related essays on normalization and denormalization at the bottom of this post.
This blog has two tables of contents, the
Topical Table of Contents and the list
of
Database Skills.
Paying With Indexes
An index speeds up SELECT operations. We are not going to go
into any detail about how indexes work, this week we will
stick to how indexes affect performance.
Imagine a table of 10,000 sales orders. You wish to pull out
a handful of fields for orders placed on 5/1/08,
so you issue this SELECT:
SELECT customer,order_total
FROM orders
WHERE date = '2008-05-01'
If you do not have an index on the table, then the database
server will have to scan every single row in the table
to find the rows that match the WHERE clause. On the other
hand, if you had an index on the date column, the server would
first read the index to find pointers to the correct rows,
and then read only the rows you needed. The index itself
is optimized by various methods so that only a very few reads are
necessary to find the correct values. Most databases support
the following syntax:
CREATE INDEX some_unique_name ON orders (date)
As far as performance goes, an index will slow down write
operations (INSERT, UPDATE, and DELETE) because the index
must be updated when the write operation occurs. This cost
is on top of the write operation itself.
(Addendum added July 7: Jochen
points out
correctly in his comment that this statement oversimplifies
things. While it is true that the index must be updated
for writes, the index can also dramatically speed up UPDATE
and DELETE operations if those operations use a WHERE
clause that can benefit from the index.)
In terms of "pay me now or pay me later", when you regularly
add a lot of indexes you are opting to "pay me now." You pay
the price of slower writes to get faster reads. If you regularly
avoid adding any indexes you are opting to "pay me later."
You defer the costs of access to read time to get faster writes.
I should note that it is not possible to completely avoid
indexes, nor is there any value in trying to. For instance,
a primary key requires an index because otherwise you have
to scan the entire table every time you do an INSERT, which
is just plain crazy. Foreign keys benefit from indexes as
well for similar reasons.
Paying With Views
A "view" is a stored SQL statement that you can SELECT from
as if were a table. Imagine we have a table of
TEACHERS and a table of COURSES that they are teaching in
a particular year. We often need to display a list of
courses with the names of the teachers. We can do this with
a JOIN, but a view gives us an easier pre-defined way to
do this:
CREATE VIEW courses_teachers AS
SELECT courses.room,courses.period,courses.teacher
,courses.year
,teacher.first_name,teacher.last_name
FROM courses
JOIN teachers ON courses.teacher = teachers.teacher
...which now lets you do the easier SELECT:
SELECT * FROM courses_teachers WHERE year='2008';
In terms of "pay me now or pay me later" a view is always a "pay me
later" decision. It makes for easier coding but the server must go
out on every SELECT and gather together the data required.
The "pay me later" nature of a VIEW meets its greatest extreme
when the view contains aggregations. Consider the following view
which gives you easy access to customers and their lifetime history
of orders and payments:
CREATE VIEW customers_extended AS
SELECT customers.*
,SUM(orders.order_total) as orders_total
,SUM(invoices.balance) as balance
FROM customers
JOIN orders ON customers.customer = orders.customer
JOIN invoices ON customers.customer = invoices.customer
-- Pulling from the VIEW requires a complete read
-- of relevant ORDERS and INVOICES tables
SELECT * FROM customers_extended
WHERE customer = X;
This view is a "pay me later" proposition because every time you
issue a SELECT from the view, it will have to scan many rows from
the ORDERS and INVOICES tables. The contrasting method is to
denormalize which is a "pay me now" approach.
Paying With Denormalization
Denormalizing means taking a normalized database and deliberately
inserting redundant values. I have an essay on the three
"http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"
>Denormalization Patterns that I use myself, which follow
these three forms:
- FETCH operations, where a value such as an item's price is
copied from the ITEMS table into the ORDER_LINES table.
- EXTEND operations, where you take the QTY and the PRICE
columns in the ORDER_LINES table and write the EXTENDED_PRICE.
- AGGREGATE operations, such as writing the total of ORDER_LINES
onto the ORDERS table.
All of these operations fall into the "pay me now" category.
When these denormalized columns are put into tables, they
add to the the size of the table and increase the cost of
write operations. However, when it comes time to SELECT out
of the tables the values are all there ready to go, usually
with fewer JOINs and lower overall disk activity.
Extreme Pay Me Now
In my line of work I deal with line-of-business programs that
are commissioned or purchased by businesses to do their daily
work. User counts are low and resources are high, because
often I will have 10 users on a single server, with access
via internet limited to only a few thousand potential customers
of which very few are ever on at the same time.
In this context, I prefer to take the "pay me now" approach
to its fullest realization. This means I tend to design my
systems so that:
- Any column a user is likely to filter on has an index.
- Tables are fully denormalized, containing a wealth
of derived values.
This means that all write operations on my systems are slower
than they might otherwise be. However, this is more than
acceptable within this context because the server is
largely untaxed, and users do not notice the difference between
100ms and 200ms to save a row. So I can pay when the user does
not notice and as reward I have very rich reporting and
lookup abilities.
The extreme pay-me-now approach has one more advantage.
The wealth of derived values in the database lets end-users
find what they are looking for without calling a programmer
and asking for a special page or report. Generally the more
derived values there are the truer this becomes.
Extreme Pay Me Later
The extreme form of pay-me-later is a fully normalized database
with no derived values and a minimum of indexes. Calculated
values are available either in views, client-side code or both.
This type of database is tuned for lots of fast writes because
the cost of an INSERT or UPDATE has been kept to an absolute
minimum. The database will be slower to perform ad-hoc or
one-off queries because the server will have to do table scans
whenever a user filters on anything except primary keys and
foreign keys.
The lack of derived values in fully normalized databases also
leads to more phone calls and emails asking the programmer
to create a report or page that will work out derived values
that are not present in the database.
Conclusion: Know Your Context
This week we have taken common database technologies such as
indexes and views and seen how they affect performance.
All of these technologies can be judged
in terms of the "pay me now or pay me later"
decision.
Database programmers normally choose to "pay me later"
when they must support a large number of simultaneous
write operations with a minimum of contention. These
situations call for fewer indexes and strict normalization.
The trade-off is that ad-hoc or one-off queries will
involve more JOINs, more table scans and an increased
likelihood the programmer will be called in for
special cases.
When read operations are more common than writes, or
when inquiries and reports are likely to be unpredictable,
database programmers will choose to "pay me now" by
doing more work on the write operation. There will
be more indexes and more denormalized values, so that
the user is more likely to quickly locate whatever
they want without programmer intervention.
Related Essays
This blog has two tables of contents, the
Topical Table of Contents and the list
of
Database Skills.
The normalization essays on this blog are:
- >Revisiting Normalization and Denormalization (this essay).
- Pay Me Now Or Pay Me Later
- The Argument for Normalization
- First Normal Form
- Second Normal Form
- Third Normal Form and Calculated Values
- The Argument for Denormalization
- Denormalization Patterns
- Keeping Denormalized Values Correct
- Triggers, Encapsulation and Composition
- The Data Dictionary and Calculations, Part 1
- The Data Dictionary and Calculations, Part 2
Update Contact :
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email : Fajarudinsidik@gmail.com
No Wa/Telepon (puat) : 085267792168
No Wa/Telepon (fajar) : 085369237896
Email: Fajarudinsidik@gmail.com
atau Kirimkan Private messanger melalui email dengan klik tombol order dibawah ini :