A database application is like a convoy of ships: it is only as
fast as the slowest ship. The three "ships" in a web-based database
application are the database itself, the web layer, and the browser.
Today we will continue our series on performance by examining how
the web layer can efficiently retrieve data from the database.
Welcome to the Database Programmer blog. This blog is for anybody who wants to see practical examples of how databases work and how to create lean and efficient database applications. There is a Complete Table Of Contents (http://database-programmer.blogspot.com/2007/12/database-skills-complete-contents.html) that is updated each week, and a Master list of table design patterns (http://database-programmer.blogspot.com/2008/01/table-design-patterns.html) that is updated whenever a new design pattern is presented.
Cost 1: Round Trips
The first basic cost of retrieving data is the "round trip".
Database programmers speak of a "round trip" as occurring whenever you
send a request to the server and retrieve some results. Each round trip
to the server carries some overhead, as the server must do some basic
work to allocate and release resources at the start and end of the
request. This overhead is added to the base cost the server must pay
to actually go out to disk to find and retrieve your data.
If your application makes more round trips than are necessary, then
the program will be slower than it could be.
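To make the cost concrete, here is a minimal sketch of the wasteful pattern. It assumes the SQL_AllRows() helper used later in this essay plus a hypothetical SQL_OneRow() helper that returns a single row, and the table and column names mirror Example 1 below:
# WASTEFUL SKETCH (hypothetical helpers): one round trip for the
# order lines, then one more round trip per line just to fetch
# the item's description.
$lines = SQL_AllRows(
    "SELECT ol.sku, ol.qty, ol.price
       FROM ORDER_LINES ol
      WHERE ol.order = $order"
);
foreach($lines as $line) {
    $item = SQL_OneRow(
        "SELECT description FROM ITEMS WHERE sku = '{$line['sku']}'"
    );
    # ...render the line using $item['description']...
}
# For 50 lines that is 51 round trips; the JOIN in Example 1
# below does the same work in exactly one.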
Cost 2: Retrieval Size
Every byte that the application retrieves from the server carries a cost
at several points. The server must go to disk and read it, the wire
must carry the load from the db server to the web server, and the web server
must hold the result in memory. If your web code regularly retrieves
more information than it needs, then the program will be slower
than it could be.
This is why you will see the advice that you should never
use "SELECT *..." in direct queries: it is near certain
that you are retrieving data you will not use. The better
query names the exact columns you need so that you have maximum
efficiency. This is especially important if your table
contains text (aka clob) fields; if you use "SELECT *..." on one of
those tables you risk pulling all kinds of
data over the wire that is just going to be thrown away.
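As a quick illustration (the category and long_description columns here are hypothetical, added only to show the contrast):
# WASTEFUL: drags every column over the wire, including any
# clob-sized text such as a hypothetical long_description
$sq = "SELECT * FROM ITEMS WHERE category = 'WIDGETS'";
# BETTER: name exactly the columns the page will actually render
$sq = "SELECT sku, description FROM ITEMS WHERE category = 'WIDGETS'";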
Example 1: One Round Trip By Using JOIN
Consider the basic case where you are retrieving and displaying the
line items from an order (or shopping cart as people call it these
days). Let us assume that you have an ORDER_LINES table that contains
SKU, QTY, and PRICE among others. The item's description is in the
ITEMS table. To display the lines, you must retrieve each line and
also retrieve the item's description.
To do this most efficiently, we can make a single round trip to the
server that retrieves all of the columns we need in one shot, then
do a loop to render them like so (the example is in PHP):
# Assume some function that gives you the order number,
# sanitized for safe substitution
$order = GetOrderNumber();
# Form the SQL
$sq = "SELECT ol.sku, ol.price, ol.qty, ol.extended_price
            , i.description
         FROM ORDER_LINES ol
         JOIN ITEMS i ON ol.sku = i.sku
        WHERE ol.order = $order";
# Most frameworks should have some command to retrieve
# all rows for a query, something like this:
$lines = SQL_AllRows($sq);
# Finally, render the HTML
foreach($lines as $line) {
    #
    # HTML rendering code here
    #
}
I should stress that this example carries a reasonable expectation
that the order is small enough that you don't start hitting the inefficiencies of your particular language. Rendering
large result sets in a web application is severely problematic compared
to the old desktop systems, and doing so requires separate techniques
that will have to wait for a future essay.
Example 2: Caching Small Tables
Sometimes you will need to generate a printed report that involves
many tables, including several description lookups. For instance,
I have a medical
application that generates statements for all patients who have
a balance. A typical run will produce 100 or 200 statements, and
each statement requires information from no less than 8 tables.
In cases like this you can simplify your queries by retrieving the
small lookup tables in their entirety before running the
main query and loop. For the example of the medical program there are
two tables that qualify for this treatment. These are the tables of
"ICD9" codes and "CPT" codes. Both of these usually have only about
100 rows, and there are only 2 relevant columns in one and 3 in the
other. Therefore there is a big gain to be had by simply loading them
into RAM ahead of time and simplifying the resulting code.
This bare-bones example shows simply that the tables are loaded
first, and then main execution begins.
# The function SQL_AllRows() gives me the complete result from
# a query; the 2nd argument says to return an associative
# array with key values made out of the named column.
# NOTE: an "icd9" code is a medical diagnosis code
$icd9codes = SQL_AllRows(
    "Select icd9code,description from icd9codes"
    ,"icd9code"
);
# NOTE: a "CPT" code is a medical procedure code
$cptcodes = SQL_AllRows(
    "select cptcode,description from cptcodes"
    ,"cptcode"
);
# ...now continue by pre-fetching the list of patients
# we will be dealing with, and then we can finally
# go into the main loop and refer to the $icd9codes
# and $cptcodes array as needed.
#
$patients = SQL_AllRows(
"Select patient from patients where balance > 0
order by last_name,first_name"
);
foreach($patients as $patient) {
#
# retrieve the statement information, use
# arrays $cptcodes and $icd9codes to display
# descriptions for those codes
#
}
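To make the payoff concrete, here is a sketch of what the body of that loop might look like. The charges table and its columns are hypothetical; the point is that the code description lookups become plain array accesses instead of extra JOINs or round trips:
# Hypothetical detail query for one patient's statement
$charges = SQL_AllRows(
    "Select cptcode, icd9code, amount
       from charges
      where patient = '{$patient['patient']}'"
);
foreach($charges as $charge) {
    # Description lookups are now simple array accesses into the
    # tables we cached before the loop began
    $cptDescription  = $cptcodes[$charge['cptcode']]['description'];
    $icd9Description = $icd9codes[$charge['icd9code']]['description'];
    #
    # render the statement line with the two descriptions here
    #
}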
Knowing Your Context
There is one more piece of the puzzle a programmer must have in order
to make wise decisions when balancing round trips against retrieval
size: a thorough knowledge of the context the application will run in.
Knowing your context can dramatically help in making these decisions.
Some examples of context are:
- Huge social networking site or portal with hundreds of hits per
second.
- eCommerce site.
- Line of business program used by the staff of a company to do their
daily work.
My own context is the third item, line of business applications. In this
context the following realities hold:
- A huge user base might be a few hundred, with never more than
five or six simultaneous transactions going on.
- A much more common user base is 10-20 users (or even 3 or 4!),
with one transaction every 5-20 seconds.
- The public website accessed by customers is limited to a few thousand
potential users, of which you rarely if ever have two or more users on
at the same time.
In this context I have a wealth of server resources, because my customer
can spend as little as $1500.00 and get a server with more RAM than 10-20
users will ever use at the same time. Therefore, my own coding habits
often tend toward caching lookup tables and pulling 300 rows into memory
in one shot so that I can get them to the screen (or PDF, or CSV...) as
fast as possible. But these decisions are guided by the context of
my applications; if your context is different, you may be led to
different conclusions.
Conclusion
It is not difficult to create database applications that perform
well. The basic rules of thumb are to make a minimum number of round
trips to the server and to retrieve precisely the values that you
need and no more. These ideas work well because they minimize
your most expensive operation, which is disk access.
It is also perfectly acceptable to denormalize
your tables (following Denormalization Patterns, http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html),
which simplifies your queries and reduces JOIN operations.
Finally, you must know your context well, so that
you can evaluate techniques such as caching lookup tables.
These ideas form the cornerstone of most performance optimization
and you will find that applying them over and over rigorously will
give you most of what you need to keep performance strong in the
web layer.
"http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"
>Next Post: Pay Me Now or Pay Me Later