The number one search term that brings people to this blog is
"data dictionary." So this week I will begin a
series on how to use the data dictionary to improve your own
productivity and reduce errors.
Building And Upgrading
This week we are going to see how to use a data dictionary to
eliminate upgrade scripts (mostly) and make for more efficient upgrades.
The approach described here also works for installing a system
from scratch, so an install and an upgrade become the same
process.
The major problems with upgrade scripts are these:
- They are the least-tested code in any system, and are
the most likely to break. If a script breaks and anybody
but the original programmer is running the upgrade, this
leads to aborted upgrades and upset customers.
- They are horribly inefficient when a customer upgrades
after a long time: the same table may be rebuilt many times
as script after script adds a column or two.
By contrast, a dictionary-based upgrade can take any customer
from any version of your software and in the fewest steps
possible bring them completely current, with no possibility
of broken scripts.
But first, a quick review of an important idea...
Review: The Text File
Back in June of 2008 this blog featured an overview of the data dictionary
(http://database-programmer.blogspot.com/2008/06/using-data-dictionary.html). There were many
ideas in that essay, but I wish to focus on one in particular,
the question of where to put the data dictionary and in
what format.
In terms of where to put it, the data dictionary should
be just another application file, in source control, and
delivered with your code. When the dictionary is in a
plaintext file (or files) and treated like other application
code, you do not have to invent any new
methods for handling it; you just use the same methods you use for
the rest of your code.
In simple practical terms, it is best if a data dictionary can
be easily read and written by both people and computers.
This leads to a plaintext file or files in some format such
as JSON or YAML. I personally prefer YAML because it is a superset
of JSON, so using YAML gives me the option to sneak in JSON syntax
later if I choose, but starting with JSON does not let me go the
other way.
Requiring easy handling by people tends to rule out XML, which is
bletcherous to work with manually (in my humble opinion). Requiring
readability by the computer rules out UML unless your UML drawing
tool can produce a usable data file (comments always welcome, tell us
your favorite tool for doing this!). When considering UML, it is
the class diagrams that are most likely to be translatable into a
data dictionary.
Finally, encoding dictionary information in program
class files technically
meets the practical requirements listed above, but it has the
disadvantage of trapping data in code, which unnecessarily
couples your dictionary to whatever language you are using at the
moment. It is much better if the dictionary sits outside of the
code as pure data. Not to mention that spreading the dictionary out
in a collection of one-class-per-table files makes it much harder
to do upgrades in the way I am about to describe.
Review of Steps
When using a dictionary-based approach, you write some type of
"builder" program that reads your dictionary file, examines the
current structure of the database, and then generates SQL commands
to alter and create tables to make them all current.
There are plenty of ways to do this. My own approach is to
load the dictionary itself into tables, pull the current state into
similar tables, and then do queries to find new and altered
tables and columns. If you want to see a full-blown program of
this type, check out "https://andro.svn.sourceforge.net/svnroot/andro/trunk/andro/application/androBuild.php"
>androBuild.php, the Andromeda implementation of this idea.
The routines that apply to today's topic include "RealityGet()",
"Differences()", "Analyze()" and "PlanMake()".
Step 1: Load the Dictionary to RAM
To use the approach in this essay, you begin by parsing your plaintext
file and loading it to tables. Here is a simple example of what
a dictionary file might look like in YAML format:
table states:
    description: US States
    columns:
        state:
            type: char
            colprec: 2
            caption: State Code
            primary_key: "Y"
        description:
            type: varchar
            colprec: 25
            caption: State Name
If you are using PHP, you can parse this file using the
spyc program, which
converts the file into associative arrays. All or nearly all
modern languages have a YAML parser; check out the
YAML site to find yours.
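For example, a minimal sketch of this parsing step (assuming the spyc class file
is available as spyc.php, and using a placeholder file name dict.yaml) might
look like this:

<?php
// A sketch of Step 1: read the YAML dictionary into nested associative arrays.
require_once 'spyc.php';

$dict = Spyc::YAMLLoad('dict.yaml');    // keys look like "table states"

foreach ($dict as $tableKey => $tableDef) {
    $tableId = preg_replace('/^table\s+/', '', $tableKey);   // "table states" -> "states"
    echo $tableId . ': ' . $tableDef['description'] . "\n";  // e.g. "states: US States"
}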
Step 2: Load the Dictionary To Tables
The database you are building should have some tables that
you can use as a scratch area during the upgrade. You may
say, "The builder gives me tables, but I need tables to run
the builder, how can I do this?" The simplest way is to hardcode
the creation of these tables. A more mature solution would use
a separate dictionary file that just defines the dictionary tables.
The structure of the tables should match the data file, of course.
Here is what the YAML above would look like after being loaded
to tables:
TABLE   | DESCRIPTION
--------+--------------------------------
states  | US States

TABLE   | COLUMN      | CAPTION    | TYPE    | PRECISION
--------+-------------+------------+---------+-----------
states  | state       | State Code | char    | 2
states  | description | State Name | varchar | 25
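To make this step concrete, here is a minimal sketch that hardcodes two scratch
tables, TABLES_SPEC and COLUMNS_SPEC (the same names used in the diff queries
later on), and copies the parsed array from the Step 1 sketch into them. The
$pdo variable is an assumed PDO connection, and the column sizes are arbitrary:

<?php
// A sketch only: hardcode the spec tables (in real life you would guard
// against them already existing), then copy the parsed dictionary
// ($dict from the Step 1 sketch) into them.
$pdo->exec("CREATE TABLE TABLES_SPEC (
                table_id varchar(64), description varchar(255))");
$pdo->exec("CREATE TABLE COLUMNS_SPEC (
                table_id varchar(64), column_id varchar(64),
                caption varchar(64), type varchar(16), colprec integer)");

$insTable = $pdo->prepare(
    "INSERT INTO TABLES_SPEC (table_id, description) VALUES (?,?)");
$insCol   = $pdo->prepare(
    "INSERT INTO COLUMNS_SPEC (table_id, column_id, caption, type, colprec)
     VALUES (?,?,?,?,?)");

foreach ($dict as $tableKey => $tableDef) {
    $tableId = preg_replace('/^table\s+/', '', $tableKey);   // "table states" -> "states"
    $insTable->execute(array($tableId, $tableDef['description']));

    foreach ($tableDef['columns'] as $columnId => $colDef) {
        $insCol->execute(array($tableId, $columnId,
            $colDef['caption'], $colDef['type'], $colDef['colprec']));
    }
}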
Step 3: Fetch The Current State
Most modern databases support "information_schema", a built-in schema
inside each database that contains
tables that describe the structure of the database. While you
can make queries directly against the information_schema tables,
I prefer to fetch the information out of them into my own
tables so that all column names are consistent with my own.
A simple query to do this might look like this:
-- Postgres-specific example of pulling info out of the
-- information_schema table:
insert into TABLES_NOW (table_id) -- my dictionary table
SELECT table_name as table_id
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_type = 'BASE TABLE'
Pulling column information out can be much more complicated, owing
to differences in how vendors implement information_schema, and
owing to the complex way data is stored in it. Here is my own code
to pull out the definitions of columns from the Postgres
information_schema, which also simplifies the definition
dramatically, to make my downstream coding easier:
insert into zdd.tabflat_r
(table_id,column_id,formshort,colprec,colscale)
SELECT c.table_name,c.column_name,
CASE WHEN POSITION('timestamp' IN data_type) > 0 THEN 'timestamp'
WHEN POSITION('character varying' IN data_type) > 0 THEN 'varchar'
WHEN POSITION('character' IN data_type) > 0 THEN 'char'
WHEN POSITION('integer' IN data_type) > 0 THEN 'int'
ELSE data_type END,
CASE WHEN POSITION('character' IN data_type) > 0 THEN character_maximum_length
WHEN POSITION('numeric' IN data_type) > 0 THEN numeric_precision
ELSE 0 END,
CASE WHEN POSITION('numeric' IN data_type) > 0 THEN numeric_scale
ELSE 0 END
FROM information_schema.columns c
JOIN information_schema.tables t ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
Step 4: The Magic Diff
Now we can see how the magic happens. Imagine you have 20 tables
in your application, and in the past week you have modified 5 of them
and added two more. You want to upgrade your demo site, so what is
the next step for the builder?
The builder must now do a "diff" between your dictionary and the
actual state of the database, looking for:
- Completely new tables.
- New columns in existing tables.
Let's say you have two tables: "TABLES_SPEC", which lists the
tables in your application, and "TABLES_NOW", which lists
the tables in your database. The following query will give
you a list of new tables:
SELECT spec.table_id
FROM TABLES_SPEC spec
WHERE NOT EXISTS (
SELECT table_id from TABLES_NOW now
WHERE now.table_id = spec.table_id
)
It is now a simple thing to pull the column definitions for each
table and generate some DDL to create the tables.
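Here is a minimal sketch of that step, again assuming a PDO connection in $pdo
and reusing the new-table query from above. The type formatting covers only
the handful of types used in this example:

<?php
// Sketch: build and run a CREATE TABLE statement for every table that is
// in the spec but not yet in the database.
$newTables = $pdo->query(
    "SELECT spec.table_id
       FROM TABLES_SPEC spec
      WHERE NOT EXISTS (SELECT table_id FROM TABLES_NOW now
                         WHERE now.table_id = spec.table_id)"
)->fetchAll(PDO::FETCH_COLUMN);

$colQuery = $pdo->prepare(
    "SELECT column_id, type, colprec FROM COLUMNS_SPEC WHERE table_id = ?");

foreach ($newTables as $tableId) {
    $colQuery->execute(array($tableId));

    $defs = array();
    foreach ($colQuery->fetchAll(PDO::FETCH_ASSOC) as $col) {
        // char and varchar carry a precision; other types are used as-is
        $defs[] = $col['column_id'] . ' '
                . (in_array($col['type'], array('char', 'varchar'))
                    ? $col['type'] . '(' . $col['colprec'] . ')'
                    : $col['type']);
    }
    // e.g. CREATE TABLE states (state char(2), description varchar(25))
    $pdo->exec('CREATE TABLE ' . $tableId . ' (' . implode(', ', $defs) . ')');
}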
But we also have tables that have new columns. We can pull those
out like so:
SELECT * FROM COLUMNS_SPEC spec
-- the first subquery gets new columns
WHERE NOT EXISTS (
    SELECT table_id FROM COLUMNS_NOW now
     WHERE spec.table_id = now.table_id
       AND spec.column_id = now.column_id
)
-- the second subquery makes sure we are
-- getting only columns on existing tables
AND EXISTS (
    SELECT table_id FROM TABLES_NOW now
     WHERE now.table_id = spec.table_id
)
Now again it is a simple matter to generate DDL commands that
add all of the new columns into each table. Some databases
will allow multiple columns to be added in one statement, while
others will require one ALTER TABLE per new column (really horrible
when you have to do that).
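Here is a matching sketch for the new-column case, again assuming a PDO
connection in $pdo. It deliberately issues one ALTER TABLE per column so
that it also works on databases that cannot add several columns in a
single statement:

<?php
// Sketch: add each new column to its existing table with its own
// ALTER TABLE statement.
$newColumns = $pdo->query(
    "SELECT spec.table_id, spec.column_id, spec.type, spec.colprec
       FROM COLUMNS_SPEC spec
      WHERE NOT EXISTS (SELECT 1 FROM COLUMNS_NOW now
                         WHERE spec.table_id  = now.table_id
                           AND spec.column_id = now.column_id)
        AND EXISTS     (SELECT 1 FROM TABLES_NOW now
                         WHERE now.table_id = spec.table_id)"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($newColumns as $col) {
    $type = in_array($col['type'], array('char', 'varchar'))
        ? $col['type'] . '(' . $col['colprec'] . ')'
        : $col['type'];
    $pdo->exec('ALTER TABLE ' . $col['table_id']
             . ' ADD COLUMN ' . $col['column_id'] . ' ' . $type);
}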
Please note this is sample code only, just to give you ideas,
and it will not cover every case.
Sidebar: Avoid Destructive Actions
Do not rush into writing code that drops columns or tables that
are not in your spec. The results of a misstep can be disastrous
(as in lose your job or your customer). My own builder code is now
4 1/2 years old and I have never yet bothered to write a
"destructive" upgrade that will clean out unused tables and columns.
Maybe someday...
What I Left Out: Validation
There was no space in this essay to discuss a very important
topic: validating the spec changes. It may be that a programmer
has done something nasty like change a column type from character
to integer. Most databases will fail attempting to alter the column
because they don't know how to convert the data. Your builder program
can trap these events by validating the upgrade before any
changes are made. This will be treated fully in a future essay.
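Still, as a small taste, here is a sketch of one such check, assuming a PDO
connection in $pdo and assuming the COLUMNS_NOW scratch table carries a type
column just like COLUMNS_SPEC. It lists every column whose declared type no
longer matches the live database and stops before any DDL is generated:

<?php
// Sketch of a pre-build validation: refuse to continue if any column's
// type in the spec differs from its type in the live database.
$changed = $pdo->query(
    "SELECT spec.table_id, spec.column_id,
            spec.type AS spec_type, now.type AS now_type
       FROM COLUMNS_SPEC spec
       JOIN COLUMNS_NOW  now
         ON now.table_id  = spec.table_id
        AND now.column_id = spec.column_id
      WHERE now.type <> spec.type"
)->fetchAll(PDO::FETCH_ASSOC);

if (count($changed) > 0) {
    foreach ($changed as $col) {
        echo "Type change on {$col['table_id']}.{$col['column_id']}: "
           . "{$col['now_type']} -> {$col['spec_type']}\n";
    }
    die("Upgrade aborted: review these type changes before building.\n");
}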
More That I Left Out: Indexes, Keys...
There are many, many other things you can, and really must, create
during the build, beginning with primary keys and foreign keys,
not to mention indexes as well. These will be covered in a future
essay.
More That I Left Out: When You Still Need Scripts
There are plenty of reasons why you may still need a few
upgrade scripts, these will be discussed in a future essay.
They all come down to moving data around when table structures
change significantly.
Conclusion: One Upgrade To Rule Them All
The approach described this week for upgrading databases has
many advantages. It is first and foremost the most efficient way
to upgrade customers from any version directly to the latest
version. It is also the simplest way to handle both installations
and upgrades: they are both the same process. Putting the dictionary
file into plaintext gives you complete source control just like any
other application file, and overall you have a tight, efficient
and error-free upgrade process.