Saturday, August 16, 2008

Building the Base - Data Integrity

Something I feel is far too often overlooked in software development is data integrity.
I would like to further define it as strong data typing.

The source of lack of strong data typing is several places.
1) Lack of proper column types in the database.
2) Reuse of typed columns to mean different things.
3) Programs/Stored Procedures that decrease the strength of data typing.
4) Flat file (fixed record formats)

Now I'm not a DBA or anything like that.
But one thing I've noticed when developing JSF applications is lack of data integrity seems to bubble it's way up into the presentation layer.

It should be dealt with at the source to provide a better base for an presentation/gui layer.

1) Lack of proper column types in the database.
If you are using clobs, blobs, text, varchar, or whatever flavor of large catch-me-all for typed data please stop. One symptom is that all your columns seem to have the same type. Please use date, int, time, timestamp, bit, numeric, float, decimal, bigint, tinyint.

2) Reuse of typed columns to mean different things.
Use of low values or high values to mean something:
99.99, -1, or any date like January 1st, 1900.
I think almost any database or language has null built in for this reason.
I'll see logic if 0.00 or 99.99 then display N/A so why don't we just define another column with a one character value and a domain table.
So instead of this:
Effective Date: Jan 1st, 1900
We do this:
Effective Date: null Effective Status: 'P'
where there would be a Domain Table with things like 'P' - Pending, 'T' - Terminated, 'I' - Invalid and 'U' - Under Review


3) Programs/Stored Procedures that decrease the strength of data typing.
If you read in two strongly typed and return a lesser typed one this would be a smell. For example if you take two Numbers in and generate a String number result and N/A result sometimes.

If you take two specific things and return a generic one then that is a smell.
For example if you take name fields and put them together to display. Well this is all good but what if you then want to pre-populate a name into a form. How do you parse a complex name which may or may not have middle initials and a suffix?

4) Flat file (fixed record formats)
This is the root of all evil because there are no rules as to what goes where as long as it fits.
I propose that you establish a contract with the generator about the types if possible.
Don't leave it in flat file format, extract it into a strongly typed database and throw out invalid data. It is better to throw out or reject rows than to have it in your database.

No comments: