The Reference Data Pattern: Extensible And Flexible

Having reference tables in your database is no big deal, is it? You just need to tie a code or ID with a descri ption for each reference type, right? But what if you literally have dozens and dozens of reference tables. Is there an alternative to the one table per type approach? Read on to discover a generic and extensible database design for handling all your reference data.

Date : 20/6/2016

Author Information

Uploaded by : Mamta
Uploaded on : 20/6/2016
Subject : Database

The Traditional Solution

So how are we going to implement the logical model in the physical world of an actual database?

We could take the view that each reference type will map to its own table. You might refer to this as the more traditional one table per class solution. It's simple enough, and would look something like this:

The down-side of this is that there could be dozens and dozens of these tables, all having the same columns, all doing the very same kind of thing.

Furthermore, we may be creating a lot more development work. If a UI for each type is required for administrators & super users, or subject matter experts - to maintain the values, then the amount of work quickly multiplies. There are no hard and fast rules for this & it really depends on your development environment & so you'll need to talk to your developers to understand what impact this has.

But given that all our reference types have the same attributes, or columns, is there a more generic way of implementing our logical data model? Yes there is! And it only requires two tables.

The Two-Table Solution

The first discussion I ever had around this subject was back in the mid-90's when I was working a London market insurance company. Back then we went straight to physical design and mostly used natural/business keys, not IDs. Where reference data existed, we decided to keep one table per type composed of a unique code (the VARCHAR PK) and a descri ption. In point of fact, there were far few reference tables back then. More often than not, a restricted set of business codes would be used in a column, possibly with a database check constraint defined, and without a reference table at all.

But the game has moved on since then. This is what a two-table solution might look like:

As you can see this physical data model is very simple. But it's quite different from the logical model. It's not because something's gone all pear-shaped. It's because a number of things were done as part of physical design.

The reference_type table represents each individual reference class from the LDM. So if you have 20 reference types in your LDM, you'll have 20 rows of meta-data in here. The reference_value table contains the permissible values for all the reference types.

At the time of this project, there were some quite lively discussions between developers. Some favoured the two-table solution and others preferred the perhaps more traditional one table per type solution. There are pros and cons for each solution. As you may guess for the developers these were mostly centred on the amount of work the UI would take. Some thought that "knocking-up' an admin UI for each table would be pretty quick. Others thought that a single generic admin UI may be a bit more complex but it would pay off in the longer term.

On this particular project, the two-table solution was favoured, so let's look at this in more detail.

Extensible and Flexible

As your data model evolves over time and new reference types are required, you don't need to keep making changes to your database for each new reference type. You just need to define new configuration data. You add a new row to the reference_type table and add its controlled list of permissible values to the reference_value table.

An important concept contained in this solution is that of effective periods. For example, your organization may need to capture a new reference_value of "Proof of ID' that will be acceptable at some future date. It is a simple matter of adding that new reference_value with the effective_period_from date correctly set. This can be done in advance. Until that date arrives, the new entry will not appear in the drop-down list of values that users of your application see. This is because your application only displays values that are current, or enabled.

On the other hand, you may need to stop users from using a particular reference_value. In that case, just update it with the effective_period_to date correctly set. When that day passes, the value no longer appears in the drop-down list of values. It effectively becomes disabled from that point in time. But because it still physically exists as a row in the table, referential integrity is maintained for those tables where it has already been referenced.

Now that we were working on the two-table solution, it became apparent that some additional columns would be useful on the reference_type table. These mostly centred on UI concerns.

For example, pretty_name on the reference_type table was added for use in the UI. It is helpful for large taxonomies to use a window with a search function. Then pretty_name could be used for the title of the window. On the other hand, if a drop-down list of values is sufficient, then pretty_name could be used for the LOV prompt. In a similar way, descri ption could be used in the UI to populate roll-over help.

Taking a look at the type of config or meta-data that goes into these table will help clarify things a bit.

Config Data

Whilst the example used here is very simple, the reference values for a large project can quickly become quite complex. So it may be advisable to maintain all of this in a spreadsheet. If so, you can use the spreadsheet itself to generate the SQL using string con catenation. This in turn is pasted into scri pts which are executed against the target databases that support the development life-cycle, as well as the live, or production database. This seeds the database with all the necessary reference data.

Here is the config data for the two LDM types, Gender_Type and Party_Type:

PROMPT Gender_Type

INSERT INTO reference_type (id, pretty_name, ref_type_key, descri ption, id_range_from, id_range_to)

& & & VALUES (rety_seq.nextval, `Gender Type`, `GENDER_TYPE`, ` Identifies the gender of a person.`, 13000000, 13999999)lt;/p>INSERT INTO reference_value (id, pretty_name, descri ption, effective_period_from, alt_sequence, reference_type_id)

& & & VALUES (13000010,`Female`, `Female`, TRUNC(SYSDATE), 10, rety_seq.currval)lt;/p>INSERT INTO reference_value (id, pretty_name, descri ption, effective_period_from, alt_sequence, reference_type_id)

& & & VALUES (13000020,`Male`, `Male`, TRUNC(SYSDATE), 20, rety_seq.currval)lt;/p> lt;/p>PROMPT Party_Type

INSERT INTO reference_type (id, pretty_name, ref_type_key, descri ption, id_range_from, id_range_to)

& & & VALUES (rety_seq.nextval, `Party Type`, `PARTY_TYPE`, A controlled list of reference values that identifies the type of party.`, 23000000, 23999999)lt;/p>INSERT INTO reference_value (id, pretty_name, descri ption, effective_period_from, alt_sequence, reference_type_id)

& & & VALUES (23000010,`Organisation`, `Organisation`, TRUNC(SYSDATE), 10, rety_seq.currval)lt;/p>INSERT INTO reference_value (id, pretty_name, descri ption, effective_period_from, alt_sequence, reference_type_id)

& & & VALUES (23000020,`Person`, `Person`, TRUNC(SYSDATE), 20, rety_seq.currval)lt;/p> lt;/p>There is a row in reference_type for each LDM subtype of Root_Reference_Type. The descri ption in reference_type is taken from the LDM class descri ption & e.g. of Gender_Type : 'Identifies the gender of a person'. The DML snippets show the differences in descri ptions between type and value.

You will see from this that the reference_type called Gender Type has been allocated a range of 13000000 to 13999999 for its associated reference_value.ids. In this model, each reference_type is allocated a unique, non-overlapping range of IDs. This is not strictly necessary but allows us to group IDs of related values together. It kind of mimics what you'd get if you had separate tables. It's a nice to have. But if you're don't think there's any benefit in this then it can be dispensed with.

Another column that was added to the PDM is admin_role. Here's why.

Who Are the Administrators?

Some taxonomies can have values added or removed with little or no impact. This will be the case because no programs make use of the values in their logic, or the type is not interfaced to other systems. In such cases, it is safe for user administrators to keep these up to date.

But in other cases much more care needs to be exercised. A new reference value may cause unintended consequences to program logic or to downstream systems.

This resource was uploaded by: Mamta