
The Unknown

Donald Rumsfeld famously said:

“... as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don't know we don't know.”

At first it seems whimsical and likely to incur great mirth, but a closer look reveals that it has a lot of relevance to long-lived code.
On any but the most simplistic project, the features and requirements will continue to evolve, often well beyond the original scope. If we only code for what we know today, then every evolution of the features is likely to be more difficult and time-consuming.

Coding for this evolution before it occurs is a real challenge for the programmer.

To meet this challenge you should use the following techniques: Generalisation and Data Driven Programming.

Generalisation

Instead of being very specific about everything all of the time, try to generalise as much as possible. The less logic you build into the actual code, the fewer changes will be needed to accommodate future requirements.

The less specific handling you do in your code, the less complicated the code is and the lower the chance that you make a mistake.

When handling data you should always ask yourself how much you really need to know about the bit of data. If you are merely asked to store and retrieve some data, then do just that. Treat yourself like the postman — you deliver an envelope, but you need not concern yourself with its contents.

Your client will always tell you that this field or that value is “a special case”. It is easy to fall into the trap of believing them and not making the distinction between what is special to them and what needs to be special to the program that handles that bit of data.

They will also tell you that each scenario is completely different and needs to be handled differently. This is true when viewed from their perspective, but need not be true for the implementation. From the programmer's perspective two items might be identical in how they are processed, even if they have vastly differing meanings to the client.

A real-life example:

I worked on a project where jobs, consisting of groups of files, needed to be pushed out to a network of multiple servers, and we had to activate, de-activate and check the status of the jobs by directly interfacing with the servers.

In order to keep track of each job and the servers and options that each job employed, we made use of a database. There was a table with a row for each job, and the columns contained the information we needed to keep track of the jobs.

However, we were not the only users of these jobs. After the system had been in place for a while, a new user wanted to keep track of a job-specific attribute — which had nothing to do with the process we were implementing. It was just a little bit of information that needed to be shunted around.

The DB team wanted to add a column to the job table to represent this specific attribute. They wanted me to parse the value from one of the input files and propagate it to that column.

As I thought other attributes might come along later that would require similar treatment, I did not want to update the program and the database schema for each and every one of them. Instead I proposed that the input file should contain a set of name/value pair definitions along these lines:

Attributes
{
    Name = value;
    Name2 = value;
}

I told the DB team that they should create a new name/value attribute table with columns similar to this:

jobId int,
name nvarchar(50),
value nvarchar(50)
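
Fleshed out, such a table might look something like this — a sketch only, assuming T-SQL, with the table and constraint names purely illustrative:

-- Hypothetical generic attribute table; one row per job/name pair.
CREATE TABLE JobAttribute
(
    jobId int NOT NULL,
    name  nvarchar(50) NOT NULL,
    value nvarchar(50) NULL,
    CONSTRAINT PK_JobAttribute PRIMARY KEY (jobId, name)
);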

I would call a stored procedure for each name/value pair thus:

AddJobAttribute(@jobId int, @attrName nvarchar(50), @value nvarchar(50));
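
The body of such a procedure can stay completely generic. A minimal sketch, assuming the hypothetical JobAttribute table above: it simply stores or updates the pair without interpreting the name.

CREATE PROCEDURE AddJobAttribute
    @jobId int,
    @attrName nvarchar(50),
    @value nvarchar(50)
AS
BEGIN
    -- Generic handling: store the pair without knowing what the attribute means.
    IF EXISTS (SELECT 1 FROM JobAttribute WHERE jobId = @jobId AND name = @attrName)
        UPDATE JobAttribute SET value = @value
        WHERE jobId = @jobId AND name = @attrName;
    ELSE
        INSERT INTO JobAttribute (jobId, name, value)
        VALUES (@jobId, @attrName, @value);
END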

The consumer of the attribute could call a stored procedure to retrieve the value:

GetJobAttribute(@jobId int, @attrName nvarchar(50));
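
Again a sketch only, and equally generic on the way out: look the value up by name and return whatever happens to be stored.

CREATE PROCEDURE GetJobAttribute
    @jobId int,
    @attrName nvarchar(50)
AS
BEGIN
    -- Generic retrieval: no business meaning attached to the name.
    SELECT value
    FROM JobAttribute
    WHERE jobId = @jobId AND name = @attrName;
END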

We went through the implementation of this: the users put the attributes into the input file, I parsed the attribute block and pushed out the values (there was just the one at this time) using AddJobAttribute(). When the consumer of the attribute finally called GetJobAttribute() and got the expected value, we had established that the implementation was working as designed.

Job done and dusted, you would have thought.

About 1½ years later they wanted to push another attribute out to be used by someone else in another context. I told them just to add it to the Attributes table and get the consumer of the attribute to call the GetJobAttribute() stored procedure to retrieve the value.

Alas! It did not work.

After some digging I discovered that I was definitely pushing the new attribute out, but it seemed to disappear into the ether. My generalised solution had not been implemented as requested: instead of pushing the values out to the new table, the DB team had taken the problem-specific approach.

They had not added the new table, but instead had implemented their original proposal of adding a column to the job table.

The AddJobAttribute() stored procedure I was calling checked for the specific attribute name of the original request, and, if matched, updated the new column in the job table.

The GetJobAttribute() stored procedure checked for the specific attribute name and retrieved the value from the new column of the job table.

They had, in essence, baked the business logic into the code of the stored procedures: “What is the meaning of this particular attribute, and where do I need to store it?” and “What is the meaning of this attribute, and where do I need to retrieve it from?”
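
Reconstructed for illustration only (the column and attribute names here are hypothetical), their version of AddJobAttribute() would have looked something like this:

CREATE PROCEDURE AddJobAttribute
    @jobId int,
    @attrName nvarchar(50),
    @value nvarchar(50)
AS
BEGIN
    -- Business logic baked into the procedure: only the one known attribute
    -- name is handled, mapped to a dedicated column on the job table.
    IF @attrName = 'CustomerRef'   -- hypothetical attribute name
        UPDATE Job SET customerRef = @value WHERE jobId = @jobId;
    -- Any other attribute name is silently dropped, which is why the
    -- new attribute appeared to vanish.
END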

They did not handle the data generically.


It only worked for the one specific attribute name. The moment we added any new attribute, the system required maintenance — we had to go through a whole release cycle, with testing and so on, to fix something that should not have been broken.

This time I insisted they do the job properly, and it paid off. When other attributes came along later on, our program required no further changes.

Data Driven Programming

Instead of hard-coding the various sets of values that you need to work with, try to put the values into external files or tables, if possible, and use a generic way of processing them. Then, when new sets of values pop up, you will not have to update the code, only the data.

You might ask the question: “What is the difference between updating a table and updating your code, especially if the table is a de facto extension of your code?”

Fundamentally there is little difference — it is a change in the way the program will work. But by putting the data into tables or files you can keep those tables or files in one central spot and make maintenance a whole lot easier.

Also, quite often you will find that the same sets of values are used in multiple places in the program (for example: storing, retrieving and transmitting). If all of these places use the same set of values, every part of the program is updated simultaneously by adding, deleting or modifying those values. Otherwise you might update one instance where the values are hard-coded, but not another one, leading to multiple build and test cycles.
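
As a small illustration of the idea, in the same T-SQL setting as above (the table and values are made up): rather than repeating a hard-coded set of values in every piece of code that needs it, keep the set in one lookup table and join against it. Adding or removing a value then means changing a row, not re-releasing code.

-- Hard-coded: the set of values is embedded in the query itself,
-- and every copy of it has to be found and updated when the set changes.
SELECT * FROM Job WHERE serverType IN (N'primary', N'backup', N'staging');

-- Data-driven: the set of values lives in one table...
CREATE TABLE ActiveServerType
(
    serverType nvarchar(50) NOT NULL PRIMARY KEY
);

-- ...and every consumer joins against it, so adding a new server type
-- is a single INSERT rather than a code change.
SELECT j.*
FROM Job AS j
JOIN ActiveServerType AS t ON t.serverType = j.serverType;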