Thursday, April 28, 2011

Panic

Ahh, panic.  You can recognize this phenomenon in an office environment by the sound of raised voices and by terse emails marked "Important" that contain words such as "immediately" or, better yet, "yesterday".

When does panic happen?  In my experience (and we'll focus this on in-house software development), two of the most stressful scenarios are software releases and system-down failures. 


Software Release
It never fails.  After weeks (sometimes months) of careful testing and analysis, a myriad of problems always seems to materialize at the end of a development cycle.  They can take many shapes:
  • Undiscovered Requirements: I love this one.  After you come up with a really solid architecture, design and implementation, a stakeholder realizes that they were completely wrong about a fundamental assumption.  This is particularly awesome if you've verified this assumption numerous times and built an entire solution on top of it.
  • Procrastination: I find it amusing when somebody has known for months that they had to do something, then all of a sudden starts making noise about it a week before a product goes live.  What started out as a "minor feature, it'll take me an afternoon" turns into a fiasco at the last minute due to lack of planning.  This is when written requirements go out the window and the band-aid solutions are applied.
  • Late Testing: You can get user feedback early in the process, but take it all with a grain of salt.  When you get a week out from release, everything changes.  This is when the users who have been sitting on their hands for months suddenly panic and realize they haven't tried the new piece of software they need to use in order to do their job.  All of a sudden any small deviation from what they expect becomes an uber-critical bug.  These users also wait until this window to ask for "can't live without it" features, even though they agreed to (or ignored) the original requirements.
By the way - to all the early testers of the software I have ever worked on, I give you a heartfelt "thank you."  You made my life easier (and probably yours too).

Mitigation
So how do you reduce / eliminate panic in the development / release process?  It goes back to the basics of the software development life cycle:
  • Requirements: Nothing new here.  Every feature that is added to an application needs to have a written spec.  This spec needs to include what it is, why it is, and how it behaves (and ultimately how to verify it).  Everything needs to be written down, including who asked for the feature.
  • Design: Yes, software still needs to be designed.  We have some great tools to help us with this process, but it still needs to be done.
  • Implement: This needs to be done correctly.  There are many right ways to implement software, and many wrong ways.  While there are many excellent design patterns that can be used, the universal advice of DRY (Don't Repeat Yourself) should remain at the top of the list.  One of the best ways to produce unmaintainable code is to copy and paste the same thing over and over.
  • Involve as many users in testing as possible: The users that jump on board early enough in the testing cycle can actually have some say in the direction of development.  Users that don't touch the product until the day before it is released aren't going to get features added anytime soon.
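
To make the DRY advice from the Implement bullet concrete, here is a minimal Python sketch.  The function names and the name-cleaning rule are made up for illustration; the point is only that the rule lives in one place instead of being pasted into every caller.

```python
# Hypothetical example: the names and the cleanup rule are illustrative only.

def normalize_name(raw: str) -> str:
    """The single place that defines how a person's name is cleaned up."""
    # Collapse runs of whitespace, trim the ends, and title-case the result.
    return " ".join(raw.split()).title()


def format_customer(raw_name: str) -> str:
    # Reuses the shared rule instead of repeating the cleanup logic.
    return f"Customer: {normalize_name(raw_name)}"


def format_employee(raw_name: str) -> str:
    # Same rule, same helper: a change to the rule is made exactly once.
    return f"Employee: {normalize_name(raw_name)}"


print(format_customer("  ada   lovelace "))
print(format_employee("grace hopper"))
```

If the cleanup rule changes a week before release, there is one function to fix and one function to retest.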

The System is Down!
Watching people react to system down situations is particularly interesting to me.  The panic in this case can come from both users and IT.

I've seen three main approaches from the IT side:

  • Deer in Headlights: This is fun to watch.  A grown adult freezes in their tracks, looking around the room for someone to come up with an answer.  This person has completely succumbed to panic.
  • Jump in and start hacking: This one can go either way.  Normally introverted technical staff suddenly turn into ninjas and start throwing fixes at the problem (e.g. "reboot now!" or "kill the service!").
  • Detective Mode: This is when somebody starts looking at logs, checking performance counters, asking questions, etc.
The deer in headlights response doesn't solve anything; the problem just lasts longer (and might even get worse).

Jumping in and hacking away can go either way.  Even if prompt action is required, this is a dangerous time because normal procedures are often skipped and many safeguards are removed "just to make it work."  Sometimes this can backfire with either immediate consequences (making the problem worse) or long term consequences (e.g. "you know how you gave that folder write permissions for Everybody?  Well, somebody wrote to it").

Detective mode is good, as long as it leads to action.  Paralysis through analysis is just as bad as deer in headlights because nothing is getting done. 

I would like to think I'm a healthy mix of the last two... I'm not afraid to jump in and try something if I've got a good gut feeling where the problem is.  However, if I'm flying blind, I go into detective mode and start eliminating potential sources of error component by component.

Mitigation
Every system and process should be designed with backups and procedures for when it fails.  Even super-big-crazy-expensive clustered solutions can fail brilliantly.  It takes a mindset of assuming that failure will happen (not might happen) to be prepared for this.

Practice, practice, practice!  A disaster recovery procedure is useless unless it has been regularly tested.  The canonical example of this is backing up data without testing a restore.  How do you know you can recover unless you've done it?
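
As a rough illustration of "test the restore," here is a minimal Python sketch of a restore drill for a single file.  Everything in it (the function names, the ".bak" suffix, the sample file) is an assumption made for the example, and a real drill would go through whatever backup tooling you actually use.  The idea is simply to restore into a scratch location and compare the result with the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source: Path, backup_dir: Path) -> Path:
    """Copy the source file into backup_dir (hypothetical '.bak' naming)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (source.name + ".bak")
    shutil.copy2(source, target)
    return target


def restore_drill(source: Path, backup_file: Path) -> bool:
    """Restore into a scratch directory and check it matches the original."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / source.name
        shutil.copy2(backup_file, restored)
        return sha256_of(restored) == sha256_of(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        data = Path(work) / "orders.dat"
        data.write_bytes(b"important business data\n")
        copy = backup(data, Path(work) / "backups")
        print("restore verified:", restore_drill(data, copy))
```

Run on a schedule, a check like this turns "we have backups" into "we have restored from backups, as recently as last night."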


Conclusion
At the end of the day, there are many ways to avoid unnecessary problems and prepare for unavoidable issues.  But the best advice?

Don't panic.

Friday, April 8, 2011

Replacing low values in DB2

Here's a rare one - it is possible to have data with low values (aka null characters) in the middle of a string value in DB2.  This also happens to blow up WCF when using BasicHttpBinding: the stack can't handle a string with null characters in it, most likely because XML 1.0 has no way to represent that character in the SOAP message.

In our case, the low values (null characters) came from the IDMS datasource we are converting from.  Your low values could come from anywhere.

The first thing to do is to find out if you have records with this issue.  For this, you can do something like:

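-- Find rows whose text contains at least one low value (X'00'):
-- stripping the low values changes the string only if one was present.
SELECT
    PRIMARY_KEY,
    TEXT_FIELD
FROM
    OWNER.TABLE
WHERE
    REPLACE(TEXT_FIELD, X'00', '') <> TEXT_FIELD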

If you actually want to fix it, you can do something along these lines:

-- Remove every low value; only rows that contain one are touched.
UPDATE
    OWNER.TABLE
    SET TEXT_FIELD = REPLACE(TEXT_FIELD, X'00', '')
WHERE
    REPLACE(TEXT_FIELD, X'00', '') <> TEXT_FIELD

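Another way to accomplish nearly the same thing.  Note that this version swaps each low value for X'40' instead of removing it, so the string keeps its length.  X'40' is a space in EBCDIC; on an ASCII or Unicode database that byte is '@', so use X'20' there.

-- Replace each low value (X'00') with X'40'; only rows that contain one are touched.
UPDATE
    OWNER.TABLE
    SET TEXT_FIELD = TRANSLATE(TEXT_FIELD, X'40', X'00')
WHERE
    POSSTR(TEXT_FIELD, X'00') > 0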
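
The REPLACE and TRANSLATE statements above are not quite interchangeable: the first removes each low value, the second substitutes another character for it.  Here is a small Python sketch of the two behaviours outside the database.  The function names are made up for the example, and a plain space stands in for the EBCDIC X'40'.

```python
NUL = "\x00"  # the "low value" character


def has_low_values(text: str) -> bool:
    """Detection, in the spirit of POSSTR(TEXT_FIELD, X'00') > 0."""
    return NUL in text


def remove_low_values(text: str) -> str:
    """Like REPLACE(TEXT_FIELD, X'00', ''): drops the character, so the string gets shorter."""
    return text.replace(NUL, "")


def blank_low_values(text: str) -> str:
    """Like TRANSLATE with a space as the target: the string keeps its length."""
    return text.translate({ord(NUL): " "})


sample = "AB\x00CD"
print(has_low_values(sample))           # True
print(repr(remove_low_values(sample)))  # 'ABCD'
print(repr(blank_low_values(sample)))   # 'AB CD'
```

Which one you want depends on the data: removing is fine for free text, while substituting a space is the safer choice if anything downstream relies on fixed character positions within the field.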

Happy low value removing!