Tuesday, June 29, 2010

TDD for Operations

Software developers have enjoyed the benefits of Test Driven Development for a long time now. System Operations professionals have not yet been test-infected.

Test Driven Development (TDD) allows developers to refactor and add new features with confidence that the impact of their changes is restricted to the intended components. System Operations professionals don't always have such a tool and rely on human knowledge to make sure all integrated systems will behave the same after a change like an OS upgrade.

In software development, teams often create additional code to test the executable code. What could play that role in System Operations? Monitors!

Monitors are a nice analogy to the red/green way of writing code. Instead of writing a test that fails, writing the code, and then seeing the test pass, operations professionals create a set of monitors which alert until a certain component is installed.

For example, before installing a new web application, a monitor is created to watch whether the web server is up. This monitor would alert until a web server is actually put in place and listening on the desired domain and port. Once that is complete, another monitor would be created for, say, the number of database connections in the pool, and so on.
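The "red until deployed" idea above can be sketched as a minimal port-check monitor. This is an illustrative example, not any particular monitoring product; the host and port values are assumptions:

```python
import socket

def check_web_server(host, port, timeout=5.0):
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def monitor(host, port):
    """The operations analogue of a failing test: RED until the
    web server is actually deployed and listening, then GREEN."""
    if check_web_server(host, port):
        return "GREEN"
    return "RED: no listener on %s:%d" % (host, port)
```

Run before the deployment, the monitor alerts (RED); once the web server is in place, it goes quiet (GREEN), just like a test that starts passing once the code is written.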

This approach allows for more frequent changes to infrastructure. With a solid deployment process and easy rollback of failed changes, software modifications can be pushed to production at any time at low risk (Continuous Deployment).

Testing the application constantly in pre-production environments will ensure there are few to no bugs in the software; however, it doesn't ensure configuration issues are not present once it is moved to other environments. An option to mitigate this risk is to run a complete regression test suite against all environments.
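One way to picture running the same regression checks against every environment is a small check runner. Everything here is a sketch; the check names, config keys, and thresholds are made up for illustration:

```python
def run_suite(environments, checks):
    """Run every check against every environment's config;
    return a list of (environment, check name, failure message)."""
    failures = []
    for env, config in environments.items():
        for check in checks:
            try:
                check(config)
            except AssertionError as exc:
                failures.append((env, check.__name__, str(exc)))
    return failures

# Hypothetical configuration checks -- the kind of thing that passes
# in pre-production but silently differs once moved to another environment.
def check_db_pool_size(config):
    assert config["db_pool"] >= 10, "pool too small: %d" % config["db_pool"]

def check_debug_disabled(config):
    assert not config["debug"], "debug mode enabled"

environments = {
    "staging":    {"db_pool": 20, "debug": True},
    "production": {"db_pool": 20, "debug": False},
}
failures = run_suite(environments, [check_db_pool_size, check_debug_disabled])
```

Running the identical suite everywhere is what surfaces configuration drift: here staging would fail the debug check even though production is clean.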

There are tools, such as HP SiteScope, which can effectively use functional tests as transactional monitors. Transactional monitors based on functional tests are great, but they won't provide the more granular results an individual monitor does. As with regular functional tests, these monitors are great for detecting an issue; however, they don't help pinpoint the root cause quickly. If using functional monitors, make sure to include execution times, so the monitors go off if the system degrades beyond agreed service levels.
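Including execution times in a functional monitor can be sketched as a wrapper that fails the check either on a functional error or on an SLA breach. The `login_flow` stand-in and the one-second limit are assumptions for the example:

```python
import time

def timed_monitor(check, max_seconds):
    """Wrap a functional test so it goes RED both when the
    transaction fails and when it passes too slowly (SLA breach)."""
    def monitor():
        start = time.monotonic()
        ok = check()
        elapsed = time.monotonic() - start
        if not ok:
            return "RED: check failed"
        if elapsed > max_seconds:
            return "RED: passed but took %.2fs (limit %.2fs)" % (elapsed, max_seconds)
        return "GREEN"
    return monitor

# Hypothetical functional test standing in for a real user transaction.
def login_flow():
    time.sleep(0.01)  # simulated work
    return True

login_monitor = timed_monitor(login_flow, max_seconds=1.0)
```

The point is that a functional monitor without a timing threshold only tells you the system works, not that it still works within the agreed service level.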

The automation effort has slowly moved from development to QA. It is time for it to infect the operations teams as well. These teams will greatly benefit from deployment automation and integrated monitoring.


Daniel Kushner said...

Automation can definitely benefit the operation folks. We see the strong movement towards DevOps (twitter/devopszone) and platforms that control and manage automated deployments: http://www.noliosoft.com

Unknown said...

I distinguish between monitors (continuously running) and regression tests (run after a known change in operations). Typically, regression tests are run only after software deployments. However, properly packaged, they can/should be run after every set of changes in production. Additionally, proper design of regression tests, would allow more of them to be converted to monitors, detecting inadvertent changes.

Dan Nemec said...

Great post. Your idea is something we were kicking around, and we gave it the catchy acronym MDD, for Monitoring Driven Deployment. I like what you say about deploying the monitor before the change and letting the monitor go green after the deployment. The ideal monitoring infrastructure should be so robust that when all monitors go green you have high confidence of a successful deployment. Now, who do you think should write the monitors, the developer who wrote the code or operations? If operations, then they need to be involved in the development cycle (which would have huge benefits and be a great practice of Devops). Or, should QA write the monitoring and consider the monitoring tool one of their QA automation tools?

Post a Comment