Thoughts on 'Potentially Shippable'

Scrum calls for the delivery of a 'potentially shippable' product increment at the conclusion of each iteration. It is 'potentially shippable' (rather than simply 'shippable') because, ideally, whether to actually ship should be purely a business decision about whether enough value has accrued. This implies that the functionality exposed to the user works as intended, as defined by the implemented Stories and their Acceptance Criteria, which in turn presumes that the quality is fit for purpose.

Keeping software in a 'potentially shippable' state at regular intervals has two benefits: a) a real, tangible indication of progress to inform date-versus-scope decisions, and b) the ability to gather meaningful feedback at regular intervals.

If the software is in a 'potentially shippable' state, progress towards the end goal is measured by working, tested workflows and functionality implemented in software, not solely by overall task estimates. Likewise, feedback from existing customers, potential customers and internal stakeholders can be meaningful. Otherwise, feedback is at best confusing and at worst invalid.

One of the goals of iterative/incremental development is to minimize the difference between 'potentially shippable' and 'shippable'. Ideally, whether there is enough value to warrant shipping is simply a business decision. In practice, however, many teams have activities that must be performed before a release but that they cannot perform every iteration. In some cases, executing and analyzing the full set of manual, automated and performance acceptance tests needed to assess whether a given build is 'shippable' takes weeks to months. In other cases, there is simply too much legacy code without automated test coverage to produce something 'potentially shippable' within a single iteration.

With this in mind, teams need to focus on the activities they can accomplish inside an iteration that best give them confidence that the iteration backlog Stories work and that previously implemented workflows still function correctly. Doing so narrows the gap between 'potentially shippable' and 'shippable'. If a significant part of the cost of change in a code base is the uncertainty the change creates, together with our inability to validate in a timely manner that our workflows have not been inadvertently affected, then we should always strive to minimize the time that validation takes. Validating faster means finding and fixing problems sooner and more cheaply. Automate, automate, automate.

IDEALLY:

- acceptance criteria outline the circumstances under which each new workflow functions and the associated expected results
- if the acceptance criteria are met, and we have proved that the acceptance criteria for previously accepted workflows continue to be met, then we are potentially shippable.

PRACTICALLY:

- acceptance criteria for any given Story need to include regression tests for previously working functionality (either manual or a subset of long-running automated tests) that the team has assessed as likely to be affected by the code changes needed to complete the Story in question.

- alternatively, the Definition of Done can be altered to include a statement about the inclusion of relevant, focused regression tests which are either performed manually or are a subset of an existing long-running test automation suite.

- those manual regression tests then need to be automated so that they become an ongoing part of the automated test suite.
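The "focused regression tests" idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed tool: it assumes the team keeps a hypothetical catalogue mapping each regression test to the code areas it exercises, and it selects the tests whose areas overlap the areas a Story touches. The names `RegressionTest`, `select_regression_tests` and `manual_backlog`, and the area labels, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionTest:
    """A regression test plus the code areas it is assumed to exercise (hypothetical catalogue)."""
    name: str
    areas: frozenset
    automated: bool


def select_regression_tests(tests, changed_areas):
    """Return the tests whose areas overlap the areas changed by a Story, in catalogue order."""
    changed = set(changed_areas)
    return [t for t in tests if t.areas & changed]


def manual_backlog(selected):
    """Names of selected tests that are still manual, i.e. candidates for automation."""
    return [t.name for t in selected if not t.automated]


suite = [
    RegressionTest("checkout_total", frozenset({"pricing", "cart"}), automated=True),
    RegressionTest("invoice_pdf", frozenset({"billing"}), automated=False),
    RegressionTest("discount_codes", frozenset({"pricing"}), automated=False),
]

selected = select_regression_tests(suite, {"pricing"})
print([t.name for t in selected])  # ['checkout_total', 'discount_codes']
print(manual_backlog(selected))    # ['discount_codes']
```

The second list is the point of the last bullet: every manual test that keeps getting selected is a strong candidate for automation, since it is being paid for in every iteration that touches that area.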

The usual objection to this approach is that teams apparently deliver less in an iteration. This is a red herring: the teams were never delivering as much as they thought, because the regression testing needed to deliver the functionality was hidden in the 'stabilization/hardening' period before release. Moving that regression testing forward moves teams closer to the ideal and should lead to shorter stabilization/hardening periods.

I'm often asked, "How do we measure whether we are 'potentially shippable'?" My answer is generally the same: "You're 'potentially shippable' if the software behaves the way you say it does." Without some way to adequately describe (and ultimately test) this behaviour, it is difficult to know whether you are 'potentially shippable'. This behaviour is, of course, described in Stories and their respective acceptance criteria.
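One way to make "behaves the way you say it does" checkable is to express each acceptance criterion as an automated test. The sketch below assumes an entirely hypothetical Story ("a customer can apply a discount code at checkout") and a stand-in `apply_discount` function; only the pattern of one test per criterion is the point, not the domain.

```python
import unittest
from datetime import date


def apply_discount(total_cents: int, code_expiry: date, today: date) -> int:
    """Hypothetical system under test: 10% off if the code has not yet expired."""
    if today <= code_expiry:
        return total_cents - total_cents // 10
    return total_cents


class DiscountCodeAcceptance(unittest.TestCase):
    """Each test mirrors one acceptance criterion of the hypothetical Story."""

    def test_valid_code_reduces_total_by_ten_percent(self):
        # Given a basket of 50.00 and a code valid until tomorrow,
        # when the code is applied today, then the total is 45.00.
        self.assertEqual(apply_discount(5000, date(2024, 1, 2), date(2024, 1, 1)), 4500)

    def test_expired_code_leaves_total_unchanged(self):
        # Given a code that expired yesterday, the total is not reduced.
        self.assertEqual(apply_discount(5000, date(2024, 1, 1), date(2024, 1, 2)), 5000)
```

Run with `python -m unittest <module>`. Once such tests exist for previously accepted Stories as well, "are we potentially shippable?" becomes largely a question of whether the suite passes.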