Simple estimations work. Use the data you have available

I very often hear the “estimations” conversation. People are so infatuated with the estimations that they forget to look at the actual data. By data I mean what they have been able to accomplish in the past.

When we were lost in the dark ages of Waterfall and its linear, pseudo-iterative friends (like RUP), we did not really have much data to rely on. Sure, there were the universal lines of code (which are anything but universal), the complex but supposedly reliable function points (which were anything but reliable), and other metrics.

In those days you were supposed to define the size of a feature or non-functional requirement by making complex calculations that would ultimately deliver the universal size for the work at hand.

Later on, when people finally figured out that lines of code or function points did not work, they turned to man-hours. Or, as Fred Brooks put it, “The Mythical Man-Month”. It took a while, but by the turn of the century there were many voices saying that this was not the “silver bullet” people expected either (those who did not read the book obviously expected the silver bullet; the others did not).

Today we are no longer in the realm of illusion. Thanks to a very simple construct, we have a very simple metric to measure our past performance and to project our future progress. That unit is called the “Product Backlog Item”.

Cockburn in his “Agile Software Development” book already talked about “burn-down” charts, so did Ken Schwaber and Mike Beedle in their “black book”. The Burndown chart together with the Product Backlog from Scrum are an optimal tool to measure past performance and future progress.

It is really very simple, but let me establish the basis for the argument first. Product Backlog Items (aka Items) are just requirements: mostly features, but also non-functional requirements (such as usability, performance or security). These items are roughly estimated by the development team (only the top items, say 2-3 iterations' worth, are estimated). For this estimation you may choose hours, story points (à la Mike Cohn) or some other metric that works. My metric is: if you think an item will take more than 2 weeks to complete (half of one of our iterations, which last 4 weeks), then break it down into smaller units. The rationale is simple: if you think it takes more than 2 weeks, you probably have not thought enough about it and it may have some nasty surprises inside. So think about it, and while you are at it, break it into smaller units.

That’s how much we estimate. The simple and beautiful part comes next.

In the Sprint Planning meeting we do the estimation described above and then tell the Product Manager (Product Owner in Scrum-parlance, Customer in XP-dialect) how many items we intend to complete (meaning “ready to release”) in the iteration. This number is, in the first 2-3 sprints, completely based on the understanding and rough estimation of the features/requirements at hand.

After 3 iterations the Product Manager has enough information to assess how many PBL items we will be able to complete in the next sprints. Now, why we do this is simple: it is our experience that the number of PBL items we can complete in a 4 week iteration is roughly the same from iteration to iteration.

Let me state this more clearly: the number of items a stable team is able to complete in an iteration does not vary much from the average number of items it completed in previous iterations.

Here’s an example: in the first three iterations our team completed the following number of PBL items:

  • iteration 1: completed 1 item
  • iteration 2: completed 8 items
  • iteration 3: completed 8 items

How many PBL items do you think they completed in the 4th iteration? Exactly, 8 would be my guess too! And it would have been a very good guess. They were in fact able to complete 10, but in later iterations they were back to 8.
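The guess above can be written down as a tiny calculation. This is only a sketch under stated assumptions: the function name forecast_next is made up for illustration, and it uses the median of the past iterations rather than the plain average, so that one unusual iteration (the 1 in the first sprint) does not drag the forecast down.

```python
from statistics import median


def forecast_next(completed_per_iteration):
    """Forecast the next iteration's item count from past iterations.

    Assumption: the median is taken as the "typical" value, so a single
    unusual iteration (such as a slow start) has little influence.
    """
    if not completed_per_iteration:
        raise ValueError("need at least one completed iteration")
    return median(completed_per_iteration)


history = [1, 8, 8]             # iterations 1-3 from the example
print(forecast_next(history))   # 8

history.append(10)              # what the team actually did in iteration 4
print(forecast_next(history))   # 8.0
```

Even after the better-than-expected fourth iteration, the forecast stays at 8, which matches what the team went on to deliver.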

When looking at several projects (small and big) we have noticed the same stable output from the teams as long as they are stable (i.e. not much changes in people during the sprint).

The theory behind
Now, there is a very good reason for this to happen. Over a sufficiently long period of time (3 or more iterations) the sizes of the PBL items will be evenly distributed, so the big items will be balanced by small items. The result is that over a sufficiently long period of time the size of the individual Product Backlog items is not relevant; their number alone is enough to measure future progress.
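A toy simulation can make this balancing effect easier to see. Everything in it is an assumption chosen for illustration, not data from our projects: item sizes are drawn at random between 1 and 10 working days, the team delivers 40 working days per iteration, and unfinished work simply carries over into the next iteration.

```python
import random


def items_completed_per_iteration(sizes, capacity):
    """Count how many items finish in each iteration.

    sizes: item sizes in working days (positive integers), in backlog order.
    capacity: working days the team delivers per iteration.
    The team works through the backlog in order; unfinished work
    carries over into the next iteration.
    """
    counts = []
    elapsed = 0
    for size in sizes:
        elapsed += size
        # Index of the iteration in which this item is finished.
        iteration = (elapsed - 1) // capacity
        while len(counts) <= iteration:
            counts.append(0)
        counts[iteration] += 1
    return counts


random.seed(1)  # fixed seed so the run is repeatable
sizes = [random.randint(1, 10) for _ in range(120)]  # made-up item sizes
print(items_completed_per_iteration(sizes, capacity=40))
```

With sizes averaging about 5.5 days and 40 days of capacity, the counts should hover around seven items per iteration, even though no individual item size is ever looked at. The exact numbers depend on the seed and on the assumed size range.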

In the earlier example (1, 8 and 8 items, then 10) you could have planned the whole project on the assumption that the team would complete around 8 items per iteration, give or take a few.
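Planning on that basis is plain arithmetic. Here is a minimal sketch, assuming a hypothetical backlog of 60 items (that number is invented for illustration) and a pace somewhere between 6 and 10 items per iteration around the expected 8. The function name iterations_needed is also just an illustration.

```python
import math


def iterations_needed(backlog_items, items_per_iteration):
    """Number of whole iterations needed to finish the backlog."""
    if items_per_iteration <= 0:
        raise ValueError("items_per_iteration must be positive")
    return math.ceil(backlog_items / items_per_iteration)


backlog_items = 60  # hypothetical backlog size, not from the article

# Slower, expected and faster pace; the 6 and 10 are assumed bounds.
for pace in (6, 8, 10):
    print(pace, "items/iteration ->",
          iterations_needed(backlog_items, pace), "iterations")
```

For this made-up backlog the answer is 8 iterations at the expected pace, and between 6 and 10 iterations if the pace drifts, which is the kind of range a Product Manager can plan a release around.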

Secondly, if the team does not change its composition drastically (say, more than half of the people at one point), you can trust that its output will be stable. Unlike some managers I’ve met, who seem to have an unwavering conviction that they can magically (and without a long-term improvement process) improve their team’s output, I believe teams are pretty stable in output. In other words, you cannot easily change the upper limit of a team’s productivity. A team is a system, and to improve its output you, the manager, have to do proper improvement work (look at root causes of bottlenecks, change processes, tools, etc.). It is both unwise and unrealistic to think that you can change the team’s output in the long run by using overtime or psychological pressure (like performance assessments).

Since you can trust that the team’s output is stable (with small variations), you can keep the team going at a regular, sustained pace and know pretty much what they will accomplish during the project. You therefore have an accurate estimate of what their output will be for the duration of the project.

So, here it is, the number of PBL items that a stable team completes in one iteration is really the best estimation you can have to assess their output during one project of 3 or more iterations.

4 thoughts on “Simple estimations work. Use the data you have available”

  1. The idea is simple: over enough sprints all items can be considered to have a similar “weight” when calculating the schedule. The reason is that if there are enough sprints in the project you will have enough small and enough big items for them to compensate each other and align nicely around a median value.

    We have used “number of PBL items done” as the velocity number and we have verified that it is _very_ constant (only very small oscillations).

    This is also in line with Statistical Process Control theory, which establishes that a closed system (like a team that does not change all the time) is very likely to produce a similar output per period of time unless there are special causes for the output to vary. These special causes can be removed (we call them blockers or impediments in Scrum), and that will lead to a similar number of items completed per unit of time.

  2. So, how do you work when the product owner needs to replace some item with a new item in the backlog? You haven’t estimated the size of the iteration items and thus the product owner cannot tell what item is the same size as the new one and can be replaced with it.

    According to your simple estimation scheme, the product owner could replace whatever item. Does that really work?

  3. Yes. But let’s look at why.

    If you have a project that is long enough (more than 3 sprints in my experience), the item sizes roughly follow a normal distribution. What that means is that whatever items are added will (in my experience, again) have sizes that follow a similar distribution.

    In practice what this means is that, statistically, you can take the median size as the measure for “all” items over a sufficiently long period of time, which leads to the conclusion that “in a sufficiently long project all items can be considered the same size (stipulated at 1 in my example), independently of their real size”.

    Note that your mileage may vary and this is pretty much only empirical evidence, which in my experience holds. I’ve looked at several projects in our company.
