The Errors of Estimation

What we often hear from estimators is that estimates are supposed to be “intervals”, not single-point values.

This blog post is now also a podcast, narrated by Billy Elchin.

As usual, they don’t define what “interval” should be. They just say that you have to be “accurate”, but not so “precise”. 

Ok, let’s explore what that might mean. 

Let’s say you get a request to estimate a project. You do your homework, and you come up with a number. Say: 20 “man-days”. What does that mean? Does it mean that you can deliver it in 1 day if you have 20 people working on it? 

Surely that’s not the case, or even expected. Just think of testing, UX work, development, etc. 

So, the estimate will look more like this:

  • 3 man-days UX design
  • 5 man-days of PM work
  • 7 man-days of Dev work 
  • 4 man-days of testing
  • 1 man-day of release and shipping related tasks

For a total of, you guessed it, 20 man-days. 

Interval magic

OK. Now, let’s go back to that interval we were talking about. Let’s say that your margin (estimators also call it “contingency”) is 20%. In that case the estimate could be something like: 

  • 3 to 3.6 man-days UX design
  • 5 to 6 man-days of PM work
  • 7 to 8.4 man-days of Dev work 
  • 4 to 4.8 man-days of testing
  • 1 to 1.2 man-day of release and shipping related tasks

Now we have the interval we were talking about above, and it comes out at: [20 to 24] man-days. 
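For those who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the calculation above. The names `breakdown` and `with_contingency` are made up for this illustration, and the 20% margin is applied on the late side only, exactly as in the list.

```python
# Man-day breakdown from this post; the variable and function names are
# hypothetical, chosen only for this sketch.
breakdown = {
    "UX design": 3,
    "PM work": 5,
    "Dev work": 7,
    "Testing": 4,
    "Release and shipping": 1,
}


def with_contingency(days, margin=0.20):
    """Return (low, high), adding the margin on the late side only."""
    return (days, days * (1 + margin))


intervals = {task: with_contingency(days) for task, days in breakdown.items()}
total_low = sum(low for low, _ in intervals.values())
total_high = sum(high for _, high in intervals.values())

for task, (low, high) in intervals.items():
    print(f"{task}: {low:g} to {high:g} man-days")
print(f"Total: [{total_low:g} to {total_high:g}] man-days")
```

Changing `margin` is all it takes to get a different "interval", which is part of the problem.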

Now, some people might take issue with that, and say that the “margin” is in both directions, not just on the “plus” or right-side of the original estimate.

This is where things get complicated, and people will talk about optimistic, pessimistic and “most-likely” estimates (3-point estimation). Without making this too much of an estimation lesson, what that would mean in practice is that everyone in the team must give 3 estimates for every task, and then calculate, mathematically, what the estimate should be from those.
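If you have not seen that calculation before, one common way to do it is the PERT weighted average, (O + 4M + P) / 6. Teams use different formulas, so treat the choice of formula, the function name and the sample numbers below as assumptions made for illustration only.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """PERT weighted mean: the most-likely value counts four times."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Made-up numbers for a single task, in man-days.
single_point = pert_estimate(optimistic=5, most_likely=7, pessimistic=15)
print(f"{single_point:.1f} man-days")  # prints "8.0 man-days"
```

Three numbers go in, one number comes out.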

The incoherence in estimation practices

However, there’s a problem here.

We started this post by talking about the idea that estimates should be intervals. What happens is that the 3-point estimates drive teams to present a “single-point” estimate for each task that is based on those 3 estimates. I’d say that, logically, asking for a 3-point estimate and then calculating a single point defeats the whole point of having a 3-point estimate!

That’s what we would call an internal incoherence (there’s plenty of those in estimation).

For now, let’s get back to the interval and incorporate the idea that estimates could be wrong in both directions (early and late), but are most likely to be late (this we can easily observe in the wild). So, our intervals might be something like:

  • 2.8 to 3.36 man-days UX design
  • 4.7 to 5.64 man-days of PM work
  • 6.5 to 7.8 man-days of Dev work 
  • 4.7 to 5.64 man-days of testing
  • 0.9 to 1.08 man-day of release and shipping related tasks

And now our interval is: [19.6 to 23.52] man-days. 

So, now we have 3 possible estimates to give out: 

  • a) 20 man-days (single-point estimate).
  • b) [20 to 24] man-days. 
  • c) [19.6 to 23.52] man-days. 

Which one should we use? Wait! We’re not done yet!

Not so fast, what about the fat tails? 

There’s another problem in the use of intervals in estimation.

Although the estimators argue that 20% margins are “normal” and “acceptable” (I take issue with that, but that’s another post), the fact is that estimates are more often wrong on the right side (late), than on the left side (early).

So, we should factor that in. But a question arises: how much can an estimate be wrong on the right side (late)?

There’s quite a lot of data on this, but I’ll use a conservative estimate. Let’s say that the estimates can be 250% (2.5 times) wrong on the right side (late). 

Is this conservative? Yes. Let’s look at evidence A, from the book “Software Estimation: Demystifying the Black Art” by Steve McConnell (an estimator and author on how to improve estimates).

In that book, McConnell shares a graph of the “on-time” record for one of the companies he consulted for (adapted here from the NoEstimates book):

Figure 1 – Graph adapted from Steve McConnell’s book: Software Estimation, Demystifying the Black Art.

In Fig. 1, we see those two projects on the top left, which should have lasted around 2 to 7 days, but lasted 200+ days. That’s an error of 2800+%. 
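As a quick check of that percentage, here is the calculation in Python. It takes the 7-day upper end of the planned range and a round 200 days of actual duration, both read off the graph and therefore approximate. The function name is made up for this sketch, and “error” here means the actual duration expressed as a percentage of the estimate.

```python
def actual_vs_estimate_pct(estimated_days, actual_days):
    """Actual duration expressed as a percentage of the estimate."""
    return actual_days / estimated_days * 100


# Against the 7-day end of the planned range (the kinder comparison).
print(f"{actual_vs_estimate_pct(7, 200):.0f}%")  # prints "2857%"
# Against the 2-day end of the planned range.
print(f"{actual_vs_estimate_pct(2, 200):.0f}%")  # prints "10000%"
```

So 2800+% is the generous reading; measured against the 2-day end of the range, the same projects are off by a factor of 100.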

In my model, based on data I collected from projects where I worked, the largest error I saw was 250%, so I’ll use that *conservative* margin and apply it to our interval, which gets us to: 

  • 2.8 to 7 man-days UX design
  • 4.7 to 11.75 man-days of PM work
  • 6.5 to 16.25 man-days of Dev work
  • 4.7 to 11.75 man-days of testing
  • 0.9 to 2.25 man-day of release and shipping related tasks

And now our interval is: [19.6 to 49] man-days.
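A minimal sketch of that stretch, assuming the late-side bound is exactly 2.5 times each early-side bound. The names `early_bounds` and `LATE_FACTOR` are made up for this illustration, and computed values can differ slightly from figures worked out by hand.

```python
# Early-side bounds in man-days, copied from the two-sided intervals in this post.
early_bounds = {
    "UX design": 2.8,
    "PM work": 4.7,
    "Dev work": 6.5,
    "Testing": 4.7,
    "Release and shipping": 0.9,
}

# "250% wrong on the right side", read here as 2.5 times the early-side bound.
LATE_FACTOR = 2.5

late_bounds = {task: low * LATE_FACTOR for task, low in early_bounds.items()}
total_low = sum(early_bounds.values())
total_high = sum(late_bounds.values())

for task, low in early_bounds.items():
    print(f"{task}: {low:.2f} to {late_bounds[task]:.2f} man-days")
print(f"Total: [{total_low:.2f} to {total_high:.2f}] man-days")
```

Swap `LATE_FACTOR` for the worst overrun in your own project history and see how wide your interval really is.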

The takeaway

In this example, we’ve established: 

a) Intervals are hard to calculate (we’ve explored 3 different ways that deliver 3 different values) and leave a lot to interpretation, which defeats the purpose of estimation, since every person can choose whatever method they want (and still be in line with the literature!).

b) The traditional “margin” of 20% is not applicable in real-life cases. We saw how even estimation proponents often show data where projects are orders of magnitude later than their original estimate (and use single-point estimates to make their point, an internal incoherence).

c) Estimating according to “best practice” is often a time-consuming and still error-prone practice (e.g. three-point estimation).

d) When we incorporate errors seen in real life, we are often talking about margins that are roughly 10x larger (250%) than what estimators say is an acceptable “margin” (usually around 20%).

Estimation is an internally incoherent practice that often yields information that is inadequate for decision making (would you book 19.6 man-days or about 49 man-days for this project?).

And on top of that, it increases the effort teams need to spend in order to deliver software (estimation effort has a cost that, depending on your techniques, might not be negligible).

Furthermore, estimation requires us to suspend our belief that Agile approaches are better for software development. For example, you need all the requirements up-front to even do the kind of simple estimation we used in this example. 

In short, estimation is a failure mode in software development.

Use #NoEstimates instead, and learn what Carmen did to save her project by using #NoEstimates approaches in a project that seemed doomed: The NoEstimates Book.
