Story points tracking reveals more than hours

My team recently got a couple of very interesting burn down charts from our previous sprint, and we had a very good discussion about how they came about. We feel this case could be convincing evidence that estimating in story points works better than estimating in actual hours.
Before we look at those charts (real charts grabbed directly from our task tracking systems), let me briefly introduce our current estimation and tracking process.
  • We estimate the story size using planning poker; we give each story a number of story points by comparing its size with our base story.
  • Then we break those user stories down into detailed tasks and estimate how many hours we need to complete each task because our client requires us to estimate and track our velocity and capacity in hours.

We use the client’s Jira site (Atlassian Jira + GreenHopper) to track task completion using an hour based burn down chart. Our policy is that each time an individual logs actual effort in Jira, he/she must re-estimate how many hours remain before the task can be called complete.
However, at the same time we still use our own whiteboard tracking system, on which we draw a story point burn down chart by hand to publish the day-to-day user story completion status.
You may have already realized what is happening in our project. We use two different ways to estimate and track:
  • Hour based estimation for the detailed tasks, and an hour based burn down chart tracking the task completion.
  • Story point based estimation for the user stories, and a whiteboard story point burn down chart tracking the story completion.

This gave us a great opportunity to compare which way works better for us using a single set of project data. Now let’s take a look at the really interesting part: the two different burn down charts for our last sprint.

Hour based burn down chart:

Story point based burn down chart:

If you just look at the hours burn down, you would feel that everything went perfectly: the burn down trend was as good as any Scrum example, and only 9 of the planned 240 hours were left on the last day. But if you look at the story point burn down, things are totally different. By our estimate the planned stories totaled 50 points, and by the last day we had delivered just 16 points. That was definitely a significant failure.
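
The gap is easier to see as percentages. Here is a small sketch that uses only the four numbers quoted above; the variable names are just for illustration and do not come from Jira.

```python
# Sprint totals as reported by the two charts.
planned_hours = 240
remaining_hours = 9
planned_points = 50
delivered_points = 16

# Share of the plan each chart reports as finished.
hours_burned_pct = (planned_hours - remaining_hours) / planned_hours * 100
points_delivered_pct = delivered_points / planned_points * 100

print(f"Hours chart:       {hours_burned_pct:.0f}% of planned hours burned")
print(f"Story point chart: {points_delivered_pct:.0f}% of planned points delivered")
```

By the hours chart roughly 96% of the plan was burned down, while by the story point chart only 32% of the planned stories were actually delivered.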

My team analyzed how this situation came about. When generating the hours burn down chart, Jira doesn’t care whether or not the user stories are really completed; it just adds up all the hours we logged against the tasks and uses that aggregated number as the “completed” work. In effect, Jira is calculating with the formula below:

How much effort has been put = how much work has been done

But that formula is NOT telling us the truth. There is no reliable association between effort and delivery:
  • A task that one team/individual can finish in 10 hours may cost another team/individual 100 hours.
  • It’s possible that the team spends 100 hours while delivering nothing.
  • A user story is either “Done” or “Not Done”. We cannot use a ratio (as we easily do when estimating how much effort remains) to say it’s “Partially Done”: we cannot burn down 33.3% of a story’s points if the story is not done yet.
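
To make the difference concrete, here is a minimal sketch of the two ways of counting, assuming a very simplified data model. The `Task` and `Story` classes and the sample sprint are hypothetical; they are neither our real sprint data nor Jira’s actual implementation. The point is only that hours burn down task by task, while story points burn down only when a whole story is done.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    estimated_hours: float
    remaining_hours: float  # re-estimated whenever effort is logged


@dataclass
class Story:
    points: int
    tasks: List[Task]
    done: bool  # binary: "Done" or "Not Done", never partial


def hours_remaining(stories: List[Story]) -> float:
    """Hour based view: sum the remaining estimates of all tasks."""
    return sum(t.remaining_hours for s in stories for t in s.tasks)


def points_remaining(stories: List[Story]) -> int:
    """Story point view: a story burns down only when it is done."""
    return sum(s.points for s in stories if not s.done)


# Hypothetical sprint: almost all hours are burned, but two of the
# three stories each still have a little work left and are not done.
sprint = [
    Story(points=5, tasks=[Task(10, 0), Task(8, 1)], done=False),
    Story(points=3, tasks=[Task(6, 0)], done=True),
    Story(points=8, tasks=[Task(16, 2), Task(4, 0)], done=False),
]

planned_hours = sum(t.estimated_hours for s in sprint for t in s.tasks)
planned_points = sum(s.points for s in sprint)

print(f"Hours left:  {hours_remaining(sprint)} of {planned_hours}")
print(f"Points left: {points_remaining(sprint)} of {planned_points}")
```

In this made-up sprint only 3 of 44 hours remain, yet 13 of 16 points are still open, which is the same pattern our two charts showed.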
The conclusions my team reached are:
  • Hour estimates are never accurate; different people working on the same task take different numbers of hours.
  • Hours spent do not equal task completion; “partially done” is a dangerous status that hides problems.
  • Story point estimation is simpler and more Agile, although it’s not easy.
If you Google it, you will find a lot of discussion around how to do story point estimation. We had this question ourselves, and some of our team members were still hesitant to use story points because they are not easy to understand and use. But after this sprint we realized that we should give up traditional hours estimation, because it gives us the wrong impression and leads us in the wrong direction. In the retrospective meeting my team decided to practice more and incrementally make our story point estimation more accurate. I’ll be glad to share our experience if anything interesting happens again.