Measuring the flow of real work

In digital product development, successful delivery depends on managing and optimizing work as a flow. Many metrics are valuable, but delivery managers should care about two first: end-to-end lead time and release frequency. To use these metrics meaningfully, it is essential to understand the types of work that product teams undertake and how each contributes to the overall delivery process. In this article, we present a model for work classification and show how it can be used to manage the flow of work for digital products or services. The underlying premise is that one or more teams are assigned to develop, operate, and deliver these products or services.

To facilitate continuous improvement in product delivery, our analytics tool for product and delivery managers has been built upon this work classification model. By measuring and analyzing the various aspects of work, product managers gain valuable insights to enhance the delivery process.

Throughout this article, we will explore the challenges faced in measuring end-to-end lead time, the importance of classifying different types of work, and the benefits of adopting a structured and scalable work classification model.

Knowing your flow of work

There is no shortage of valuable metrics to optimize workflows, but end-to-end lead time and release frequency are the first two to measure and optimize against. We must know more about the types of work that product teams complete to leverage these and more detailed metrics meaningfully.

Work-in-progress

Work in progress (WIP) is work in the course of being done or carried out. Managing WIP is a pillar of lean manufacturing and a cornerstone of Goldratt’s Theory of Constraints.

In accounting, WIP is the value of materials and labor that has gone into unfinished products. Peter Drucker is often quoted as saying, “You can’t manage what you can’t measure.” To manage the flow of work, we need to measure the parts that make up the end product.

For the development of digital products and services, inventory and labor effort are challenging to count and rarely visible. It is not a set of car doors being painted before assembly. But the same principle applies: the ideas, the designs, and the millions of bits stored on hard drives are partially done work that carries risk and ties up resources. The WIP needs to be visible and classified to provide meaningful measures for delivery management.

Challenges

A common challenge when measuring end-to-end lead time is that different types of work take different amounts of time to complete. Fixing errors is typically quicker than developing new features, so the more errors are fixed, the shorter the lead time will seem. If all work is measured with the same yardstick, it leads to poor predictability and the wrong expectations. In the worst case, it encourages decisions that jeopardize the product’s quality.

Take, for example, a real-world scenario of a logistics solution where Planview’s LeanKit work tracking was used, and all the team’s tasks were managed as “features.” The product managers could see the team’s capacity was fully utilized, and items were flowing quickly, but sensed there was little value being delivered.

Analysis of 427 “features” across four quarterly planning cycles

A quick analysis found that a third of the work planned and executed was knowledge transfer, analysis, or review tasks. As the average duration of these was short, the LeanKit tool showed an average end-to-end lead time that was shorter than it really was.
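The effect can be reproduced with a few made-up numbers. The sketch below is only an illustration, not the analysis from the LeanKit case: the work types, the lead times, and the function name average_lead_times are assumptions. A third of the items are short analysis or review tasks, and the overall average lands well below the average for feature work alone.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (work_type, lead_time_days) records; the numbers are
# illustrative only and do not come from the case described above.
items = [
    ("feature", 30), ("feature", 40), ("feature", 50), ("feature", 40),
    ("analysis", 4), ("review", 2),
]

def average_lead_times(records):
    """Return (overall average, {work_type: average}) of lead times in days."""
    by_type = defaultdict(list)
    for work_type, days in records:
        by_type[work_type].append(days)
    overall = mean(days for _, days in records)
    per_type = {work_type: mean(values) for work_type, values in by_type.items()}
    return overall, per_type

overall, per_type = average_lead_times(items)
print(f"all items: {overall:.1f} days")
for work_type, avg in per_type.items():
    print(f"{work_type}: {avg:.1f} days")
```

Reporting lead time per work type, rather than one blended figure, keeps the short tasks from masking how long feature work actually takes.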

Classifying demand

The solution is straightforward, especially for one team or a single product. Even a rudimentary work classification, as in the example above, creates visibility and lets you manage and trade off your limited capacity. But aggregating and benchmarking data across multiple teams using different tools and different definitions of work types requires a more structured and scalable model for work classification, starting with the difference between value and failure demands.

Value demand

As the name implies, value demand is work that is expected to provide value to the business, its users, and its customers. Value demand comes in two classes: work on new features expected to directly generate new business value and operational work needed to generate the business value.

Failure demand

Failure demand is work done to fix things without the expectation of generating additional value. We recommend classifying failure demand into two sub-categories to separate the unplanned work of fixing defects from planned re-work.

Value demand and failure demand classes

At this level, it is essential to know how much capacity is allocated to each demand class and how the ratio between the value and failure demand served changes over the product’s lifecycle. We call this Load Mix.

The load mix will naturally differ from solution to solution, but in our experience, for mature custom-built software solutions, the best products have used ~5% of capacity for failure demand. The worst we have seen was a global customs declaration solution where 40% of the capacity was dedicated to fixing reported defects alone.

Managing delivery with work classification

Work classification, load mix, and data on lead time, throughput, and flow efficiency can inform decisions about resourcing and prioritization. Here is an example from an eight-person product team managing a web portal solution for tax declarations.

The team was working on a product in a growth phase, and their load mix was assessed relative to this phase. With three months of data, significant patterns emerged from their load mix view. On average, the team spent less than 60% of their capacity on feature work and just under 20% of their resources on fixing ‘escaped’ defects. *

Load mix from Jira data classified and visualized in VFQ Analytics **

This alerted the team to the opportunity of increasing their value throughput if they could reduce the volume of escaped defects. The team analyzed the root cause of defects, which showed that a lack of test coverage and technical debt were the most common causes. The team decided to prioritize a set of actions, including a focus on tech debt, test automation, and improvement in the deployment processes.

The data shows that the efforts started paying off after about ten weeks. The team now typically spends 25% more of its available capacity on feature development and has been able to nearly halve the time spent on defects. The team continues to watch its load mix and use it to motivate continuous improvement.

Classifying work deliberately provides useful metrics that help your capacity discussion and keep the team and stakeholders focused on continuous improvement.

The work classification model

To provide metrics useful for managing delivery and operations over the product lifecycle, we recommend a tree-structured work classification model that classifies operations demand and re-work. This is to separate the predictable and easily planned work from work that needs to be done with little or no planning.

You can see the tree as a collection of eight backlogs, each with one to nine items. The team picks up a mix of work depending on their planned capacity and capability.

Full work classification model as a tree of backlogs

Value demand

Value demand can be divided into two classes: features, which are expected to generate new business value, and operations demand, which is the work needed to keep generating that value. Operations demand is in turn divided into maintenance and improvements.

Features: Feature work represents new or improved functionalities expected to increase value by providing a better customer or user experience (UX). A product manager or product owner usually prioritizes this demand.

Operations demand: Operations demand refers to the work needed to maintain and improve how the solution operates, that is, the work needed to keep producing the business benefits of the product or service.

Maintenance work comes in different forms: requests that come to the teams and therefore cannot be planned, actions that the teams plan, or experiments to learn about product improvements.

Requests are the operational work where demand comes from outside the product team, for example user support tickets or activities that serve other teams and their products. Actions represent operational work that comes from the team itself, in the form of recurring tasks or ad-hoc research and investigations. They can include tasks like preparing for security audits or documenting customer segments.

With planned and defined experiments, teams can learn about the product and its experience and get output and value in the form of information. A team can work, for example, to get feedback on a new UI prototype from a focus group.

The last class of value demand, and no less important, is the work that improves how the team operates. These are the improvements to the team’s skills, communications, toolsets, and ways of working. The team or delivery manager typically prioritizes these activities based on decisions from lessons learned, retrospectives, or other continuous improvement events.

Failure demand

On the failure demand side of the tree model, we find defects and rework. Note that if the term “failure” invokes strong negative reactions, it can be exchanged for “fixes.”

Defects: Defects only surface once the deliverable goes into production. The product manager typically prioritizes these based on value. These are sometimes named escaped defects or errors to separate them from defects (or “bugs”) that are discovered during the development process. “Bugs” found while developing features are not a separate type of work from the features. They remain part of value demand and are simply work not yet completed.

Rework: Rework comes in two forms, debt and fitness. Debt stems from intentional decisions to take shortcuts. Most of this work usually represents technical debt, but it can also include debt in user experience and other consciously made limitations of non-functional qualities. One example is adding a video transcript download option for WCAG compliance.

Fitness work differs from debt because it refers to work that was not anticipated. It arises when expectations for functional and non-functional product qualities are not met and the product needs to be adjusted. It is typically unplanned work. One example could be adding a ‘save-and-return-later’ option to the session timeout warning pop-up.
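The tree described in this section can be written down as a small data structure. The sketch below is one possible encoding, inferred from the text rather than taken from any tool: the nesting follows the classes named above, the eight leaves are read as the eight backlogs, and the issue-type mapping and the names backlog_paths and classify are hypothetical.

```python
# The classification tree as nested dicts; leaves (empty dicts) are backlogs.
# The structure is inferred from the article text.
WORK_CLASSIFICATION = {
    "value demand": {
        "features": {},
        "operations demand": {
            "maintenance": {"requests": {}, "actions": {}, "experiments": {}},
            "improvements": {},
        },
    },
    "failure demand": {
        "defects": {},
        "rework": {"debt": {}, "fitness": {}},
    },
}

def backlog_paths(tree, prefix=()):
    """Yield the path from the root to every leaf backlog."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from backlog_paths(children, path)
        else:
            yield path

# Look up a backlog's full path by its leaf name.
PATHS = {path[-1]: path for path in backlog_paths(WORK_CLASSIFICATION)}

# Hypothetical mapping from a tracker's issue types to backlogs;
# a real team would define this for its own tool and conventions.
ISSUE_TYPE_TO_BACKLOG = {
    "Story": "features",
    "Incident": "defects",
    "Support": "requests",
    "Tech Debt": "debt",
}

def classify(issue_type):
    """Return the classification path for an issue type, or None if unmapped."""
    return PATHS.get(ISSUE_TYPE_TO_BACKLOG.get(issue_type))

print(classify("Support"))
print(classify("Incident"))
```

Keeping the full path for each backlog means the same ticket can be reported at any level of the tree, from value versus failure demand down to the individual backlog.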

Work classification plays a vital role in optimizing delivery management for digital products. By understanding and categorizing different types of work, such as value demand and failure demand, teams can make informed decisions and drive continuous improvement. Our work classification model empowers delivery managers to streamline processes, reduce defects, and deliver greater value to customers. By adopting these principles, managers can enhance capacity discussions, focus on improvement, and achieve efficient product delivery in the ever-evolving digital landscape.

If you want to know more about metrics, or our VFQ Analytics solution for capturing them, please get in touch with our team.

If you enjoyed this article and you are curious about how we apply the principles of Value, Flow, and Quality, we recommend reading about value distribution and how to use feedback loops to create high-quality learning content.

Footnotes:

* The Load Mix report shows the proportion of work in process by demand class. The ratio is calculated from the ticket-days of work that is in process, meaning started and not completed. “Ticket-days” counts the days a ticket was in process during the calculated period, so it takes tickets of different ‘sizes’ into account. For example, if there were 3 Ops tickets each active for one day and 1 Feature ticket active for 7 days, the Ops Load ratio would be 30%.
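The ticket-days arithmetic in this footnote can be sketched in a few lines. This is an illustration under stated assumptions, not the VFQ Analytics implementation: the function names, the dates, and the choice to count both the start and the end day are assumptions, and an unfinished ticket is treated as active until the end of the period.

```python
from collections import defaultdict
from datetime import date

def ticket_days(started, finished, period_start, period_end):
    """Days a ticket was in process inside the reporting period.

    Assumption: start and end days both count, and an unfinished
    ticket (finished is None) is active until period_end.
    """
    first = max(started, period_start)
    last = min(finished or period_end, period_end)
    return max((last - first).days + 1, 0)

def load_mix(tickets, period_start, period_end):
    """Return {demand_class: share of total ticket-days} for the period."""
    totals = defaultdict(int)
    for demand_class, started, finished in tickets:
        totals[demand_class] += ticket_days(started, finished, period_start, period_end)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cls: days / grand_total for cls, days in totals.items()}

# The footnote's example: three Ops tickets active for one day each and
# one Feature ticket active for seven days (the dates are made up).
week_start, week_end = date(2024, 1, 1), date(2024, 1, 7)
tickets = [
    ("ops", date(2024, 1, 1), date(2024, 1, 1)),
    ("ops", date(2024, 1, 3), date(2024, 1, 3)),
    ("ops", date(2024, 1, 5), date(2024, 1, 5)),
    ("feature", date(2024, 1, 1), None),
]
mix = load_mix(tickets, week_start, week_end)
print({cls: f"{share:.0%}" for cls, share in mix.items()})
```

With three Ops ticket-days out of ten in total, the Ops share comes to 30%, matching the figure in the footnote.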

** VFQ Analytics is a non-intrusive solution that reads data from Jira and/or MS ADO and maps the existing work types according to the classification model described. It provides product teams and delivery managers with systemic insights into flow and performance metrics. In addition to load mix, it provides metrics on throughput, lead times, flow efficiency, and more.