Understanding Maintenance and Reliability Key Performance Indicators
Using KPIs to Analyze Failures in Your Lubrication Program
Key Performance Indicators (KPIs), when properly identified and aligned, especially in the lubrication industry, can save your plant, your job, and your career. If management truly understood the power of KPIs, things would change quickly. In fact, managing without KPIs gives one the feeling of being lost with no hope.
Think of driving a car with the windshield painted black. You can’t see where you are going, but you do get a glimpse of where you’ve been through the rearview mirror. You don’t know whether you were successful until it’s too late or disaster strikes.
With the car example in mind, your results are:
- Car goes in the ditch – high cost (or worse)
- You never reach your destination – business goals not met
This is a serious problem, and it costs companies around the world billions of dollars as a result of what I consider a lack of management control. Peter Drucker, the great industrial revolutionary, stated: “You cannot manage something you cannot control, and you cannot control something you cannot measure.”
Defining and Understanding KPIs
Let’s first get down to basics and define KPIs. Within maintenance, we must first define the performance we want to measure.
Is it the performance of the equipment? Is it the performance of the spare parts warehouse? Is it the performance of the maintenance function?
This may seem like a simple question, but I often see companies mix their KPIs because they have not defined the specific area of the business whose performance they are trying to measure. Let’s assume we want to measure the performance of the maintenance function. There are really two kinds of KPIs to choose from in measuring any particular function of a business.
These two kinds of KPIs are called Leading Indicators and Lagging Indicators (also referred to in this document as Leading KPIs and Lagging KPIs).
“Leading KPIs lead to results”
Example: Schedule Compliance
“Lagging KPIs are the results”
Example: Maintenance Cost (affected if scheduling is not working)
Leading Indicators are those that we use to manage a part of the business, while Lagging Indicators are those that measure how well we have managed. With Leading Indicators therefore, it is possible to directly and immediately respond when a poor result is found. With Lagging Indicators, we get value from knowing how well we performed, but we have little opportunity to immediately affect underperformance.
Instead, when we see an unacceptable Lagging Indicator, we must typically drill down to the Leading Indicators to uncover the cause of the underperformance, and from there we can implement appropriate changes. Leading Indicators for the maintenance function are those that measure how well we are conducting each of the steps in the maintenance process.
For example, a Leading Indicator for the work planning element of the maintenance process could be “the percentage of planned jobs that were executed using the specified amount of labor”. If the planner is estimating labor correctly, we will see a high percentage of jobs completed using the planned amount of labor hours. If the maintenance manager finds that the value of the KPI is lower than expected, he/she can speak with the planner to discuss how best to improve the results immediately – possibly for the remainder of that day.
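As a rough illustration of the planning KPI just described, the sketch below computes the percentage of planned jobs completed with the planned amount of labor. The `Job` record, its field names, the sample figures, and the 10% tolerance (measured here against the planned hours) are assumptions made for the example; they do not come from any particular maintenance system.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One completed planned job (hypothetical record layout)."""
    planned_hours: float
    actual_hours: float


def labor_estimate_accuracy(jobs, tolerance=0.10):
    """Return the percentage of jobs whose actual labor hours fall
    within `tolerance` (a fraction of the planned hours) of the plan."""
    if not jobs:
        return 0.0
    on_plan = sum(
        1 for j in jobs
        if abs(j.actual_hours - j.planned_hours) <= tolerance * j.planned_hours
    )
    return 100.0 * on_plan / len(jobs)


# Invented sample data: three of the four jobs land within 10% of the plan.
jobs = [Job(4.0, 4.0), Job(8.0, 10.0), Job(2.0, 2.1), Job(6.0, 6.5)]
print(f"{labor_estimate_accuracy(jobs):.0f}% of jobs used the planned labor")
```

A low value would prompt the conversation with the planner; the threshold that counts as "low" is a local decision.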
With all KPIs, by definition, we are measuring past performance, so I am not suggesting that Leading Indicators can be tweaked to improve upon past performance. But you can see in this case, that if we are managing using Leading Indicators, we can respond immediately when needed.
So, Leading Indicators measure how well we’re performing our jobs while Lagging Indicators measure results. We manage using Leading Indicators, and we react to results using Lagging Indicators.
In the example above, a Lagging Indicator would measure the results of how well we managed the maintenance function. In a situation where the maintenance function is well managed, we would expect an appropriate balance between the cost of maintenance and the plant availability. A Lagging Indicator could therefore be the actual maintenance cost for a month, as a percentage of the budgeted maintenance cost for that month.
If the actual maintenance cost for last month is found to be 110% of budget, there is really very little we can do to directly influence the performance of this KPI today. Instead, we would look at all of the Leading Indicators, probably including those that measure the performance of our maintenance process, to determine whether those values give us a signal for managing the problem.
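The drill-down described here can be sketched in a few lines. The example first computes the Lagging Indicator (actual cost as a percentage of budget), then scans a set of Leading Indicators for values that miss their targets. All figures, the `LEADING` dictionary, and the function names are invented for illustration; the target values are taken from the KPI table in this article.

```python
def cost_vs_budget_pct(actual_cost, budgeted_cost):
    """Lagging Indicator: actual maintenance cost as a percentage of budget."""
    if budgeted_cost <= 0:
        raise ValueError("budgeted_cost must be positive")
    return 100.0 * actual_cost / budgeted_cost


# Hypothetical Leading Indicators for the same month.
# name: (current value in %, target in %, True if higher is better)
LEADING = {
    "Schedule compliance": (82.0, 90.0, True),
    "Work orders requiring rework": (2.0, 3.0, False),
    "Work orders closed within 3 days": (96.0, 95.0, True),
}


def underperforming(indicators):
    """Return the names of Leading Indicators that miss their target."""
    flagged = []
    for name, (value, target, higher_is_better) in indicators.items():
        on_target = value >= target if higher_is_better else value <= target
        if not on_target:
            flagged.append(name)
    return flagged


pct = cost_vs_budget_pct(actual_cost=550_000, budgeted_cost=500_000)
if pct > 100.0:
    # The lagging result cannot be changed now; look for the process cause.
    print(f"Cost at {pct:.0f}% of budget; investigate: {underperforming(LEADING)}")
```

With these sample numbers the only flagged indicator is schedule compliance, which is where the manager would start asking questions.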
Unfortunately, in our quest for excellence, we often are attracted to outside consultants that offer “Benchmarking” services, claiming to provide all of the KPIs we need to effectively run our business. Be careful, when considering these services, that you are not signing up for a laundry list of Lagging Indicators, since they won’t help you with managing; they’ll just quantify the problem you already acknowledged when you sought outside help.
Figure 1 shows how Leading Indicators for the maintenance process can provide management capability, while the Lagging Indicators show us how well we have managed the maintenance function. Leading Indicators such as “% of rework” and “% of PMs executed on time” will affect the overall performance of the maintenance process, which will result in a certain level of maintenance function performance. The Lagging Indicators affected by these Leading Indicators in this case are “Maintenance Cost as a % of budget” and “Plant Availability”.
At least one of these Lagging Indicators will suffer if there is sufficient underperformance in the Leading Indicators. In the following example you see the alignment of the maintenance process as KPIs transition from leading to lagging.
# | Type of KPI | Measuring | Key Performance Indicator | World Class Target Level
--- | --- | --- | --- | ---
1 | Result/Lagging | Cost | Maintenance Cost | Context Specific
2 | Result/Lagging | Cost | Maintenance Cost / Replacement Asset Value of Plant and Equipment | 2 - 3%
3 | Result/Lagging | Cost | Maintenance Cost / Manufacturing Cost | < 10 - 15%
4 | Result/Lagging | Cost | Maintenance Cost / Unit Output | Context Specific
5 | Result/Lagging | Cost | Maintenance Cost / Total Sales | 6 - 8%
6 | Result/Lagging | Failures | Mean Time Between Failure (MTBF) | Context Specific
7 | Result/Lagging | Failures | Failure Frequency | Context Specific
8 | Result/Lagging | Downtime | Unscheduled Maintenance Related Downtime (hours) | Context Specific
9 | Result/Lagging | Downtime | Scheduled Maintenance Related Downtime (hours) | Context Specific
10 | Result/Lagging | Downtime | Scheduled Maintenance Shutdown Overrun (hours) | Context Specific
11 | Process/Leading | Maintenance Strategy | Percentage of work requests remaining in "Request" status for less | 80% of all work requests should be processed in 5 days or less.
12 | Process/Leading; Planning Element/Lagging | Planning | Percentage of work orders with man-hour estimates within 10% of actual over the specified time period. | Accuracy of greater than 90%
13 | Process/Leading | Planning | Percentage of work orders over the specified time period, with all planning fields completed. | 95%+
14 | Process/Leading | Planning | Percentage of work orders assigned "Rework" status (due to a need for additional planning) over the last month. | This level should not exceed 2% to 3%
15 | Process/Leading | Planning | Percentage of work orders in "New" or "Planning" status less than 5 days, over the last month. | 80% of all work orders should be possible to process in 5 days or less. Some work orders will require more time to plan but attention must be paid to 'late finish date'.
16 | Process/Leading; Scheduling Element/Lagging | Scheduling | Percentage of work orders over the specified time period, having a scheduled date earlier or equal to the late finish or required by date. | 95%+ should be expected in order to ensure the majority of the work orders are completed before their 'late finish date'.
17 | Process/Leading | Scheduling | Percentage of scheduled available man-hours to total available man-hours over the specified time period. | Target 80% of man-hours applied to scheduled work.
18 | Process/Leading | Scheduling | Percentage of work orders assigned "Delay" status due to unavailability of manpower, equipment, space or services over the specified time period. | This number should not exceed 3% to 5%
19 | Process/Leading | Execution | Percentage of work orders completed during the schedule period before the late finish or required by date. | Schedule compliance of 90%+ should be achieved.
20 | Process/Leading; Execution Element/Lagging | Execution | Percentage of maintenance work orders requiring rework. | Rework should be less than 3%.
21 | Process/Leading | Follow-up | Percentage of work orders closed within 3 days, over the specified time period. | Should achieve 95%+. Expectation is that work orders are reviewed and closed promptly.
Figure 2 does not show the specific KPIs that would be used to manage the maintenance process. Instead, I have listed some of the ones I prefer to use in the accompanying table, along with the world class level, where applicable.
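To make the cost ratios in the table concrete, here is a minimal sketch that computes two of them and compares each with the world class level listed. The dollar amounts are invented, the helper names are hypothetical, and treating the "< 10 - 15%" target as an upper limit of 15% is an assumption of this example.

```python
def ratio_pct(numerator, denominator):
    """Express numerator as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def within_range(value, low, high):
    """True when value lies inside the inclusive band [low, high]."""
    return low <= value <= high


# Invented annual figures for a single plant.
maintenance_cost = 3_000_000
replacement_asset_value = 120_000_000
manufacturing_cost = 25_000_000

cost_to_rav = ratio_pct(maintenance_cost, replacement_asset_value)
cost_to_mfg = ratio_pct(maintenance_cost, manufacturing_cost)

# Row 2 of the table: 2 - 3% of replacement asset value.
print(f"Maintenance cost / RAV: {cost_to_rav:.1f}% "
      f"(in 2-3% band: {within_range(cost_to_rav, 2.0, 3.0)})")
# Row 3 of the table: below 10 - 15% of manufacturing cost.
print(f"Maintenance cost / manufacturing cost: {cost_to_mfg:.1f}% "
      f"(below 15%: {cost_to_mfg < 15.0})")
```

Both are Lagging Indicators, so a value outside the band tells you to look at the process KPIs in rows 11 to 21 rather than at the ratio itself.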
In the same way that we use KPIs in the maintenance function example, we can use them in other areas of the business. This approach is particularly interesting where multiple functional areas each play a role in a given goal, such as plant reliability. Plant reliability is a shared responsibility of the maintenance, production and engineering functions. Leading Indicators for each departmental process would feed the Lagging Indicators for the department function, which would then summarize to the plant level as shown in Figure 3.
The Problem
Most of the problem is that management has not learned to manage operations through KPIs (both leading and lagging). In my career of more than 30 years, I have seen many plants shut their doors forever. They blamed the closing on many reasons, but the one thing they all had in common was that NONE had properly managed with KPIs. The metrics or indicators they managed with were ones like:
- Cost
- Asset Availability
- Equipment downtime
- OEE
All of these measurements or indicators, while useful for measuring performance, cannot be used to manage the maintenance and reliability process. They are simply the results of all the actions that have taken place in the maintenance and reliability process. Again, you cannot manage results.
You can only manage the processes leading to the results. If your company uses any of the above metrics to manage its operation without Leading Indicators, it is in a reactive mode. Companies must ask themselves some very basic questions:
- Does your company differentiate between those KPIs that can be used to manage (Leading Indicators) and those that measure results (Lagging Indicators)?
- Does your company measure the performance of the maintenance process, where it can readily intervene when needed?
If Leading Indicators show underperformance, then the underperformance will affect the Lagging Indicator, which could be reliability, cost, capacity, etc. People must understand the relationship between Leading and Lagging Indicators and their effects on the maintenance and reliability function.
Most maintenance managers are told to control cost, improve reliability and increase asset availability with no idea where the problem may be in their maintenance process. Unfortunately, many lose their jobs as a result. The fact is you cannot control cost, reliability, or availability without managing the maintenance process.
John Day (retired), Alumax: Since 1999, Alumax has been a leader in all alloys of aluminum. Their Mt. Holly plant was rated as one of the best maintained plants in the world for over 20 years. John Day, the company’s former engineering and maintenance manager, comments on how he managed using KPIs:
“Hundreds of companies visited our plant, paying $1000 each to see our maintenance program up close, but only a few learned from their visit.” John feels they missed out on how Alumax managed with aligned KPIs.
John was also invited to visit over 500 plants in the US, Canada, and Australia and says, “The one thing over 90% of them had in common was they could not effectively manage their plants because they had no Leading KPIs in place. Many of these companies were crying for help but did not know which way to go.” Most managed only with Lagging Indicators and made decisions based on indicators such as cost and reliability.
John learned early in his career that without Leading KPIs you cannot manage the maintenance and reliability of equipment. “For over 20 years, I could see problems brewing long before they would become a serious issue. Alumax had a system in place where we could measure everything in our maintenance process - from Leading Indicators such as the identification of potential failures through to the lagging financial results of all actions performed by maintenance.”
This separation of Leading and Lagging KPIs allowed him to make management decisions when Leading KPI underperformance was identified before cost and reliability (the Lagging Indicators) were impacted.
According to John, there is a simple reason why most companies don’t succeed: They don’t know what information needs to be collected. In 1979, John worked with Alumax’s accounting department to establish over 60 financial accounts just for maintenance.
These financial accounts were linked to leading KPIs in the maintenance process which provided information needed to manage proactively. In turn, these KPIs were linked to equipment performance – also Lagging Indicators.
Each of these Lagging KPIs had established benchmarks which measured if the maintenance process was in or out of control. This approach may sound complex, but once you have it in place, management can truly manage the reliability of plant equipment.
John shared with me 13 years of KPI data that was so impressive it would bring tears to any maintenance and reliability professional’s eyes. Describing the data, John stated, “Everyone from a maintenance person to the plant manager had KPIs they looked at on a daily or weekly basis in order to make basic and immediate management decisions.
Each level in our organization utilized a small number of Lagging KPIs, along with a bigger number of Leading KPIs that were important to managing their part of the business.” In reviewing Alumax’s KPIs over a 13-year period, I found that their maintenance cost (a Lagging KPI) did not increase but remained constant. Maintenance cost as a percentage of replacement asset value held at around 3% for all of those years.
Equally impressive was that the controllable plant operating cost was very constant over this same time period. This Lagging Indicator data pointed to the obvious fact that the reliability of equipment directly correlates to operating cost.
By managing the maintenance and reliability process element by element using Leading Indicators, Alumax was able to achieve these results. John’s experience validates that managing with both Leading and Lagging KPIs is the only way to effectively manage an operation in order to achieve the results expected to succeed in a business.
By the way, over 26 years ago I was blessed to work for John Day at Alumax and enjoyed every day I worked for him.
The Solution
How much money do corporations lose every year due to plants not managing with good Leading and Lagging KPIs?
The costs may be too high to calculate, so we must stop these massive losses now by putting a plan in place to develop and align KPIs. This section may save your plant or your job. But I warn you, don’t look for shortcuts in the process I am about to explain because there are none.
Step 1: Educate management, from executive level to floor level supervisors, on KPIs and how Leading and Lagging Indicators should be aligned to meet the business goals. You then must provide a similar education to the maintainers and operators.
Step 2: Define and assess your current maintenance and reliability process against a future state. The future state is defined by known maintenance and reliability “best practices”. As part of this assessment, you must develop a business case with financial opportunities and cost of change. This step continues the education process and creates an awareness of the opportunity at hand.
Step 3: Develop a plan based on the assessment to include financial opportunities and cost on a time line. This plan must include:
- The definition of the elements of your maintenance and reliability process (work identification, planning, scheduling, work execution, etc.)
- Workflow Process for each element in your maintenance and reliability process
- The definition of roles and responsibilities for each task
- The definition of Leading and Lagging KPIs in each element of your maintenance and reliability process
- Targets and world class benchmarks that are established against the defined KPIs
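One way to keep such a plan honest is to write it down as data and check it for gaps. The sketch below is only an illustration: the `KPI` and `ProcessElement` records, the owner roles, and the check itself are hypothetical, while the KPI names and targets are borrowed from the table earlier in this article. The work execution element is deliberately left without a Leading KPI so the check has something to report.

```python
from dataclasses import dataclass, field


@dataclass
class KPI:
    name: str
    kind: str    # "leading" or "lagging"
    target: str  # target or benchmark, kept as free text


@dataclass
class ProcessElement:
    name: str
    owner: str   # role responsible for the element (hypothetical)
    kpis: list = field(default_factory=list)


def elements_missing_leading_kpi(elements):
    """Return the names of elements that have no Leading KPI defined."""
    return [
        e.name for e in elements
        if not any(k.kind == "leading" for k in e.kpis)
    ]


plan = [
    ProcessElement("Planning", "Planner", [
        KPI("Work orders with all planning fields completed", "leading", "95%+"),
    ]),
    ProcessElement("Scheduling", "Scheduler", [
        KPI("Scheduled man-hours / total available man-hours", "leading", "80%"),
    ]),
    ProcessElement("Work execution", "Maintenance supervisor", [
        KPI("Work orders requiring rework", "lagging", "< 3%"),
    ]),
]

print("Elements with no Leading KPI:", elements_missing_leading_kpi(plan))
```

An element that shows up in this list can only be judged after the fact, which is the reactive position the plan is meant to avoid.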
Step 4: Implement the process and begin managing based on Leading Indicators. I would begin measuring only a few KPIs at first (maximum of 5). Then allow people at the lowest levels to make the decisions required to ensure your maintenance and reliability process is proactive and effective. The use of Leading KPIs is a great awareness tool and will bring everyone into the decision-making process.
This process is not easy; however, it is not magic either. Developing KPIs is a time-consuming process, but one that must be done in order for a company to survive.