How many times have you, or someone you know, challenged the measurement process or method used to collect the data because the numbers just “don’t make sense” or “can’t be right”?
Integrity in the data collection process is imperative if we are to avoid phantom improvements that stem from nothing more than a change in measurement method. Switching from a manual recording system to a completely automated system is a simple example of a data collection method change that will almost certainly generate “different” results.
Every measurement system is subject to error, including those used to measure and monitor OEE. We briefly discussed the concept of variance with respect to actual process throughput and, as you may expect from this post, variance also applies to the measurement system.
Process and measurement stability are intertwined. A reliable data collection / measurement system is required to establish an effective baseline for your OEE improvement efforts. We have observed processes that appeared very unstable, with throughput rates that swung to extremes from one shift to the next. We learned that, in many cases, the variance was not in the process at all but in the measurement system itself.
We decided to comment briefly on this phenomenon of measurement error for several reasons:
- The reporting systems will naturally improve as more attention is given to the data they generate.
- Manual data collection and reporting systems are prone to errors in both recording and data input.
- Automated data collection systems substantially reduce the risk of errors and improve data accuracy.
- Changes in OEE trends may be attributable to changes in data collection technology rather than real process changes.
Consider the following:
- A person records the time of the down time and reset / start up events by reading a clock on the wall.
- A person records the time of the down time event using a wrist watch and then records the reset / start up time using the clock on the wall.
- A person uses a stop watch to track the duration of a down time event.
- Down time and up time event data are collected and retrieved from a fully automated system that instantly records events in real time.
Clearly, each of the above data collection methods will present varying degrees of “error” that will influence the accuracy of the resulting OEE. The potential measurement error should be a consideration when attempting to quantify improvement efforts.
Measurement and Error Resolution
The technology used will certainly drive the degree of error you may expect to see. A clock on the wall may yield an error of +/- 1 minute per event versus an automated system that may yield an error of +/- 0.01 seconds.
The resolution of the measurement system becomes even more relevant when we consider the duration of the “event”. Consider the effect of measurement resolution and potential error for a down time event having a duration of 5 minutes versus 60 minutes.
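To make the comparison concrete, here is a minimal sketch that expresses the worst-case error as a fraction of the event duration. It uses the +/- 1 minute and +/- 0.01 second figures quoted above; the function name `relative_error` is our own and is only for illustration.

```python
def relative_error(duration_min: float, resolution_min: float) -> float:
    """Worst-case measurement error as a fraction of the event duration."""
    return resolution_min / duration_min


WALL_CLOCK_ERROR_MIN = 1.0        # +/- 1 minute per event (figure from the text)
AUTOMATED_ERROR_MIN = 0.01 / 60   # +/- 0.01 seconds, expressed in minutes

for duration in (5, 60):
    wall = relative_error(duration, WALL_CLOCK_ERROR_MIN)
    auto = relative_error(duration, AUTOMATED_ERROR_MIN)
    print(f"{duration:>2} min event: wall clock +/- {wall:.1%}, "
          f"automated +/- {auto:.4%}")
```

With the wall clock, a 5 minute event carries a potential error of 20%, while a 60 minute event carries less than 2%. The same clock is a far poorer instrument for short events than for long ones.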
A classic fallacy is “inferred accuracy” as demonstrated by the stop watch measurement method. Times may be recorded to 1/100th of a second, suggesting a high degree of precision in the measurement. Meanwhile, it may take the operator 10 seconds to locate the stop watch, 15 seconds to reset a machine fault, 20 seconds to document the event on a “report”, and another 10 seconds to return the stop watch to its proper location.
What are we missing? How significant is the event and was it worth even recording? What if one operator records the “duration” after the machine is reset while another operator records the “duration” after documenting and returning the watch to its proper location?
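A rough sketch of the two operators, using the step times from the stop watch example. We are assuming here that the watch is started only once it has been located, so the 10 second search never appears in either recorded “duration”; that assumption is ours and your process may differ.

```python
# Step durations in seconds, taken from the stop watch example.
LOCATE_WATCH = 10
RESET_FAULT = 15
DOCUMENT_EVENT = 20
RETURN_WATCH = 10

# Assumption: the watch starts only after it has been located.
operator_a = RESET_FAULT                                  # stops the watch at reset
operator_b = RESET_FAULT + DOCUMENT_EVENT + RETURN_WATCH  # stops after paperwork and return

# Total time the operator spends on the event, whether recorded or not.
total_attention = LOCATE_WATCH + RESET_FAULT + DOCUMENT_EVENT + RETURN_WATCH

print(f"Operator A records {operator_a} s")
print(f"Operator B records {operator_b} s")
print(f"Operator time consumed by the event: {total_attention} s")
```

Both operators can report to 1/100th of a second, yet their numbers for the same event differ by 30 seconds, and neither matches the 55 seconds the event actually consumed.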
The above example demands that we also consider the event type: “high frequency-short duration” versus “low frequency-long duration” events. Both must be considered when attempting to understand the results.
The EVENT is the Opportunity
As mentioned in previous posts, we need to understand what we are measuring and why. The “event” and methods to avoid recurrence must be the focus of the improvement effort. The cumulative duration of an event will help to focus efforts and prioritize the opportunities for improvement.
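As a minimal sketch of prioritizing by cumulative duration, the snippet below totals a small event log by event type and ranks the types. The event names and durations are hypothetical and purely illustrative; they are not data from any real process.

```python
from collections import defaultdict

# Hypothetical event log: (event type, duration in minutes). Illustrative only.
events = [
    ("jam", 2), ("jam", 3), ("jam", 2), ("jam", 4), ("jam", 3),
    ("jam", 2), ("jam", 3), ("jam", 4), ("jam", 2), ("jam", 3),
    ("tool change", 25),
    ("sensor fault", 1), ("sensor fault", 1), ("sensor fault", 2),
]


def rank_by_cumulative_duration(log):
    """Return (event type, stats) pairs, largest cumulative duration first."""
    totals = defaultdict(lambda: {"count": 0, "total_min": 0})
    for event_type, minutes in log:
        totals[event_type]["count"] += 1
        totals[event_type]["total_min"] += minutes
    return sorted(totals.items(), key=lambda kv: kv[1]["total_min"], reverse=True)


ranked = rank_by_cumulative_duration(events)
for event_type, stats in ranked:
    print(f"{event_type:<12} count={stats['count']:>2} total={stats['total_min']:>3} min")
```

In this made-up log the “high frequency-short duration” jams add up to more lost time than the single “low frequency-long duration” tool change, which is easy to miss when events are judged one at a time.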
Additional metrics to help “understand” various process events include Mean Response Time, Mean Time Between Failures (MTBF), and Mean Time To Repair (MTTR). Even 911 calls are monitored from the time the call is received. The response time is as critical as the actual event, if not more so, especially when the condition is life-threatening or otherwise self-destructive (fire, meltdown).
An interesting metric is the ratio between Response Time and Mean Time To Repair. The response time is measured from the time the event occurs to the time “help” arrives. Our experience suggests that significant improvements can be made simply by reducing the response time.
We recommend training and providing employees with the skills needed to be able to respond to “events” in real time. Waiting 15 minutes for a technician to arrive to reset a machine fault that required only 10 seconds to resolve is clearly an opportunity.
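The 15 minute wait and 10 second reset can be put into numbers with a small sketch. We are assuming that response time and repair time are separate, non-overlapping intervals; some definitions of MTTR fold the wait into the repair time, so treat the function names and the split as illustrative only.

```python
def response_to_repair_ratio(response_s: float, repair_s: float) -> float:
    """Ratio of time spent waiting for help to time spent on the repair itself."""
    return response_s / repair_s


def response_share_of_downtime(response_s: float, repair_s: float) -> float:
    """Fraction of the total down time that is waiting rather than repair.

    Assumes response and repair do not overlap.
    """
    return response_s / (response_s + repair_s)


response_s = 15 * 60   # 15 minutes waiting for a technician (from the text)
repair_s = 10          # 10 seconds to reset the fault (from the text)

ratio = response_to_repair_ratio(response_s, repair_s)
share = response_share_of_downtime(response_s, repair_s)

print(f"Response : repair ratio = {ratio:.0f} : 1")
print(f"Waiting accounts for {share:.1%} of the down time")
```

A ratio of 90 to 1 says that nearly all of the lost time is spent waiting, not fixing. Running the same calculation over mean response time and MTTR for each event type gives a quick view of where faster response would pay off most.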
Many facilities actually hire “semi-skilled” labour or “skilled technicians” to operate machines. These employees are typically flexible and adaptable, show a strong aptitude for continual improvement, and are readily trained to resolve process events in real time.
Measurement systems of any kind are prone to error. While it is important to understand the significance of measurement error, it should not be the “primary” focus. We recommend PREVENTION and ELIMINATION of events that impede the ability to produce a quality product at rate.
Regrettably, some companies are more interested in collecting “accurate” data than in making real improvements (measuring for measurement’s sake).
WHAT are you measuring and WHY? Do you measure what you can’t control? We will leave you with these few points to ponder.
Until next time – STAY Lean!