Contingency Planning For Lean Organizations – Part IV – Crisis Management
In a previous post we alluded to the idea that lean organizations are likely to be more susceptible to disruptions or adverse conditions, and that such events may even have a greater impact on the business. To some degree this may be true; in reality, however, Lean has positioned these organizations to be more agile and extremely responsive to crisis situations, which helps to mitigate losses.
True lean organizations have learned to manage change as a normal course of operation. A crisis only presents a disruption of larger scale. Chapter 10 of Steven J. Spear’s book, “Chasing the Rabbit”, exemplifies how high velocity, or lean, organizations have managed to overcome significant crisis situations that would typically cripple most organizations.
Problem solving is intrinsic at all levels of a lean organization and, in the case of Toyota, problem solving skills extend beyond the walls of the organization itself. It is clear that an infrastructure of people having well developed problem solving skills is a key component to managing the unexpected. The events presented in this chapter demonstrate the agility that is present in a lean organization, namely, in this case, Toyota and its supplier base.
Training is a Contingency
Toyota has clearly been the leader in Lean manufacturing and even more so in developing problem solving skills at all levels of the organization company-wide. The primary reason for this is the investment that Toyota puts into the development of people and their problem solving skills at the onset of their employment with the company. The ability to see problems, correct them in real time, and share the results (company-wide) is a testament to the system, and its effectiveness has been proven on many occasions.
Prevention, preparation, and training (which is also a form of prevention) are as much an integral part of contingency planning as are the actual steps that must be executed when a crisis situation occurs. Toyota has developed a rapid response reflex that is inherent in the organization’s infrastructure to rapidly regain its capabilities when a crisis strikes.
We highly recommend reading Steven J. Spear’s “Chasing the Rabbit” to learn and appreciate the four capabilities that distinguish “High Velocity” organizations. The key to lean is creating a cultural climate that is driven by the relentless pursuit of improvement and elimination of waste. Learning to recognize waste and correcting the condition as it occurs requires keen observation and sharp problem solving skills.
Creating a culture of this nature is an evolutionary process – not revolutionary. In many ways the simplicity of the four capabilities is its greatest ally. Instilling these principles and capabilities into the organization demands time and effort, but the results are well worth it. Lean was not intended to be complex and the principles demonstrated and exemplified in Chasing the Rabbit confirm this to be true. This is not to be construed as saying that the challenges are easy … but with the right team they are certainly easier.
Coincidentally, we are having a first hand experience with the Blue Screen of Death, or BSOD, on one of our laptops today: the completely unexpected critical system error that renders Windows completely helpless. If this isn’t on your list of IT concerns, it should be.
In our case the error appears to be video related – driver or card. Most IT specialists know how to deal with these types of errors but for the average user, the message that appears is enough to make you sweat. If the system can’t fix the error, you may very well end up staring at a Black Screen – just as we are.
How is it that we were still able to produce this post? Well, we are currently executing our contingency plan and using another system that is operated independently. Most companies back up their data to prevent or minimize loss. Another concern that is often overlooked is accessibility to that backup data in the event the system goes down.
What have we learned?
We are not the first to experience this problem. We did a Google search using some brief terms such as “Computer Black Screen”, “Laptop Black Screen”, and we even Googled parts of the error message that appeared on the screen. The result? Thousands of people have experienced this same error.
The point of this post is to demonstrate that you do not have to re-invent the wheel to determine potential solutions or to discover problems that may occur. Quite likely, they may already have happened and solutions are already developed and available.
There are two probable solutions to our video issue:
Update the video device driver (Free)
Replace the video card (Cost $)
Hopefully, the first solution is the answer to our problem. Video cards are not sitting on our shelf and the downtime may be extended if we can’t find something locally.
It is noteworthy that we have not yet identified the root cause of this failure. We haven’t loaded any new software or experienced problems in recent history. This may be the topic for a future problem solving post.
Regardless of the outcome of our present dilemma, we have learned that it is a good idea to keep device drivers up to date. As a planned activity, this may prevent some of you from having to experience the BSOD as we have today.
The loss incurred for this event is more than just the cost to repair. This computer may be down for a few days. How much is the down time worth? Unless we play out the scenarios that may threaten or pose a risk to our business, we may never have the opportunity to prepare for the event until it actually happens.
Keep an open mind and use the resources available to you to help solve the problem. In some cases a simple Google search could confirm your concern in a matter of seconds.
In its simplest form, availability measures the uptime of a machine or process against the planned production time. As one of the factors of Overall Equipment Effectiveness (OEE), Availability is expressed as a percentage. The uptime is calculated by taking the difference between the planned production time and the total duration of the downtime events that occurred during the planned production period.
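As a minimal sketch of this definition, the calculation can be written in a few lines of Python. The function name `availability` and the sample shift figures are our own illustrative assumptions, not part of any standard tool.

```python
def availability(planned_minutes, downtime_events):
    """Return availability as a fraction of the planned production time.

    planned_minutes: planned production time for the period, in minutes.
    downtime_events: durations (minutes) of the downtime events recorded
                     during that planned period.
    """
    if planned_minutes <= 0:
        raise ValueError("planned_minutes must be positive")
    uptime = planned_minutes - sum(downtime_events)
    return uptime / planned_minutes


# Hypothetical shift: 480 planned minutes with three recorded stops.
print(f"Availability: {availability(480, [12, 30, 6]):.2%}")  # Availability: 90.00%
```

How much time counts as "planned" is a policy decision, which is exactly why the definition is so often debated.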
We specifically address the “Availability” factor in this post for the simple reason that the definition of availability is likely to be one of the most debated and hotly contested topics of your OEE implementation strategy. The reason for this, in many cases, is the lack of clarity in some of the most basic terminology. The purpose of this discussion is to present some topics for consideration that will allow you to arrive at a clear definition that can perhaps be formed into a standard policy statement.
We will also demonstrate that it is possible to calculate the downtime by simply knowing the cycle time or process rate, the quantity of parts produced, and the planned production time. We recommend using this technique to validate or reconcile the actual documented downtime. We would argue that the first and foremost purpose of any machine monitoring or downtime event measurement system is to determine the “WHY and WHAT” of the downtime events and secondly to record the “When and How Long”.
You will learn that monitoring your processes to determine causes and duration of downtime events is key to developing effective action plans to improve availability. The objective of any machine automation, sensor strategy, or data collection and analysis is to determine methods and actions that will improve the availability of the equipment through permanent corrective actions, implementing more effective trouble shooting strategies (sensor technologies), improved core process controls, or more effective preventive maintenance.
Define the purpose of OEE
While it looks like we’re taking a step back from the topic of discussion, bear with us for just a paragraph or two. A clear statement of purpose is the best place to start before executing your OEE implementation strategy:
To identify opportunities to improve the effectiveness of the company’s assets.
You will quickly realize that, when attempting to define the measurement criteria for the OEE factors, in particular Availability, your team may present rationale to exclude certain elements from the measurement process. These rationalizations are typically predicated on existing policy or perceived constraints that simply cannot be changed. People or teams do not want to be penalized for items that are “out of their control” or bound by current policy. Continuous improvement is impeded by attempts to rationalize poor performance.
We understand that some of these “exclusions” present a greater challenge; however, we do not agree with the premise that they cannot be improved. Again, it is a matter of “purpose”. Limiting the scope of measurement will limit the scope of improvement. Now it’s time to explore what could be the foundation for a sound definition of availability.
It may seem reasonable to assume that, at a minimum, the only planned down time events that should be excluded from the availability factor are planned preventive maintenance activities, mandatory break periods, and scheduled “down” time due to lack of work. We would argue and agree that the only justification for an idle machine is “Lack of Work”.
What would be the reason to settle for anything less? If Preventive Maintenance is critical to sustaining the performance of your process, doesn’t it make sense to consider it in the measurement process? The rationale that typically follows is that Preventive Maintenance must be done and it’s really out of our control – it is a planned event. We would argue that the time to complete Preventive Maintenance can be improved.
Is it possible that the Mean Time Before Failure or Required Maintenance can be extended? Is it possible to improve materials, components, or lubricants that could extend the process up time? Is it possible to improve the time it actually takes to perform the required maintenance? If so, what is the measure that will be used to show that additional capacity is available for production?
If set up times for die changes or tool changes can be improved from hours to minutes, could the same effort and devotion to improve Preventive Maintenance techniques yield similar results? We think so.
One example is the use of synthetic oils and lubricants that have been proven to significantly extend the life of tools and components and to reduce the number of changes required over the service life of the machine. Quick change features that provide easy and ready access to service points on tooling and machines can also be implemented to reduce preventive maintenance times.
The other exclusion that is often argued is break times. Labour laws require you to provide break times for your employees. However, since automated processes are not subject to “Labour Laws”, the “mandatory break times” do not apply. We would argue that methods should be pursued to reduce the need for human intervention and look for ways to keep the machine running. Is it possible to automate some of the current processes or rotate people to keep the machine running?
Aside from this more obvious example, consider other organizational policies that may impact how your organization runs:
Shift start-up meetings
Employee Communication Meetings
End of Shift clean up periods
Quality first off approval process
Shift first off versus Run first off
Weld Tip changes – PM or Process Driven
What is the purpose of the shift start-up meeting? What is the purpose of the monthly employee communication meeting? Could this information be conveyed in a different form? What length of time is really required to convey the message to be shared? Is the duration of the meeting actually measured or do you resort to the standard time allotted?
Clean up periods at the end of the shift are also a common practice in many plants. What is being cleaned up? Why? Is it possible to maintain an orderly workplace during the shift – clean up as it happens in real-time? Again, do you record the actual clean up time or do you just enter the default clean up time allotted?
How much time is lost to verify the integrity of the product before allowing production to commence? What process parameters or factors would jeopardize the quality of the product being produced? No one wants to make scrap or substandard components; however, the challenge remains to determine what factors influence the level of quality. If it is possible to determine what factors are critical to success in advance, then perhaps the quality verification process becomes a concurrent event.
There are other factors that can impact availability including, but certainly not limited to, personnel (illness, inclement weather), material availability, other linked processes (feeder / customer), material changes, tool changes, quality concerns, and unexpected process, equipment, or machine faults.
It is possible to use manual or automated systems to collect various machine or process codes to record or document the duration and type of downtime event. We recommend and support the use of automated data collection systems; however, they should be implemented in moderation. One of the primary impediments to success is overwhelming volumes of data that no one has the time to analyze.
The Goal = 100% Up Time = ZERO Down Time = Zero Lost Time = Zero Defects = 100% Availability
The goal is to use the data and tools available to either permanently resolve the problem by implementing an effective corrective action or to assist the trouble shooting process by identifying the failure mode and to minimize the duration of the downtime event.
We have witnessed data collection strategies where an incredible number of sensors were installed to “catch” problems as they occur. The reality was the sensors themselves became the greater cause of downtime due to wear or premature failure due to improper sensor selection for the application. Be careful and choose wisely.
When used correctly, automation can be a very effective tool to capture downtime events and maintain the integrity of the overall measurement process. With the right tools, trouble shooting your process will minimize the duration of the down time event. Monitoring the frequency of these events will also allow you to focus your attention on real opportunities and circumvent nuisance faults.
The objective of collecting the “downtime event” history is to determine what opportunities are available to improve uptime.
Duration versus Frequency
The frequency of a downtime event is often overlooked as most of the attention is devoted to high duration downtime events. Some sources suggest that short duration downtime events (perhaps as little as 30 seconds) are not worth measuring. These undocumented losses are reflected, or more accurately hidden, by a corresponding reduction in the performance factor.
Be careful when setting what appears to be simple policy to document downtime. A 20 second downtime event that occurs 4 times per hour could quickly turn into roughly 10 minutes a shift, 30 minutes a day, 2.5 hours a week, and 125 hours a year. Rather than recording every event in detail, we recommend implementing a simple “tick” sheet to gain an appreciation for the frequency of failures. Any repetitive events can be studied and reviewed for corrective action.
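The figures above are rounded. As a rough sketch, the same accumulation can be computed without rounding. We assume here an 8 hour shift, 3 shifts a day, 5 days a week, and 50 weeks a year; these schedule values are our assumptions and are not stated in the example. The function name `short_stop_loss` is also ours.

```python
def short_stop_loss(seconds_per_event, events_per_hour, hours_per_shift=8,
                    shifts_per_day=3, days_per_week=5, weeks_per_year=50):
    """Accumulate the time lost to a short, repetitive downtime event.

    The default schedule values are assumptions for illustration only.
    Returns (minutes per shift, minutes per day, hours per week, hours per year).
    """
    per_shift_min = seconds_per_event * events_per_hour * hours_per_shift / 60
    per_day_min = per_shift_min * shifts_per_day
    per_week_hr = per_day_min * days_per_week / 60
    per_year_hr = per_week_hr * weeks_per_year
    return per_shift_min, per_day_min, per_week_hr, per_year_hr


shift_min, day_min, week_hr, year_hr = short_stop_loss(20, 4)
print(f"{shift_min:.1f} min/shift")  # 10.7 min/shift
print(f"{day_min:.1f} min/day")      # 32.0 min/day
print(f"{week_hr:.1f} hr/week")      # 2.7 hr/week
print(f"{year_hr:.1f} hr/year")      # 133.3 hr/year
```

Without rounding the per-shift loss down to 10 minutes, the annual total under these assumptions is somewhat higher than 125 hours, which only strengthens the case for tracking frequency.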
Verify the Downtime
One of the advantages of OEE is that it is possible to reconcile the total time – OEE should never be greater than 100%. Of course this statement requires that the standard cycle time is correct and the total quantity of parts produced is accurate. So, although all of the downtime events may not be recorded, it is very easy to determine how much downtime occurred. This will help to determine how effectively downtime data is being recorded.
A perfect example to demonstrate this comes from the metal stamping industry. Progressive dies are used to produce steel parts from coil steel. The presses typically run at a fixed “predetermined” optimum run rate. Depending on the type of part and press, progressive dies are capable of running at speeds from as low as 10 strokes per minute up to speeds over 300 strokes per minute.
For ease of calculation, assume we have a press that was scheduled to run a part over an 8 hour shift having two 10 minute breaks. The standard shift hours are 6:45 am – 3:15 pm and 3:30 pm – 12:00 am. The company provides a 30 minute unpaid meal break after 4 hours of work. The optimum press speed to run the part is 20 strokes per minute (spm). If a total of 6200 parts were made – how much downtime was incurred at the press?
To determine the press time required (also known as earned time), we simply divide the quantity of parts produced by the press rate as follows:
Machine Uptime: 6200 / 20 = 310 minutes
Our planned production time was 8 hours or 480 minutes. Assuming that company policy excludes break times, the net available time to run the press is 480 – (2 x 10) = 460 minutes.
Availability = Earned Time / Net Available Time = 310 / 460 = 67.39%
We can see from the above example that it is easy to determine what the downtime should have been and, in turn, to calculate the availability factor. This calculation is based on the assumption that the machine is running at the stated rate.
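The arithmetic above can be sketched in Python as follows. This is only an illustration: the function names are our own, and the sketch assumes, as the example does, that the press runs at the stated rate and makes one part per stroke.

```python
def earned_minutes(parts_produced, rate_per_minute):
    """Time the machine must have been running to make the parts."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return parts_produced / rate_per_minute


def availability_from_output(parts_produced, rate_per_minute, net_available_minutes):
    """Return (earned time, implied downtime, availability as a fraction)."""
    earned = earned_minutes(parts_produced, rate_per_minute)
    implied_downtime = net_available_minutes - earned
    return earned, implied_downtime, earned / net_available_minutes


# Press example: 6200 parts at 20 spm, 480 minute shift less two 10 minute breaks.
earned, downtime, avail = availability_from_output(6200, 20, 480 - 2 * 10)
print(f"Earned time: {earned:.0f} min")         # Earned time: 310 min
print(f"Implied downtime: {downtime:.0f} min")  # Implied downtime: 150 min
print(f"Availability: {avail:.2%}")             # Availability: 67.39%
```

The implied downtime of 150 minutes is simply the net available time less the earned time; it is the figure the recorded downtime events should add up to.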
The Availability TWIST (1):
Knowing that press and die protection technologies exist to allow presses to run in full automatic mode, the two break periods from our example above do not apply to the equipment, unless company policy states that all machines or processes must cease operations during break periods.
Assuming that this is not the case, the press is available for the entire shift of 480 minutes. Therefore, the availability calculations from above would be:
Availability = Earned Time / Net Available Time = 310 / 480 = 64.58%
The Availability TWIST (2):
To expand on this concept a little further, we also indicated that the company provides an unpaid lunch period of 30 minutes. Since meal breaks don’t apply to presses, the reality is that the press was also available to run during this period of time. With 510 minutes available, the recalculated availability is:
Availability = Earned Time / Net Available Time = 310 / 510 = 60.78%
The Availability TWIST (3):
Finally, one last twist (we could go on). We deliberately indicated that there was a 15 minute break between shifts. Again, is there a reason for this? Does the machine have to stop? Why?
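To see the effect of policy on the result, the three baselines worked through above can be compared side by side. This is a small sketch with labels of our own choosing; Twist (3) is left out because no figure is calculated for it here.

```python
EARNED_MINUTES = 6200 / 20  # 310 minutes of earned time from the press example

# Net available time under each policy, in minutes.
baselines = {
    "Breaks excluded": 460,
    "Breaks included": 480,
    "Breaks and meal period included": 510,
}

results = {label: EARNED_MINUTES / minutes for label, minutes in baselines.items()}
for label, value in results.items():
    print(f"{label}: {value:.2%}")

# Difference between the most and least generous baseline.
spread = max(results.values()) - min(results.values())
print(f"Spread: {spread * 100:.2f} percentage points")  # Spread: 6.61 percentage points
```

The quantity of parts never changes; only the denominator does.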
Availability – NEXT Steps
As you begin to look at your operations and policies, start by asking WHY do we do this or that? The example provided above indicates that a significant delta can exist in availability (close to 7%) although the number of parts produced has not changed. The differing results are related to policy, operating standard, or both.
If the performance (cycle time or production rate) and total quantity of parts produced data have integrity, the availability factor can be reconciled to determine the integrity of the downtime “data collection” system. From this example it should also be clear that the task of the data collection system is to capture the downtime history as accurately as possible to determine the opportunities to improve availability NOT just to determine how much downtime occurred.
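A reconciliation of this kind might be sketched as follows. The downtime log shown is hypothetical, as are the function name `reconcile_downtime` and the idea of a "capture ratio"; the press figures are those from the example above.

```python
def reconcile_downtime(net_available_min, parts_produced, rate_per_minute,
                       recorded_events_min):
    """Compare recorded downtime against the downtime implied by output.

    Returns (implied downtime, recorded downtime, unrecorded downtime,
    capture ratio), where the capture ratio is recorded / implied.
    """
    earned = parts_produced / rate_per_minute
    implied = net_available_min - earned
    recorded = sum(recorded_events_min)
    unrecorded = implied - recorded
    capture_ratio = recorded / implied if implied > 0 else 1.0
    return implied, recorded, unrecorded, capture_ratio


# Hypothetical downtime log (minutes) for the press example.
log = [45, 30, 25, 20]
implied, recorded, unrecorded, ratio = reconcile_downtime(460, 6200, 20, log)
print(f"Implied: {implied:.0f} min, recorded: {recorded} min")
print(f"Unrecorded: {unrecorded:.0f} min, capture ratio: {ratio:.0%}")
```

A large unrecorded balance suggests that short or frequent stops are going undocumented, or that the standard rate or part count needs to be verified.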
This example also demonstrates why effective problem solving skills are critical to the success of your lean implementation strategy. It is also one of the reasons why programs such as six sigma and lean have become integrated as parallel components of many lean execution strategies.
The Goal: 100% uptime / Zero downtime / Zero lost time /100% availability
Regardless of the measurement baseline used, be consistent. Exclusions are not the issue; it is a matter of understanding what is involved in the measurement process. For example, maintenance activities performed during break periods may be a good management practice to improve labour efficiencies; however, the fact that the work was performed during a break period should not exclude it from the “downtime” event history. We would argue that all activities requiring “equipment time” or “process time” should be recorded.
Overall Equipment Effectiveness (OEE): Standardized Work
After you start collecting OEE data for your processes, you may notice significant variance between departments, shifts, and even employees performing the work. Of the many aspects that you will be inclined to investigate, standardized work should be one of them.
Making sure that all employees are executing a process or sequence of processes correctly and exactly the same way every time is the topic of standardized work. The OEE data may also direct you to review how the processes are being executed by some of the top performers to determine if they are truly demonstrating best practices or simply cutting corners.
Lean practices are founded on learning by observing. We cannot stress enough the importance of observing an operation to see first hand what opportunities for improvement (waste elimination) are available. OEE data is a compass that directs you where to look; however, the destination for improvements is the process, the very source from where the data originated.
Establishing Standard Cycle Times
One of the first questions we usually ask is, “How were the standard cycle times determined?” Was the standard based on best practices, quoted rates, time studies, name plate ratings, or published machine cycle times?
We recommend conducting an actual time study using a stop watch and calculating part to part (button to button) cycle times accordingly. We have used the stop watch capability of the BlackBerry many times. Results for lap times and total elapsed time are easily recorded and can be e-mailed as soon as the study is complete.
The sample size of course will depend on the actual rate of the machine and should be statistically relevant. One or two cycles is not sufficient for an effective time study.
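As a rough illustration of why one or two cycles are not enough, the lap times from a time study can be summarized with the mean, the sample standard deviation, and the standard error of the mean. The lap times below are hypothetical and the function name is our own; how many cycles are needed for your process remains a judgment call.

```python
import math
import statistics


def summarize_time_study(lap_seconds):
    """Return (mean, sample standard deviation, standard error) of lap times."""
    if len(lap_seconds) < 2:
        raise ValueError("a time study needs at least two cycles")
    mean = statistics.mean(lap_seconds)
    stdev = statistics.stdev(lap_seconds)
    std_error = stdev / math.sqrt(len(lap_seconds))
    return mean, stdev, std_error


# Hypothetical part to part (button to button) lap times, in seconds.
laps = [3.1, 2.9, 3.0, 3.2, 2.8, 3.0]
mean, stdev, std_error = summarize_time_study(laps)
print(f"Mean cycle time: {mean:.2f} s")
print(f"Standard deviation: {stdev:.3f} s")
print(f"Standard error of the mean: {std_error:.3f} s")
```

The standard error shrinks as more cycles are timed, so a wide spread in lap times is a signal to collect more cycles before setting the standard.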
For operator “controlled” processes, we recommend involving the employees who normally perform the work when conducting the time study. It doesn’t make sense to have the “office experts” run the equipment for a short burst to set a rate that cannot be sustained or is just simply unreasonable.
Many processes, whether dependent on human effort or automation, are controlled by PLCs that are also capable of providing the machine cycle time. At a minimum, we recommend validating these cycle times to satisfy yourself that they are part to part or “button to button” cycle times.
For automated operations, PLCs can typically be relied upon to provide a reasonable cycle time. Without going too far into process design and development, you will need to understand the elements that control the process sequences. Some processes are driven by time controls (an event occurs after a predetermined period of time), while others are event-driven (an event occurs based on satisfying a dependent “sensor on-off” condition or similar “event signal” mechanism).
The real key to understanding the process being studied is to develop a flow chart clearly defining each of the process steps. It is of equal importance to observe the differences that may be occurring between employees performing the work. Either the instructions lack clarity or habits (good or bad) have been developed over time. Although templates exist to aid in the development of standardized work, don’t wait to find the right tool.
Using Video – Record it Live
We highly recommend using a video recorder to capture the process in action. With the technology available today, video is readily available and a very cost-effective method of documenting your processes. Video presents several advantages:
Captures activities in real-time.
Provides instant replay.
Establishes process or sequence event timing in real-time.
Eliminates need for “stop watches” to capture multiple event timing.
Can be used as a training aid for new employees to demonstrate “standardized work practices”.
Can be used to develop “best practices”.
Reduces or minimizes potential for time measurement error.
We have successfully used video to not only develop standardized work for production processes, but also for documenting and recording best practices for tool changes, set up, and checking or inspection procedures.
Standardized work eliminates any questions regarding the proper or correct way of performing the work required. Standardized work procedures allow additional development work to be completed “offline” without further disruption to the production process.
Of course, Standardized Operating / Work procedures are required to establish effective and meaningful value stream maps but even more importantly, they become an effective tool to understand the opportunity for variances in your OEE data, certainly where manual or “human” controlled operations are concerned.
It has been argued that OEE data in and of itself is not statistically relevant and we are inclined to agree with this statement. The simple reason is that the processes being measured are subject to significant internal and external variances or influences. Examples may include reduced volumes, product mix changes, tool change frequency, employee turnover, and economic conditions.
As mentioned in many of our posts, it is important to understand “WHAT and WHY” we are measuring. Understanding the results is more important than the result itself. A company looking to increase inventory turns may resort to smaller production runs and more frequent tool changes. This will reduce Availability and, in turn, will result in a lower OEE. The objective may then be to find a way to further reduce tool change times to “improve” the Availability.
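The trade-off described above can be sketched with a few hypothetical numbers. OEE is the product of the Availability, Performance, and Quality factors; the shift length, changeover counts, changeover durations, and the Performance and Quality values below are all assumptions chosen for illustration.

```python
def availability_with_changeovers(planned_min, changeovers, minutes_per_changeover):
    """Availability when tool changes are the only downtime considered."""
    downtime = changeovers * minutes_per_changeover
    if downtime > planned_min:
        raise ValueError("changeover time exceeds planned time")
    return (planned_min - downtime) / planned_min


def oee(availability, performance, quality):
    """OEE as the product of its three factors (each a fraction)."""
    return availability * performance * quality


# Hypothetical 480 minute shift; Performance and Quality held constant.
PERFORMANCE, QUALITY = 0.95, 0.99
scenarios = {
    "2 changes at 20 min": (2, 20),
    "6 changes at 20 min": (6, 20),
    "6 changes at 5 min": (6, 5),
}
for label, (count, minutes) in scenarios.items():
    a = availability_with_changeovers(480, count, minutes)
    print(f"{label}: availability {a:.2%}, OEE {oee(a, PERFORMANCE, QUALITY):.2%}")
```

Under these assumed numbers, the smaller runs only recover their lost Availability once the tool change time itself is reduced.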
The use of OEE data can vary in scope, ranging from part specific performance to plant wide operations. As the scope of measurement changes, so do the influences that impact the net result. So once again, we urge you to use caution when comparing data between personnel, shifts, departments, and production facilities. Typically, first or day shift operations have greater access to resources that are not available on the “off” shifts.
Perhaps the greatest “external” influence on current manufacturing operations is the rapid collapse of the automotive industry in the midst of our current economic “melt down”. The changes in operating strategy to respond to this new crisis are bound to have an effect on OEE among other business metrics.
The ultimate purpose of Lean practices is to reduce or eliminate waste, and doing so requires a rigorous “document and review” process. The ability to show evidence of current versus proposed practices will reduce or eliminate the roadblocks that may impede your continuous improvement objectives.
While the post is brief today, hopefully the message is helpful.