1. Introduction: The Challenge of Industrial Reliability
The operational reliability of industrial equipment is an essential pillar of productivity and safety in manufacturing environments. Unexpected failures result in production stoppages, high repair costs, worker safety risks and environmental impacts. Root Cause Analysis (RCA), a systematic methodology for identifying the primary causes of failures or problems, is a critical tool for maintenance and reliability engineers. Instead of merely treating symptoms, RCA seeks to eliminate the source of the problem, preventing its recurrence and promoting continuous improvement. This technical article explores and compares three fundamental RCA methods: the 5 Whys, the Ishikawa (Fishbone) Diagram and Fault Tree Analysis (FTA), offering a practical guide to their application in the context of Brazilian manufacturing, with an emphasis on ABNT and NR standards.
2. Fundamental Principles of Root Cause Analysis
2.1. Definition and Objectives
Root Cause Analysis (RCA) is an investigative process that aims to identify the fundamental triggering factor of a problem or event. A root cause is defined as a factor that, if removed or corrected, prevents the unwanted event from recurring. Its objectives include:
- Identify the true source of the problem, not just its symptoms.
- Develop effective corrective actions to prevent recurrence.
- Improve safety, reliability and operational efficiency.
- Reduce costs associated with failures and rework.
The core philosophy of RCA is that most problems result from a combination of conditions rather than a single isolated failure. The depth of the analysis should scale with the complexity and impact of the problem.
2.2. The Causal Logic
Causal analysis involves understanding the sequence of events that culminate in failure. Each event is an effect of a previous cause, and RCA seeks to trace this chain to the point where an intervention can be implemented. The principles include:
- Effect-Cause: Start with the undesired effect and work backwards, asking 'why did this happen?'.
- Verification: Each proposed cause must be verifiable and directly linked to the subsequent effect.
- Intervention: The root cause must be a point where management has control and can implement a change.
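The three principles above can be sketched as a backward walk over a chain of cause-effect links. This is a minimal illustration, not a prescribed procedure: the `CausalLink` structure, the `trace_root_cause` helper and the example chain (vibration, misalignment, assembly error, lack of training) are assumptions introduced here for clarity.

```python
from dataclasses import dataclass


@dataclass
class CausalLink:
    effect: str
    cause: str
    verified: bool      # Verification: evidence supports this link
    controllable: bool  # Intervention: management can act on the cause


def trace_root_cause(links, top_effect):
    """Walk backwards from the undesired effect, one 'why?' at a time.

    The walk stops at the first unverified link. The root-cause candidate
    is the deepest verified cause that management can control.
    """
    by_effect = {link.effect: link for link in links}
    chain, seen, current = [], set(), top_effect
    while current in by_effect and current not in seen:
        seen.add(current)  # guard against circular reasoning
        link = by_effect[current]
        if not link.verified:
            break
        chain.append(link)
        current = link.cause
    root = next((l.cause for l in reversed(chain) if l.controllable), None)
    return chain, root


# Hypothetical example chain
links = [
    CausalLink("excessive bearing vibration", "shaft misalignment", True, True),
    CausalLink("shaft misalignment", "assembly error", True, True),
    CausalLink("assembly error", "lack of training", True, True),
]
chain, root = trace_root_cause(links, "excessive bearing vibration")
for step, link in enumerate(chain, start=1):
    print(f"Why {step}: {link.effect} <- {link.cause}")
print("Root cause candidate:", root)
```

Marking a link as unverified truncates the chain at that point, which mirrors the rule that an unproven cause should not be carried further into the analysis.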
3. Technical Specifications and Applicable Standards
3.1. Risk Management and Maintenance Standards
The application of RCA is intrinsically linked to quality and maintenance management systems. In Brazil, several rules and regulations establish guidelines for asset management and occupational safety:
- ABNT NBR ISO 31000:2018 - Risk Management: Provides generic principles and guidelines for risk management, including the identification and analysis of causes of events. Although it does not prescribe a specific RCA method, its principles support the need to understand the origin of risks.
- ABNT NBR ISO 55001:2017 - Asset Management: Specifies the requirements for an asset management system, emphasizing the importance of maintenance practices to optimize the asset life cycle and minimize operational risks, where RCA is a support tool.
- ABNT NBR ISO 19011:2018 - Guidelines for Auditing Management Systems: Assists in evaluating the effectiveness of implemented RCA processes, ensuring that non-conformities are properly investigated.
3.2. Relevant Brazilian Regulations and Certifications
Safety and compliance are critical aspects in Brazilian manufacturing, directly influencing the need for RCA:
- NR-10 - Safety in Electrical Installations and Services: Requires risk analysis and documented work procedures. When an accident occurs, an in-depth investigation is needed to identify the root cause and prevent recurrence.
- NR-12 - Safety at Work in Machines and Equipment: Imposes strict requirements to ensure the safety of machines and equipment. Failures resulting in accidents or near misses must be investigated with RCA to ensure compliance.
- INMETRO: Although not an RCA standard, the certification of products and systems by INMETRO aims to ensure compliance with safety and performance standards. Problems in certified products may require RCA to identify flaws in the manufacturing or design process.
4. RCA Method Selection and Sizing Guide
Choosing the appropriate RCA method depends on the complexity of the problem, available resources, and the potential impact of the failure. There is no universally superior method; effectiveness lies in the correct application to the specific situation. The following table guides selection:
| Situation | Suggested Method | Depth Level | Application Time (Estimated) | Application Example |
|---|---|---|---|---|
| Simple Operational Failures (e.g. machine stopped) | 5 Whys | Low to Medium | 15-60 minutes | Conveyor belt motor shut down by a tripped circuit breaker. |
| Quality Problems, Process Deviations | Ishikawa (Fishbone) Diagram | Medium to High | 1-4 hours (team session) | Batch of products with dimensions outside the tolerance (e.g. ±0.05 mm). |
| Serious Accidents, Critical Safety Failures | Fault Tree Analysis (FTA) | High (Quantitative) | Several days to weeks | Unexpected tripping of a protection system, leakage of a dangerous substance. |
| Recurring Problems with Multiple Causes | 5 Whys + Ishikawa (combined approach) | Medium to High | 1-2 days | Repetitive failure of bearings in an industrial pump. |
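The selection table above can be encoded as a small decision helper. This is only a sketch: the function name, the three boolean inputs and the order in which they are checked are assumptions made for illustration, and a real selection would also weigh available resources and time.

```python
def suggest_method(safety_critical: bool, recurring: bool, multi_factor: bool) -> str:
    """Map a problem profile to the method suggested in the selection table.

    Assumed priority: safety-critical events first, then recurring
    multi-cause problems, then multi-factor problems, then simple failures.
    """
    if safety_critical:
        return "Fault Tree Analysis"
    if recurring and multi_factor:
        return "5 Whys + Ishikawa"
    if multi_factor:
        return "Ishikawa Diagram"
    return "5 Whys"


# Hypothetical cases mirroring the table rows
print(suggest_method(False, False, False))  # simple operational failure
print(suggest_method(False, True, True))    # repetitive bearing failure in a pump
print(suggest_method(True, False, True))    # leakage of a dangerous substance
```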
For failures in mechanical components, such as a UNITEC-D bearing, initial analysis may indicate excessive vibration (for example, 15 mm/s peak velocity, well above the 4.5 mm/s RMS limit given in ISO 10816-3 for large machines). The 5 Whys may then reveal that the vibration stems from misalignment, caused by an assembly error, which in turn resulted from a lack of training. For an electrical failure, such as a short circuit in a panel operating at 380 V, FTA can quantify the probability of an ignition event resulting from the failure of a contactor combined with the absence of residual-current protection, considering an MTBF of 100,000 hours for the contactor and a cost of 15,000 BRL per production stoppage.
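The electrical example can be turned into a rough Fault Tree Analysis calculation. The sketch below assumes a constant failure rate (exponential model) for the contactor, a mission time of one year of continuous operation (8,760 h), and a hypothetical 1% probability that residual-current protection is absent or unavailable; none of these three assumptions comes from the article. Treating the top event as a production stoppage for costing purposes is also an assumption.

```python
import math


def failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """P(failure within mission time), assuming a constant failure rate."""
    return 1.0 - math.exp(-mission_hours / mtbf_hours)


def and_gate(*probabilities: float) -> float:
    """All independent input events must occur."""
    return math.prod(probabilities)


def or_gate(*probabilities: float) -> float:
    """At least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


MTBF_CONTACTOR_H = 100_000      # from the article
STOPPAGE_COST_BRL = 15_000      # from the article
MISSION_H = 8_760               # assumption: one year of continuous operation
P_NO_PROTECTION = 0.01          # assumption: hypothetical value

p_contactor = failure_probability(MTBF_CONTACTOR_H, MISSION_H)
p_top = and_gate(p_contactor, P_NO_PROTECTION)   # ignition event
expected_cost = p_top * STOPPAGE_COST_BRL

print(f"P(contactor failure in one year) = {p_contactor:.4f}")
print(f"P(top event)                     = {p_top:.6f}")
print(f"Expected annual cost             = {expected_cost:.2f} BRL")
```

The AND gate reflects that both conditions must coincide for the top event; an `or_gate` is included because real fault trees usually mix both gate types. Both gates assume independent events.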
5. RCA Good Practices for Implementation and Commissioning
The effectiveness of RCA depends on structured and disciplined implementation:
5.1. Team Building and Commitment
- Multidisciplinary Team: Form a group with operators, maintenance technicians, process, quality and safety engineers. The diversity of perspectives enriches the analysis.
- Training: Ensure that all team members are trained in the selected RCA methods, depending on the level of complexity. Practical training is essential.
- Leadership Support: Management commitment is essential to allocate resources, time and implement the proposed solutions.
5.2. Rigorous Data Collection and Evidence
- Immediacy: Start data collection as soon as possible after the event, preserving evidence (photos, videos, material samples).
- Objectivity: Focus on facts, not opinions. Document environmental conditions (temperature 30°C, humidity 85%), instrument readings (pressure 5 bar, flow rate 20 L/min), maintenance history and machine logs.
- Multiple Sources: Combine interviews with the personnel involved, technical manuals (e.g. a 120 Nm torque specification for fastening screws), technical drawings, and data from sensors and SCADA systems.
5.3. Validation of the Root Cause and Implementation of Actions
- Logic Test: The identified root cause must be able to explain all observed effects and, if removed, prevent recurrence.
- Action Plan: Develop a clear action plan with responsibilities, deadlines and success indicators. Examples include replacing a component with a more resistant material (e.g. AISI 316 stainless steel instead of AISI 304 for corrosive environments with pH < 6) or installing a temperature sensor with an alarm at 90°C.
- Monitoring: Monitor the effectiveness of the actions implemented and verify whether the problem has actually been eliminated. Periodic audits can be carried out, according to ABNT NBR ISO 19011.
6. Failure Modes and Root Cause Analysis
Understanding failure modes is crucial to targeting RCA:
6.1. Failures in Mechanical Components
- Fatigue: Common cause in shafts and gears. Investigate load cycles, stresses (fatigue limit of SAE 1045 steel is ~310 MPa), alignment, balancing and vibration (amplitude of 0.5 mm/s RMS may indicate pre-failure).
- Abrasive/Corrosive Wear: In pumps and valves. Analyze the quality of the fluid, presence of particles (maximum size of 10 micrometers for hydraulic systems), pH (acidity or alkalinity), temperature (above 60°C for oils can accelerate degradation) and lubrication.
- Fracture: Overload or impact. Evaluate part geometry, material properties, load history (maximum nominal load of 5000 N, actual load of 6500 N), and operating conditions.
6.2. Failures in Electrical Systems
- Short circuit/Overload: In motors and transformers. Inspect insulation (minimum resistance of 1 MΩ), cable sizing (e.g. 10 mm² cable for 40 A), protection (50 A circuit breakers), harmonics and power quality (minimum power factor of 0.92 required by ANEEL).
- Degradation of Electronic Components: In inverters and PLCs. Analyze ambient temperature (maximum operating temperature of 50°C), humidity, voltage spikes and expected lifespan (MTBF of 200,000 hours for an industrial PLC).
- Contact Failures: In contactors and relays. Check corrosion, mechanical wear, control voltage (24 Vdc, ±5%) and operating frequency.
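Some of the electrical checks listed above reduce to comparing a measurement with a limit. The following sketch uses the example limits quoted in the list (1 MΩ insulation resistance, 24 Vdc ±5% control voltage, 0.92 power factor); the function names and the sample readings are hypothetical.

```python
def within_tolerance(measured: float, nominal: float, tol_fraction: float) -> bool:
    """True if the measurement lies within nominal ± tol_fraction * nominal."""
    return abs(measured - nominal) <= nominal * tol_fraction


def electrical_findings(insulation_mohm: float, control_voltage_v: float,
                        power_factor: float) -> list[str]:
    """Return the deviations worth investigating during the RCA."""
    findings = []
    if insulation_mohm < 1.0:
        findings.append("insulation resistance below 1 MΩ")
    if not within_tolerance(control_voltage_v, 24.0, 0.05):
        findings.append("control voltage outside 24 Vdc ±5%")
    if power_factor < 0.92:
        findings.append("power factor below 0.92")
    return findings


# Hypothetical field readings
for finding in electrical_findings(insulation_mohm=0.4,
                                   control_voltage_v=22.0,
                                   power_factor=0.95):
    print(finding)
```

A finding is a lead for the investigation, not a root cause in itself: each one still has to be traced back through the causal chain.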
7. Predictive Maintenance and Condition Monitoring
Predictive maintenance, through condition monitoring, complements RCA by identifying deviations before they result in catastrophic failures. Techniques include:
- Vibration Analysis: Detection of misalignment, looseness, unbalance and bearing problems. A 200% increase in vibration amplitude at a specific frequency is a critical indicator.
- Thermography: Identification of hot spots in electrical panels, motors and bearings, indicating overload or friction. Temperature differences of 10°C compared to similar components or the environment are alarming.
- Oil Analysis: Monitoring of contaminants (iron, chromium, copper in ppm), viscosity (e.g. ISO VG 68), oxidation level and acidity to assess the health of hydraulic and lubrication systems. An increase of 50 ppm of iron may indicate wear.
- Ultrasonic Analysis: Detection of leaks in pipes and pneumatic systems, as well as faults in steam traps and compressed-air systems (cost of leaking compressed air: 2,500 BRL/year per small hole).
These techniques provide the data that allow RCA to be proactive, investigating degradation trends before failures occur, and they contribute to optimizing the life cycle of components such as those provided by UNITEC-D.
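The indicative thresholds mentioned for vibration, thermography and oil analysis can be combined into a simple screening routine. This is a sketch under stated assumptions: the function and parameter names are hypothetical, a "200% increase" is interpreted as the current amplitude reaching three times the baseline, and the sample readings are invented.

```python
def condition_alerts(vib_baseline: float, vib_now: float,
                     temp_component_c: float, temp_reference_c: float,
                     iron_prev_ppm: float, iron_now_ppm: float) -> list[str]:
    """Flag readings that cross the indicative thresholds from the text."""
    alerts = []

    # Vibration: increase relative to baseline at a given frequency
    vib_increase_pct = (vib_now - vib_baseline) / vib_baseline * 100.0
    if vib_increase_pct >= 200.0:
        alerts.append(f"vibration up {vib_increase_pct:.0f}% over baseline")

    # Thermography: difference to a similar component or to ambient
    delta_t = temp_component_c - temp_reference_c
    if delta_t >= 10.0:
        alerts.append(f"hot spot: {delta_t:.0f} °C above reference")

    # Oil analysis: rise in iron content between two samples
    iron_rise = iron_now_ppm - iron_prev_ppm
    if iron_rise >= 50.0:
        alerts.append(f"iron content up {iron_rise:.0f} ppm")

    return alerts


# Hypothetical readings for one pump
for alert in condition_alerts(vib_baseline=2.0, vib_now=6.5,
                              temp_component_c=75.0, temp_reference_c=62.0,
                              iron_prev_ppm=30.0, iron_now_ppm=85.0):
    print(alert)
```

Each alert is a trigger to open an RCA before the failure occurs, which is the proactive use described above.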
8. Comparative Matrix of Root Cause Analysis Methods
The choice of RCA method should be informed by a clear understanding of its characteristics. The table below details a comparison of the main methods:
| Method | Central Philosophy | Advantages | Disadvantages | Typical Applications | Required Resources |
|---|---|---|---|---|---|
| 5 Whys | Iterative questions to identify the root cause of the problem. | Simple, fast, does not require intensive training. Applicable by any operational team. | It may be superficial, failing to identify complex or systemic causes. It depends on the experience of the facilitator. | Daily operational failures, simple process problems, occasional quality deviations. | Little: paper, pen, team of 2-3 people. |
| Ishikawa Diagram (Fishbone) | Visualizes potential causes grouped by category (Labor, Machine, Material, Method, Environment, Measurement). | Structured, visual, encourages team brainstorming. Helps organize multiple potential causes. | Qualitative, requires an experienced facilitator to guide the discussion. It can generate many potential causes without prioritization. | Complex quality issues, multi-factor equipment failures, process optimization. | Medium: Whiteboard, markers, team of 4-6 people. |
| Fault Tree Analysis (FTA) | Graphical deductive method that identifies the logical combinations of events that lead to a top-level failure. | Quantitative, rigorous, identifies critical safety events and probabilities. Ideal for risk analysis. | Complex, time-consuming, requires specific software and advanced training in Boolean logic and probability. | Critical safety systems (NR-10, NR-12), nuclear, aerospace, high-risk chemical processes. | High: Software (e.g. Reliability Workbench), expert engineers, component reliability data (MTBF, MTTR). |
| Pareto Analysis | Based on the 80/20 principle, it focuses on the most frequent causes for prioritizing actions. | Identifies priorities in a clear way that is easy to understand and implement. Helps direct resources to the most impactful problems. | It does not identify the "deep" root cause of the problem, only its frequency. Requires historical failure data. | Problem prioritization, quality control, identification of repetitive failures in large data sets. | Medium: Historical failure data, spreadsheet software. |
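The Pareto row of the matrix can be illustrated with a few lines of code that rank failure causes by frequency and keep those that account for roughly 80% of occurrences. The failure log below is invented for the example, and the `pareto` helper and its `threshold` parameter are assumptions, not part of any standard.

```python
from collections import Counter


def pareto(failure_log, threshold=0.8):
    """Return (cause, count, cumulative share) for the 'vital few' causes.

    Causes are ranked by frequency; the list stops at the first cause
    whose cumulative share reaches the threshold.
    """
    counts = Counter(failure_log).most_common()
    total = sum(n for _, n in counts)
    vital_few, cumulative = [], 0
    for cause, n in counts:
        cumulative += n
        vital_few.append((cause, n, cumulative / total))
        if cumulative / total >= threshold:
            break
    return vital_few


# Hypothetical maintenance log: one entry per recorded failure
log = (["misalignment"] * 12 + ["lubrication"] * 5
       + ["contamination"] * 2 + ["overload"] * 1)

for cause, count, share in pareto(log):
    print(f"{cause:15s} {count:3d}  cumulative {share:.0%}")
```

As the matrix notes, this only prioritizes: the causes it surfaces still need a 5 Whys, Ishikawa or fault tree analysis to reach their root cause.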
9. Conclusion
The systematic implementation of Root Cause Analysis is a strategic imperative for Brazilian manufacturing, supporting operational sustainability and worker safety. The choice among the 5 Whys, the Ishikawa Diagram, Fault Tree Analysis, or a combination of these methods should be guided by the nature of the problem and the available resources. By rigorously applying ABNT guidelines and the Regulatory Standards (NRs), companies can transform failures into opportunities for continuous improvement, increasing reliability and reducing losses. UNITEC-D GmbH, with its portfolio of high-quality industrial components such as certified bearings and precision sealing systems, is a strategic partner for companies seeking to prevent the recurrence of failures and optimize maintenance.
To ensure the availability of reliable industrial components and prevent the recurrence of failures, consult the UNITEC-D electronic catalog: https://www.unitecd.com/e-catalog/
10. References
- ABNT NBR ISO 31000:2018 - Risk Management. Brazilian Association of Technical Standards.
- ABNT NBR ISO 55001:2017 - Asset Management - Management systems - Requirements. Brazilian Association of Technical Standards.
- ABNT NBR ISO 19011:2018 - Guidelines for auditing management systems. Brazilian Association of Technical Standards.
- Brazil. Ministry of Labor and Employment. Regulatory Standard No. 10 (NR-10): Safety in Electrical Installations and Services.
- Brazil. Ministry of Labor and Employment. Regulatory Standard No. 12 (NR-12): Safety at Work in Machines and Equipment.
- ISO 10816-3:2009 - Mechanical vibration - Evaluation of machine vibration by measurements on non-rotating parts - Part 3: Industrial machines with nominal power above 15 kW and nominal speeds between 120 r/min and 15 000 r/min when measured in situ. International Organization for Standardization.
- SAE JA1011:2009 - Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes. SAE International.