Natural Language Processing (NLP) for Automated Classification of MRO Maintenance Tickets

UNITEC-D Industrial MRO
Applying Natural Language Processing (NLP) makes it possible to structure the text data produced by maintenance interventions. This technical approach standardizes the classification of…

1. Introduction: Natural language processing applied to CMMS

In the French aeronautics and energy sectors, the management of maintenance, repair and operations (MRO) relies on precise analysis of intervention data. Every day, technicians enter thousands of reports into Computerized Maintenance Management Systems (CMMS). These entries, often written as free text in constrained environments (ATEX zones, Nadcap-certified clean rooms), generate a massive volume of unstructured data.

The main technical problem is how to exploit this free text. A description such as “pump rlt leak, heats to 85°C” (where “rlt” abbreviates the French “roulement”, i.e. bearing) contains critical information about the failure mode, the affected component, and the operating parameters. Without structuring, this data cannot be used to calculate reliability indicators (MTBF, MTTF) precisely or to optimize spare parts stocks. Natural Language Processing (NLP), a branch of artificial intelligence, provides a technical means of converting this raw text into categorized, usable data that complies with industrial standards.

2. Operating principle: From free entry to structured data

NLP applies mathematical and linguistic algorithms to extract technical meaning from maintenance reports. The process is divided into several sequential stages:

  • Text preprocessing: Normalization of technical vocabulary. Common abbreviations (e.g. “rlt” for bearing, from the French “roulement”, “press” for pressure, “temp” for temperature) are expanded. Typos are corrected using edit-distance algorithms such as Levenshtein distance.
  • Named Entity Recognition (NER): The model identifies and classifies terms specific to MRO. It extracts the equipment (e.g. compressor, valve), the component (e.g. mechanical seal, stator), the failure mode (e.g. seizure, short circuit) and the associated numerical values (e.g. 45 bar, 120 °C, 0.5 mm).
  • Semantic classification: Algorithms (such as models based on the Transformer architecture, e.g. CamemBERT for French) project words into a vector space. This allows the textual description to be associated with a standardized fault tree, regardless of the exact wording used by the technician.
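
The preprocessing stage described above can be illustrated with a minimal sketch. It assumes a small, hypothetical abbreviation table and reference vocabulary (`ABBREVIATIONS` and `VOCABULARY` are illustrative names, not part of any product), and it only corrects tokens within one edit of a known word. A production pipeline would need a proper tokenizer and a site-specific dictionary.

```python
# Hypothetical abbreviation table; a real one would come from the site's own jargon.
ABBREVIATIONS = {"rlt": "bearing", "press": "pressure", "temp": "temperature"}
# Hypothetical controlled vocabulary used as the spelling reference.
VOCABULARY = ["pump", "bearing", "leak", "pressure", "temperature", "valve", "compressor"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize_token(token: str, max_distance: int = 1) -> str:
    """Expand a known abbreviation or snap a near-miss to the vocabulary."""
    token = token.lower()
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    # Leave known words and tokens containing digits or punctuation untouched.
    if token in VOCABULARY or not token.isalpha():
        return token
    best = min(VOCABULARY, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_distance else token


def normalize(text: str) -> str:
    """Normalize a whitespace-separated ticket description."""
    return " ".join(normalize_token(t) for t in text.split())
```

With these assumptions, `normalize("Pumpp rlt leek")` yields `"pump bearing leak"`. The low `max_distance` default is deliberate: a looser threshold would start rewriting legitimate short words into vocabulary terms.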

Here is an example of JSON output generated by an NLP model from a raw ticket:

{
  "ticket_id": "MRO-2023-8472",
  "raw_text": "Supply pump bearing oil leak, pressure dropped to 3 bar. Replaced 45mm O-ring.",
  "entities": {
    "equipment_class": "Centrifugal pump",
    "component": "Bearing",
    "failure_mode": "Leak",
    "operating_parameter": {"type": "Pressure", "value": 3, "unit": "bar"},
    "action_taken": "Replacement",
    "spare_part": {"type": "O-ring", "dimension": "45mm"}
  },
  "iso_14224_code": "PMP-CEN-B-LEAK"
}
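
The numeric fields in this output (the pressure value and the part dimension) are the easiest part of the extraction to illustrate. The sketch below is not the NER model itself. It is a simple rule-based pass, assuming a short hypothetical list of units and a hypothetical unit-to-type mapping (`UNIT_TYPES`), that pulls value/unit pairs out of the raw text.

```python
import re

# Assumed unit list and unit-to-type mapping; extend both to match site usage.
UNIT_TYPES = {
    "bar": "Pressure",
    "°c": "Temperature",
    "mm": "Dimension",
    "rpm": "Rotational speed",
}
# A number (decimal point or comma), optional whitespace, then a known unit.
VALUE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(bar|°C|mm|rpm)\b", re.IGNORECASE)


def extract_measurements(text: str) -> list:
    """Return every value/unit pair found in the text, in order of appearance."""
    results = []
    for number, unit in VALUE_PATTERN.findall(text):
        results.append({
            "type": UNIT_TYPES[unit.lower()],
            "value": float(number.replace(",", ".")),  # accept French decimal commas
            "unit": unit,
        })
    return results
```

Applied to the `raw_text` of the ticket above, this returns one pressure entry (3 bar) and one dimension entry (45 mm). Deciding that the first is an operating parameter and the second belongs to the spare part still requires the surrounding context, which is what the trained model contributes.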

3. Data requirements

The accuracy of an NLP model directly depends on the quality and volume of the training data. For efficient industrial deployment, the following prerequisites must be respected:

  • Historical volume: A minimum of 10,000 to 15,000 closed maintenance tickets is required to train a supervised classification model to an acceptable performance level (F1-score > 0.85).
  • Standardized taxonomy: Output data must be aligned to a recognized standard. The ISO 14224 standard provides a strict framework for collecting reliability data, structuring equipment into classes, units, subunits and maintainable parts.
  • Quality of the annotation: The training requires a dataset annotated by business experts (maintenance engineers with more than 10 years of experience). Incorrect annotation will introduce systematic bias into the model's predictions.
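
The F1-score threshold mentioned above can be checked with a few lines of code once a set of expert-annotated tickets is available. The text does not say which averaging is meant. The sketch below assumes macro-averaging (each failure class weighs equally), which is a common choice when some classes are rare. The function name is illustrative.

```python
from collections import Counter


def macro_f1(expert_labels: list, predicted_labels: list) -> float:
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    if len(expert_labels) != len(predicted_labels):
        raise ValueError("Both label lists must have the same length")
    if not expert_labels:
        raise ValueError("At least one labelled ticket is required")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, predicted in zip(expert_labels, predicted_labels):
        if truth == predicted:
            true_pos[truth] += 1
        else:
            false_pos[predicted] += 1  # predicted class gained a wrong ticket
            false_neg[truth] += 1      # true class lost one of its tickets

    classes = set(expert_labels) | set(predicted_labels)
    scores = []
    for label in classes:
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for every class seen.
        denominator = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        scores.append(2 * true_pos[label] / denominator)
    return sum(scores) / len(scores)
```

For example, with expert labels `["Leak", "Leak", "Seizure", "Seizure"]` and predictions `["Leak", "Seizure", "Seizure", "Seizure"]`, the per-class scores are 2/3 and 4/5, giving a macro F1 of about 0.73, which would fall short of the 0.85 target.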

4. Implementation architecture

Integrating an NLP system into an existing MRO infrastructure requires a secure and highly available IT architecture. The typical data flow is organized like this:

The technician enters the report on a mobile terminal (often an ATEX Zone 1 or Zone 2 certified tablet in the energy sector). The application transmits the text via a secure REST API (TLS 1.3) to the factory integration gateway. The text is then processed by the NLP microservice, hosted either on a local server (edge computing) for data confidentiality reasons (ITAR, defense secrecy in aeronautics), or in an ISO 27001-certified private cloud.

The model returns structured tags in less than 200 milliseconds. This metadata is automatically injected into the dedicated fields of the CMMS (SAP PM, IBM Maximo, or Infor EAM). The ERP system is then able to associate the identified failure mode with the bill of materials (BOM) of the equipment, thus facilitating the automatic reservation of necessary spare parts for future interventions.
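
As an illustration of the injection step, the sketch below flattens a nested NLP result (using the structure of the JSON example in section 2) into a single record of field values. The field names on the left (`ORDER_ID`, `FAILURE_CODE`, and so on) are hypothetical placeholders. Each CMMS exposes its own field names and write interface, which are not covered here.

```python
def to_cmms_fields(nlp_result: dict) -> dict:
    """Flatten the nested NLP output into one record of CMMS field values."""
    entities = nlp_result.get("entities", {})
    parameter = entities.get("operating_parameter") or {}
    spare_part = entities.get("spare_part") or {}
    return {
        # All keys below are hypothetical CMMS field names.
        "ORDER_ID": nlp_result.get("ticket_id"),
        "FAILURE_CODE": nlp_result.get("iso_14224_code"),
        "EQUIPMENT_CLASS": entities.get("equipment_class"),
        "COMPONENT": entities.get("component"),
        "FAILURE_MODE": entities.get("failure_mode"),
        "MEASURED_VALUE": parameter.get("value"),
        "MEASURED_UNIT": parameter.get("unit"),
        "SPARE_PART": spare_part.get("type"),
    }
```

Missing entities come back as `None` rather than raising an error, so an incomplete extraction leaves the corresponding CMMS field empty instead of blocking the whole ticket.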

5. Operational results and performance metrics

Applying NLP for ticket classification demonstrates measurable results. The analysis of a deployment project within an aeronautical component manufacturing plant (Nadcap certified) processing 25,000 work orders per year provides the following metrics:

  • Classification accuracy: The rate of work orders manually classified into the generic “Other” category decreased from 45% to less than 8%. The accuracy of identifying the faulty component reached 92%.
  • Reduction in MTTR (Mean Time To Repair): Immediate identification of the spare parts associated with textual descriptions reduced the time spent searching the storeroom. Overall MTTR decreased by 14%, representing an average gain of 45 minutes per corrective intervention.
  • Inventory Optimization: Precise correlation between actual failure modes and parts consumption helped identify and reduce dead stock (components stored for failure modes that never occur) by 12%.

Financially, the initial investment (CAPEX) for this project amounted to €95,000, including data cleaning, model training and API integration. Operating costs (OPEX) are €15,000 per year. Savings generated by reducing downtime and optimizing inventory amount to €140,000 annually, giving a payback period of about 9.4 months when the first year of OPEX is added to the initial investment ((€95,000 + €15,000) / €140,000 × 12).
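
The 9.4-month figure can be reproduced from the three amounts quoted above. The short sketch below shows two common payback conventions. The function and parameter names are illustrative. The first convention (CAPEX plus first-year OPEX, divided by gross annual savings) matches the published figure. The second (CAPEX divided by savings net of OPEX) gives a slightly shorter period.

```python
def payback_months(capex: float, annual_opex: float, annual_savings: float,
                   include_first_year_opex: bool = True) -> float:
    """Payback period in months under one of two simple conventions."""
    if include_first_year_opex:
        # Total first-year outlay recovered from gross annual savings.
        return 12 * (capex + annual_opex) / annual_savings
    # Initial investment recovered from savings net of operating costs.
    return 12 * capex / (annual_savings - annual_opex)


first_year_basis = payback_months(95_000, 15_000, 140_000)
net_savings_basis = payback_months(95_000, 15_000, 140_000, include_first_year_opex=False)
print(f"{first_year_basis:.1f} months / {net_savings_basis:.1f} months")
```

Both conventions land close to nine months, so the conclusion is not sensitive to the choice. Stating which one is used avoids confusion when comparing projects.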

6. Technical limits and constraints

Despite its performance, automated classification by NLP has technical limitations that must be integrated into the project risk analysis:

Data drift: Technical vocabulary evolves. The introduction of new machines or new technologies brings new terms with it. A model trained in 2021 will lose accuracy by 2024 if it is not retrained periodically.
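
One simple way to watch for this kind of drift is to track the share of words in recent tickets that the model never saw during training. The sketch below is a rough proxy only, under stated assumptions: whitespace tokenization, and an arbitrary alert threshold of 15% that would need to be calibrated on each site's own history.

```python
def out_of_vocabulary_rate(tickets: list, training_vocabulary: set) -> float:
    """Share of tokens in recent tickets that are absent from the training vocabulary."""
    tokens = [token for ticket in tickets for token in ticket.lower().split()]
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in training_vocabulary)
    return unknown / len(tokens)


def needs_retraining(tickets: list, training_vocabulary: set,
                     threshold: float = 0.15) -> bool:
    """Flag the model for review when unseen vocabulary exceeds the assumed threshold."""
    return out_of_vocabulary_rate(tickets, training_vocabulary) > threshold
```

For instance, with a training vocabulary of `{"pump", "leak", "bearing"}` and recent tickets `["pump leak", "cobot gripper leak"]`, two of the five tokens are unseen, a rate of 0.4, and the check returns `True`.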

Limitations of contextual inference: The model does not understand the physics of failure. If a technician describes an unusual symptom (e.g. “abnormal vibration at 1500 rpm transmitted by adjacent piping”), the model may attribute the fault to the piping rather than to the unbalanced rotor. Human validation (human-in-the-loop) remains essential for critical failures.
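
A human-in-the-loop rule of this kind is often implemented as a simple routing step in front of the CMMS write. The sketch below assumes that the model exposes a confidence score between 0 and 1 and that equipment criticality is known from the asset register. The `Prediction` fields and the 0.90 threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    ticket_id: str
    failure_code: str
    confidence: float  # assumed model confidence, between 0 and 1
    critical: bool     # assumed flag taken from the asset register


def route(prediction: Prediction, min_confidence: float = 0.90) -> str:
    """Send critical or low-confidence predictions to an engineer for validation."""
    if prediction.critical or prediction.confidence < min_confidence:
        return "human_review"
    return "auto_classify"
```

Critical equipment is routed to review regardless of confidence, because a confident but physically wrong classification is exactly the failure case described above.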

Quality of the initial input: NLP cannot invent missing information. A ticket containing only the mention “Machine down” can never be correctly classified at the component level, regardless of the level of sophistication of the algorithm.

7. Acquisition strategy: Internal development or commercial solution

Technical departments must decide between the development of a specific model (Build) and the acquisition of an existing software solution (Buy).

| Criterion | Internal Development (Build) | Commercial Solution (Buy) |
|---|---|---|
| Initial cost (CAPEX) | High (€120,000 - €200,000) | Moderate (€30,000 - €60,000) |
| Deployment time | 6 to 12 months | 1 to 3 months |
| Intellectual property | Total. Model adapted to specific factory jargon. | Limited to SaaS provider. Generic model. |
| Maintenance (OPEX) | Requires a team of in-house Data Scientists. | Included in the SaaS subscription (guaranteed SLA). |

Internal development is justified for manufacturers with highly specific equipment and databases of more than 100,000 historical tickets. For the majority of industrial sites using standard equipment (pumps, compressors, electric motors, industrial valves), the integration of a commercial pre-trained NLP module offers a significantly lower TCO (Total Cost of Ownership).

8. Roadmap for deployment

To guarantee the technical success of an NLP project applied to MRO, the following approach is recommended:

  1. Audit and data extraction: Export three years of CMMS history. Evaluate the fill rate of fields and the proportion of free text.
  2. Definition of the taxonomy: Align the failure codes with the NF X60-000 and ISO 14224 standards. Reduce the number of categories to avoid statistical dilution (aim for 50 to 100 main failure classes).
  3. Proof of Concept (PoC): Target a family of critical equipment (e.g. gas turbines or air compressors). Train the model on this subset and measure accuracy (F1-score) against expert manual classification.
  4. Shadow mode deployment: The model analyzes tickets in real time without modifying the database. Reliability engineers compare the model's predictions with the classifications actually recorded in order to adjust hyperparameters.
  5. Production and ERP integration: Activate automated writing in the CMMS. Link generated failure codes to spare parts catalogs to speed up the procurement process.
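
Step 1 of this roadmap (evaluating the fill rate of fields) can be sketched on an exported history as follows. The example assumes the CMMS export has been loaded as a list of dictionaries, one per ticket. The field names used in the usage note are hypothetical.

```python
def field_fill_rates(records: list, fields: list) -> dict:
    """Fraction of records in which each field holds a non-empty value."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = 0
        for record in records:
            value = record.get(field)
            # Missing keys, None, and blank strings all count as unfilled.
            if value is not None and str(value).strip() != "":
                filled += 1
        rates[field] = filled / total if total else 0.0
    return rates
```

On an export where `failure_code` is filled in one ticket out of four while `description` is filled in three out of four, the function returns `{"failure_code": 0.25, "description": 0.75}`. A low fill rate on coded fields combined with a high rate on free text is the typical situation in which NLP classification adds the most value.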

9. Summary

Natural language processing brings mathematical rigor to the analysis of textual maintenance data. By converting technicians' field experience into structured data compliant with ISO standards, maintenance departments obtain precise visibility into the actual failure modes of their installations. This structuring is a technical prerequisite for applying advanced predictive maintenance strategies and for optimizing the MRO supply chain.

Immediate and precise identification of failing components makes it possible to rationalize technical purchases. To guarantee the reliability of your maintenance operations, access the technical specifications of our certified industrial components by consulting the UNITEC-D E-Catalog.

10. Normative references

  • NF EN 13306: Maintenance - Maintenance terminology.
  • ISO 14224: Petroleum, petrochemical and natural gas industries - Collection and exchange of reliability and maintenance data for equipment.
  • AFNOR NF X60-000: Industrial maintenance - Maintenance function.
  • ISO/IEC 27001: Information technology - Security techniques - Information security management systems - Requirements.
