Strategic data science is an essential pillar of every Industry 4.0 scenario. A four-step data mining approach based on CRISP-DM supports successful projects.
Whenever big data is mentioned, most people first think of social media or the analysis of customer behaviour in e-commerce. But think again: strategic data analysis is also gaining momentum in the production environment. Frost & Sullivan believes that data analysis in the industrial sector has immense potential. Its experts found that production efficiency could be increased by about 10%, operating costs reduced by almost 20% and maintenance costs cut by 50% by making better use of the data that is already in the production process.
Although data can be collected and stored in factories relatively easily, little happens after that, and important insights hidden in the available information are lost. There is often a lack of budget and personnel for this task, yet companies that overcome these hurdles and focus on industrial data science will soon gain new insights. Ultimately, it can transform the production environment into a data paradise.
Manual analyses and dashboards aren’t enough
Channelling the huge flood of data and extracting value from the information collected by sensors, controllers and machines is undoubtedly a complex task, as it involves more than standard statistical methods and tools. Manual evaluations and the creation of dashboards and reports are not enough.
One reason is that dashboards become increasingly complicated as data volumes grow. Another is that they fail to show the relevant information at the right time – the kind that lets you see at a glance what is going on and take action. The routines implemented in a normal machine control system for monitoring production processes and detecting errors can identify current deviations and problems. However, they cannot predict future problems, link information in a meaningful way or perform advanced analysis.
Data scientists and production experts must collaborate
The central task of data analysis in Industry 4.0 scenarios is to extract decision-relevant information from collected data and present it to the right user at the right time. This involves planning the process of converting data into useful information in a conscientious and well-founded manner and then implementing it. According to Omron, this requires close cooperation between data scientists and specialists in production processes who know the story behind the data.
The three Vs of big data
Data scientists are especially familiar with the three Vs of large data sets: volume, variety and velocity.

Volume: a modern packaging machine can easily generate gigabytes of data per day, which can be stored over a long period of time; for inspection machines this can reach many terabytes per day. Storing this amount of data is not a problem, but using it is a challenge.

Variety: machines today not only produce more data, but the type of data is much broader than it was a few years ago. Alongside measured values, raw information from sensors and other ‘metadata’ are stored. It is not only about maintenance results, but also about associated images. Data can also be generated by the machine operator – from cycle times to written and even spoken feedback.

Velocity: raw data from sensors is usually read every millisecond and must be treated as streaming data. At the same time, the speed of data analysis is playing an increasingly important role. It is therefore not enough to update dashboards once a day or every hour. An operator wants to be informed about potential problems immediately to avoid difficulties and downtime. Ideally, the machine itself should be notified in real time so that it can automatically correct itself within the same product cycle.

Veracity: data may be corrupted due to a problem in the sensor or another device, and readings may be missing or stale. Because this can seriously compromise an analysis and lead to false conclusions, data scientists must continually check the veracity of the data – a fourth V.
Data science project approach
Industrial data science is quite a new discipline, which is why there is (still) no generally valid approach that suits every company. Every solution and application requires customised data analysis and modelling to achieve the best possible result. A standard approach is nevertheless useful, and the CRISP-DM model (cross-industry standard process for data mining) is the most commonly adopted basis. Omron has simplified and tailored CRISP-DM into a new approach.
The four steps of this approach are preparation; analysis and application development; evaluation; and service and maintenance.
Practical example SMT line
A data-driven solution does not always have to include fancy machine learning models or artificial intelligence. Sometimes, effective data processing and providing the right information at the right time in the right way can be enough. An illustrative example of such a data science project can be found in the current white paper ‘Data Science Services by Omron – How to get the full value from your factory floor data’, which is available for free download.
This specific project was carried out at the Omron Manufacturing of the Netherlands factory on surface-mount technology (SMT) lines where electronic components are mounted and soldered onto printed circuit boards (PCBs).
Transform data into useful information
Developing the potential of big data in your own production environment is not easy, but it is worth it. It’s not enough just to collect data and build a few graphs – instead, it is important to filter out production-relevant information from the data and present it to the right audience in the right way.
The key is to transform data into useful information. This must be done in close cooperation between data scientists and experts in the production process. Only then can a solution be developed that is popular, often used and generates long-term value.
At Omron it is done using these steps:
Phase 1: Preparation
The preparation phase is the most important one: a data science project will never succeed if the goal is unclear. In this first step, all participants and domain experts must engage with the problem or the specific requirement in order to arrive at a clearly defined project goal. They analyse the machine and/or the production process in detail to get an overview of which data is already available and which still needs to be collected. An initial data set can be collected and analysed as a kind of feasibility study. At the end of the preparation phase, a report is produced that provides insight into the expected value and a realistic ROI.
Phase 2: Analysis and application development
Next, the data is collected over a longer period of time in order to obtain a representative picture of the machine and process behaviour. Depending on the project objective, a data pipeline contains the following stages:
- Data collection: Data is gathered from various sources, from raw sensor data to information from MES systems.
- Data pre-processing: The collected data is prepared for the analysis step, transformed, merged and cleaned up.
- Data analytics: The developed analysis algorithms and machine learning models are applied.
- Application: The results and conclusions of the data analysis are made available – for example as visualisations tailored to the situation or target group, or as feedback to the machine.
The necessary machine learning models can be trained and validated together with the other data processing steps. If the validation is successful, an application can be developed based on the described data pipeline, which can be easily implemented and executed.
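The four stages above can be sketched as a chain of plain functions. The following is a toy illustration of the pipeline structure, not Omron's actual implementation: the function names, the hard-coded sample data and the threshold rule (standing in for a trained model) are all assumptions made for the example.

```python
# Toy sketch of the four pipeline stages described above, wired as functions.
# Names, sample data and the threshold rule are illustrative assumptions.

def collect():
    # Stage 1, data collection: gather data from various sources
    # (here: a hard-coded sample instead of sensors or an MES system).
    return [{"sensor": "temp", "value": 21.5},
            {"sensor": "temp", "value": None},
            {"sensor": "temp", "value": 85.0}]

def preprocess(records):
    # Stage 2, pre-processing: clean up by dropping incomplete records.
    return [r for r in records if r["value"] is not None]

def analyse(records):
    # Stage 3, analytics: apply an analysis step; a simple threshold
    # rule stands in here for a trained machine learning model.
    return [{"value": r["value"], "alert": r["value"] > 80.0} for r in records]

def apply_results(results):
    # Stage 4, application: make conclusions available, e.g. as messages.
    return [f"ALERT: {r['value']}" for r in results if r["alert"]]

messages = apply_results(analyse(preprocess(collect())))
print(messages)  # one alert, for the 85.0 reading
```

Keeping the stages as separate, composable steps mirrors the idea in the text: once each stage is validated, the same pipeline can be packaged into an application and executed against live data.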
Phase 3: Evaluation
The application is used in the production environment, and its performance and business results are evaluated. If the performance does not meet expectations, the previous project phases are rerun.
Phase 4: Service and maintenance
Production processes change, and machine behaviour is also subject to constant change over time – for example due to updates or normal wear and tear. Regular revalidation of the solution is necessary to ensure that it continues to work reliably and retains its value. The amount of available data is also growing, and often better models can be developed as a result. Existing (machine learning) models therefore need to be reviewed regularly.