Please visit the Git repository (https://github.com/PHME-Datachallenge/Data-Challenge-2021) to access sample code for data handling, to log issues, and to check the scoring function.
The model refinement dataset is now available through the profile page, and further details about the submission format are available at the bottom of this page.
Participation in the data challenge is free, but participants are encouraged to also enrol in the conference to see the presentations of the three winning data challenge teams, as well as many paper presentations on data analytics and diagnostics.
In this edition of the annual PHME Data Challenge, participants are invited to demonstrate the application of state-of-the-art algorithms and models to perform fault detection, classification and root cause identification for a manufacturing production line setup. In collaboration with the Swiss Centre for Electronics and Microtechnology (CSEM), exclusive access to rich datasets generated from a real-world industrial testbed has been provided for this competition. The setup comprises sub-systems such as conveyor belt motors, an infrared camera and robotic arms used in the continuous testing of electronic components.
Data has been acquired under fault-free operating conditions and, with the support of domain expertise, also with a variety of seeded faults under controlled conditions, for example conveyor belt failures and component testing machine failures. In this case study, a faulty system may result in components being discarded unnecessarily, a slow-down of the testing speed, or a wrong testing phase. The testbed monitors a total of 50 signals and each dataset comprises signals in three categories:
- Machine health monitoring signals: Pressure, Vacuum, FuseHeatSlope, etc.
- Environment monitoring signals: Temperature, Humidity, etc.
- Others: CPUTemperature, ProcessMemoryConsumption.
Algorithms and models will be assessed not only on the correctness of their predictions, but also on the time needed to produce them. The aim is to minimise the time-to-classification, that is, the time needed to correctly identify the fault and the variables that should be checked, so that the operator can resolve outages in the shortest possible time.
In addition to the main challenge, this year we propose a bonus challenge: data was acquired in fault-free operating conditions using different undisclosed system parameter configurations. The challenge is therefore to correctly identify these fault-free operating conditions. This would help the operator to understand whether the manufacturing production line changed behaviour while still running smoothly.
The public dataset used for model training and validation will be made available immediately after the competition is launched. In a second round, a second part of the dataset will be released to refine the models. The application of data-driven as well as physics-based modelling approaches is encouraged. To standardize the final performance evaluation, a default Python script will be provided during the competition period.
The winning teams will be asked to prepare a full manuscript which will be featured in the PHM conference proceedings and a representative will be expected to make an oral presentation at the event. The prize will be awarded at the conference social event.
For any questions about the competition, please contact email@example.com
Collaboration is encouraged and teams may comprise students and professionals from single or multiple organisations. There is no requirement on team size. Register your team’s entry by filling in the form.
The winning teams will be selected and awarded contingent upon:
- Having at least one member of the team register and attend the PHM 2021 Conference.
- Submitting a peer-reviewed conference paper.
- Presenting the analysis results and technique employed at the conference.
The organizers of the competition reserve the right to modify these rules and to disqualify any team for any efforts they deem inconsistent with fair and open practices.
The winning team will be awarded a GPU graphics card courtesy of our sponsor NVIDIA.
Additionally, each of the top four teams will be offered one free registration per team for the conference.
- Competition Open – Challenge description: February 16, 2021
- Validation Data Posted – Submission Portal Open: April 23, 2021
- Competition Closed: May 16, 2021, 11:59:59 GMT
- Winners Announced: May 25, 2021
- Final Papers Due, Winners Announced: June 7, 2021
- PHM Conference: Dates June 28 2021
The testbed under consideration represents a typical component of a large-scale quality-control pipeline and can be easily integrated in multiple different industry 4.0 manufacturing lines.
More specifically, the machine consists of a 4-axis SCARA robot that picks up electrical fuses with a vacuum gripper and moves them from a feeder to a fuse test bench. On this test bench, the machine first assesses whether the fuse conducts electricity. If this first test is successful, the fuse is heated by applying a current of 200 mA for a time interval of 1.5 s. The heating is measured by a thermal camera. After the tests, the fuse is moved back into the feeder by two conveyor belts. A picture of the machine and an illustration of the main steps involved in the quality-control process are shown in Fig. 1.
- The fuses are first picked up by a robotic arm.
- Fuses are then placed in the field of view of a thermal camera responsible for finding signs of overheating or degradation.
- Once the analysis is complete, fuses are placed on a conveyor belt and sorted by a robotic bar.
- Fuses are moved by the large conveyor belt (green in the figure) to a (5) small conveyor belt that transports them back to the (6) feeder where fuses are stored before restarting the cycle.
Both process control and health data acquisition are implemented with CSEM VISARD. The machine is monitored by an array of 50 sensors that record the evolution of a number of quantities of interest in order to establish the health state of the machine in real time. The sensor data is aggregated over a time window of 10 s. For each window and sensor, one statistical data point is calculated in order to reduce the data size.
The main components of the experimental rig are the following.
- 4-axis SCARA robot
- fuse feeder, feeding movement electrically powered, barrier to hold back fuses pneumatically powered
- thermal camera (382×288 pixel, 0-250 deg C)
- camera to detect fuses on feeder (1280×1024 pixel)
- EC motor for big conveyor belt
- EC motor for fuse selector
- DC motor for small conveyor belt
- Vacuum pump for robot gripper
- Pressure pump for feeder barrier
- several valves for the pneumatics system
The machine can be operated under different regimes and conditions. Under its nominal working regime, there are no throughput defects at any level of the quality-control pipeline, from picking up the fuses to their transportation and analysis. However, different failure modes can be artificially injected by manually altering the behaviour of one or more components. Such deviations from the machine’s normal working condition can be induced at different levels, for instance:
- Modification of the operating mode of the robotic arm picking up the fuses
- Introduction of a pressure leakage in the pneumatic system
- Alteration of the speed of the conveyor belts
In total, up to 9 failure cases are introduced. Each of them affects the sensor readings in a different way and one of the goals of the challenge is to rank the signals according to their level of correlation with the failure mode being considered.
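As a rough illustration of what ranking signals against a failure mode could look like, the sketch below scores each signal by the absolute Pearson correlation between its windowed values and a binary fault indicator. This is only one possible approach and an assumption on our side: the official scoring is defined by the scoring function in the repository, and the numeric values and the `rank_signals` helper are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def rank_signals(signals, is_faulty):
    """Hypothetical helper: order signal names by |correlation| with the fault flag."""
    scores = {name: abs(pearson(vals, is_faulty)) for name, vals in signals.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy values, one per time window (not taken from the challenge data).
signals = {
    "Pressure": [5.0, 5.1, 4.9, 3.0, 2.9, 3.1],
    "Temperature": [21.0, 22.0, 21.0, 22.0, 21.0, 22.0],
    "Humidity": [40.0, 40.0, 40.0, 40.0, 40.0, 40.0],
}
is_faulty = [0, 0, 0, 1, 1, 1]  # 1 where the window belongs to the faulty period

ranking = rank_signals(signals, is_faulty)
print(ranking)
```

A linear correlation only captures monotonic, roughly linear effects; model-based importance measures may be a better fit for faults that change the variance or shape of a signal rather than its level.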
Datasets and Challenge
The experimental dataset is composed of a set of 50 signals, each one describing the evolution over time of a quantity of interest. Each experiment can run from ≈ 1 to ≈ 3 hours.
These signals can be divided into three categories:
- Machine health monitoring signals: Pressure, Vacuum, FuseHeatSlope, …
- Environment monitoring signals: Temperature, Humidity, …
- Others: CPUTemperature, ProcessMemoryConsumption, …
Each signal is associated with a specific set of fields describing different features extracted from that signal by the automated data acquisition process.
For example, the signal DurationPicktoPick, which measures the duration between the time the robot picks up one fuse to when it picks up the next one, is described by the following features: vCnt (number of samples recorded in a fixed time-window), vFreq (Sampling frequency within the same time-window), vMax (Maximum value recorded within the time-window), vMin (Minimum value recorded within the time-window), vTrend (time series trend within the time-window) and value (Mean value recorded within the time-window).
Note that not all the measurements have all the features; for instance, the signal FeederAction1 has only the vCnt feature. This is because certain signals cannot be sampled at regular frequencies, so the corresponding fields simply describe the occurrence of a certain event at a certain time. For all the signals, the aforementioned features (vCnt, vFreq, …) are extracted from a window of 10 seconds. A separate file listing the names of the features associated with each measurement is provided.
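To make the feature definitions more concrete, here is a minimal sketch of how raw samples could be aggregated into 10-second windows. It is not the acquisition code used on the testbed. In particular, computing vFreq as the sample count divided by the window length and vTrend as a least-squares slope are assumptions, and `window_features` is a hypothetical helper name.

```python
from collections import defaultdict

WINDOW_S = 10.0  # window length stated in the text


def _slope(ts, vs):
    """Least-squares slope of vs against ts; 0.0 when it is undefined."""
    n = len(ts)
    if n < 2:
        return 0.0
    mt = sum(ts) / n
    mv = sum(vs) / n
    denom = sum((t - mt) ** 2 for t in ts)
    if denom == 0:
        return 0.0
    return sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / denom


def window_features(samples, window_s=WINDOW_S):
    """Aggregate (timestamp_s, value) samples into one feature dict per window."""
    buckets = defaultdict(list)
    for t, v in samples:
        buckets[int(t // window_s)].append((t, v))
    rows = []
    for idx in sorted(buckets):
        ts = [t for t, _ in buckets[idx]]
        vs = [v for _, v in buckets[idx]]
        rows.append({
            "window": idx,
            "vCnt": len(vs),              # number of samples in the window
            "vFreq": len(vs) / window_s,  # assumed: samples per second
            "vMax": max(vs),
            "vMin": min(vs),
            "vTrend": _slope(ts, vs),     # assumed: linear trend
            "value": sum(vs) / len(vs),   # mean value
        })
    return rows


# Toy samples: four readings in the first window, one in the second.
samples = [(0.0, 1.0), (2.5, 2.0), (5.0, 3.0), (7.5, 4.0), (12.0, 10.0)]
features = window_features(samples)
for row in features:
    print(row)
```

An event-like signal such as FeederAction1 would keep only the vCnt entry of each row.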
Experimental data have been acquired under fault-free operating conditions and, with the support of domain expertise, experiments have also been generated with a variety of seeded faults under controlled conditions.
Fault-free experiments (having label 0) represent the behaviour of the machine during its normal operating regime. In this case, the machine does not present any problem and runs smoothly. Fault-free experiments have been acquired by using two different system parameter configurations. Yet, both system parameter configurations lead to a nominal system behaviour.
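One simple way to look for two groups among fault-free experiments is to summarise each experiment by a few numbers and cluster the summaries. The sketch below uses a plain two-cluster k-means written with the standard library. The choice of summary features, the toy values and the `two_means` helper are all assumptions made for illustration, not a recommended or official method.

```python
def two_means(points, iters=20):
    """Split points (equal-length tuples) into two clusters with Lloyd's k-means."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic start: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centroids = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new.append(centroids[k])
                continue
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        if new == centroids:
            break
        centroids = new
    return labels


# Toy per-experiment summaries, e.g. (mean cycle duration, mean pressure).
experiments = [
    (1.0, 5.0), (1.1, 5.1), (0.9, 4.9),
    (2.0, 5.0), (2.1, 5.2), (1.9, 4.8),
]
config_labels = two_means(experiments)
print(config_labels)
```

In practice the summary features would need to be put on a comparable scale first, since k-means is sensitive to the units of each dimension.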
Seeded-fault operating conditions are characterized by anomalous behaviour. Depending on the fault, unhealthy experiments have been labeled with 8 different labels. Each fault is characterized by an anomalous behaviour of one or more signals.
To download the dataset, register online and then log in to your profile page.
The objectives of this data challenge are:
- Identify and classify the faults: to cope with a predictive maintenance scenario, the teams have to develop models able to predict the fault (if present) in unlabeled data.
- Identify the signals having anomalous behaviour: to help the operator understand the fault, the teams have to rank the input signals to identify the most important ones for the prediction.
- Predict the correct fault in the shortest time: to minimise wasted time and components, the teams have to identify the presence of the fault as early as possible. Solutions must therefore work in a streaming fashion: they have to read the unlabeled data in time-window order and indicate the fault label as soon as possible.
- BONUS – Identification of system parameter configuration: fault-free experiments have been acquired with two different system parameter configurations yielding similar, smoothly operating performance. The teams have to develop unsupervised solutions able to identify the experiments performed with the different system parameter configurations.
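The streaming requirement in the third objective can be sketched as follows: windows are read one at a time, in order, and a label is reported as soon as the per-window predictions agree for a few consecutive windows. The agreement rule, the `patience` parameter, the threshold classifier and the toy values are assumptions chosen for illustration; the official evaluation is the one implemented by the scoring function in the repository.

```python
def classify_stream(windows, predict_window, patience=3):
    """Return (label, windows_read) once `patience` consecutive windows agree.

    If the stream ends first, the last predicted label is returned.
    """
    last, streak, read = None, 0, 0
    for read, window in enumerate(windows, start=1):
        label = predict_window(window)
        streak = streak + 1 if label == last else 1
        last = label
        if streak >= patience:
            return label, read
    return last, read


def toy_predict(window):
    """Hypothetical per-window classifier: low pressure means class 2, else class 0."""
    return 2 if window["Pressure"] < 4.0 else 0


def window_stream():
    """Generator standing in for unlabeled data arriving in time-window order."""
    for pressure in [5.0, 5.1, 3.0, 2.9, 3.1, 3.0]:
        yield {"Pressure": pressure}


result = classify_stream(window_stream(), toy_predict)
print(result)  # (2, 5): class 2 reported after reading five windows
```

A larger `patience` makes the decision more robust to single-window noise but increases the time-to-classification, so the two have to be traded off against each other.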
For these objectives two sets of data are provided in different phases of the challenge:
Model training and validation: This dataset will be released at the competition launch. It comprises 70 experiments describing fault-free experiments (class 0) and 5 faulty classes (class: 2, 3, 5, 7, 9). The fault-free experiments include experiments with both system parameter configurations. Table 1 describes the label distribution.
Model refinement: This dataset will be released three weeks before the end of the competition. It comprises 29 experiments describing fault-free experiments (class 0) and 3 faulty classes (class: 4, 11, 12). The fault-free experiments include experiments with both system parameter configurations. Table 2 describes the label distribution.
Table 1 Model training and validation – 70 experiments
|Class|Number of Datasets|
Table 2 Model refinement – 29 experiments
For the submission, participants are required to submit 3 parts:
- a Jupyter Notebook in Python 3 including the Test Classification function (the function that runs the classification task on new data). Please visit the Git repository (https://github.com/PHME-Datachallenge/Data-Challenge-2021) to access sample code for data handling and to log issues,
- the trained models (e.g., as a pickle file) that the Test Classification function will use to classify new data,
- and a short paper (maximum 4 pages) describing the applied methodology.
The files can be submitted as a single zip file. Please include the team’s name in the zip file name.
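As a minimal sketch of how a trained model stored as a pickle file could be reloaded and used on new data, consider the example below. The "model" is just a dictionary holding one threshold, and the function name `classify_new_data`, the file name and the threshold value are hypothetical. The interface actually expected from the Test Classification function is the one shown in the sample code in the repository.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialise a trained model object to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a model from a pickle file (only unpickle files you trust)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def classify_new_data(model, windows):
    """Hypothetical classification function: one predicted label per window."""
    threshold = model["pressure_threshold"]
    return [2 if w["Pressure"] < threshold else 0 for w in windows]


# Stand-in for a real trained model.
model = {"pressure_threshold": 4.0}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)

restored = load_model(path)
predictions = classify_new_data(restored, [{"Pressure": 5.0}, {"Pressure": 3.0}])
print(predictions)
```

A pickled model can generally only be reloaded if the classes and library versions it depends on are available, so it helps to state the required packages in the notebook.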