Cluster Analysis

Identifying similarity structures

Cluster analysis aims to find similarity structures in data spaces. Such a data space can, for example, be spanned by measured physical variables. The objective is to arrange the existing measurement points into groups so that the points within a group are as similar as possible and the groups are as different from one another as possible.

Detecting anomalies

Clusters determined in this way can be interpreted using domain knowledge and used to classify previously unseen measurement points and/or to detect anomalies. The distribution of anomalies in the data space can provide information about the cause of errors.

Clustering with the k-means algorithm

One of the best-known methods of cluster analysis is the k-means algorithm. It starts with a predetermined number of randomly placed cluster centers. Each data point is assigned to its nearest center, which minimizes the sum of squared distances between the data points and their assigned centers for the current center positions. The centers are then recomputed as the mean of each cluster, and the data points are reassigned. These two steps are repeated until the assignments no longer change. The resulting configuration is a local optimum that depends on the initial centers, so it is not necessarily the best possible one.
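The two alternating steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the choice to draw the initial centers from the data points themselves are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: use k randomly chosen data points as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Made-up example data: two clearly separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because the result depends on the starting centers, the algorithm is often run several times with different seeds, keeping the run with the smallest sum of squared distances.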

Identifying fault states

One area in which cluster analysis can be useful is identifying the operating states of multi-state systems. These states correspond to the similarity structures found in the measurement data of an intact system. Measurement data can be, for example, sensor data such as torque or currents, which describe the system state. The illustration of a cluster analysis shows an example of three identified operating states of a motor in the voltage-torque space (see colored ellipses). Measurement data that cannot be clearly assigned to any of these states are classified as outliers (orange).

The condition of the motor can now be evaluated based on the number and distribution of the outliers. Outliers can also provide information about the type of fault or wear process. This can speed up the identification of faults and, consequently, reduce machine downtime.
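One simple way to decide whether a measurement "cannot be clearly assigned" is to compare its distance to the nearest cluster center against a threshold. The sketch below illustrates that idea only. The function name `flag_outliers`, the `max_distance` threshold, and all numeric values are hypothetical and not taken from a real motor.

```python
import math


def flag_outliers(points, centers, max_distance):
    """Return (label, is_outlier) for each point.

    label is the index of the nearest center; is_outlier is True when even
    that nearest center is farther away than max_distance.
    """
    results = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        label = min(range(len(centers)), key=distances.__getitem__)
        results.append((label, distances[label] > max_distance))
    return results


# Hypothetical centers of three operating states as (voltage, torque) pairs.
centers = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
# Made-up measurements; the third lies far from every center.
measurements = [(1.2, 0.9), (5.1, 4.8), (7.0, 3.5), (9.1, 1.3)]

flags = flag_outliers(measurements, centers, max_distance=1.0)
# Share of outliers as a rough indicator of the motor's condition.
outlier_share = sum(is_out for _, is_out in flags) / len(flags)
```

In practice the threshold would have to be chosen per cluster from data of an intact system, for example from the spread of the points that define each operating state.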

Your benefits from cluster analysis

Recognize errors in good time


Reduce maintenance costs


Any questions? We are happy to help.

Katana, the data analytics segment of the USU Group, has extensive experience in the application of machine learning methods in the industrial sector. Take advantage of our knowledge and our solutions for building your data-driven business models to reduce your costs and improve your quality and value added.