Resource Efficiency on Circular Knitting Machines

This notebook demonstrates two simple examples of using gathered sensor data to examine the resource efficiency of a circular knitting machine.

The measured signals contain the cam temperature, cam force, yarn tension, drive power and vibrations, measured at different rotational velocities.

In the two examples, we examine the difference between worn and less worn needle feet, as well as the difference in machine state at different lubricant amounts. We use preprocessed data containing aggregated values per machine rotation and rely heavily on existing Python frameworks for data processing and modeling.

We start by importing the necessary modules.
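The exact import list depends on the original notebook; a plausible set covering both applications (with imbalanced-learn assumed to be installed for Application 2) is:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.tree import plot_tree

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
```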

Application 1: Needle Wear

First, we load the previously saved data from a pickle container into a pandas dataframe.
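A sketch of the loading step, assuming a hypothetical file name `needle_wear.pkl` (the actual path is not given here):

```python
# Hypothetical file name; adjust to the actual pickle container
df = pd.read_pickle("needle_wear.pkl")
```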

We display some basic information: the data itself, the data types, NaN values and simple statistics. Here, we focus on the vibration signals, observing the maximum and RMS values per rotation for all spatial directions. The speed in these experiments was reduced to 10 rpm. Two measurements were conducted per needle state.
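For example, using the standard pandas inspection methods:

```python
display(df.head())       # the data itself
df.info()                # data types and non-null counts (reveals NaNs)
display(df.describe())   # simple statistics per column
```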

Data points were labeled with False for less worn and True for worn needles.

How many data points are labeled True?
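Assuming the boolean label column is named `worn` (an assumption, as the actual column name is not given), a quick count could be:

```python
# "worn" is a hypothetical column name for the boolean label
print(df["worn"].sum(), "of", len(df), "data points are labeled True")
```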

Due to similar orders of magnitude, there is no need to standardize the data. All data points are treated as independent observations. As an example, we first plot the two y-direction features against each other.
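A sketch of the scatter plot, with hypothetical column names `max_y` and `rms_y` for the per-rotation maximum and RMS of the y vibration signal:

```python
# Scatter plot of the two y-direction features, colored by needle state
fig, ax = plt.subplots()
for label, group in df.groupby("worn"):
    ax.scatter(group["max_y"], group["rms_y"], label=f"worn = {label}", alpha=0.5)
ax.set_xlabel("max_y")
ax.set_ylabel("rms_y")
ax.legend()
plt.show()
```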

It appears that the y features alone might separate the output categories sufficiently. Let's also check the box plots.
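For instance, with the same assumed column names extended to all three spatial directions:

```python
# Box plots of all vibration features, grouped by needle state
# (column names are assumptions)
features = ["max_x", "max_y", "max_z", "rms_x", "rms_y", "rms_z"]
df.boxplot(column=features, by="worn", layout=(2, 3), figsize=(12, 6))
plt.show()
```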

The box plots indicate class separability for the x and y directions. Let's define some simple models and check their performance.

First things first: we separate the input columns from the output and set 30 % of the data points aside as a test set.
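A sketch using scikit-learn's `train_test_split` (the random seed is an arbitrary choice):

```python
X = df[features]   # input features as defined above
y = df["worn"]     # boolean target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```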

We define three classification models: logistic regression, K-nearest neighbors and support vector machine.

Logistic regression fits a logistic function $ f(x) = \frac{L}{1 + e^{-k(x - x_0)}} $ to the data to predict a binary variable (in this case, the degree of needle wear).

K-nearest neighbors assigns a data point to the class that is most common among its K nearest neighboring points with respect to a predefined metric.

A support vector machine tries to find a hyperplane that maximizes the margin between the categories.
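With scikit-learn, the three models could be instantiated as follows (the hyperparameters shown are library defaults, except for an increased iteration limit; the values used in the original experiments are not given):

```python
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support vector machine": SVC(kernel="rbf"),
}
```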

During training, we use 10-fold cross-validation to obtain a reliable performance estimate and guard against overfitting. We print the mean and standard deviation of each model's score.
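A sketch of the cross-validation loop:

```python
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv)
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```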

All three models achieve classification precision well above 95 %. We can also visualize their performance across all folds.
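For example, as a box plot of the fold scores collected above:

```python
# One box per model, summarizing the ten fold scores
fig, ax = plt.subplots()
ax.boxplot(list(cv_scores.values()), labels=list(cv_scores.keys()))
ax.set_ylabel("cross-validation score")
plt.show()
```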

We fit the models with the whole training set again and compute predictions on the test set. Finally, we print the classification report and visualize confusion matrices for the test set.
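A sketch of the final evaluation step:

```python
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"--- {name} ---")
    print(classification_report(y_test, y_pred))
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
    plt.title(name)
    plt.show()
```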

On the test set, all models achieve at least 95 % precision. We can conclude that the models generalize well without overfitting.

Application 2: Lubricant Amount

Several experiments were conducted with three different lubricant amounts: 4, 20 and 40 ml/hour/nozzle. The preprocessed data contain aggregated quantities per rotation at 20 rpm. Again, we first load the data and display basic information.
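Analogous to Application 1, with a hypothetical file name:

```python
# Hypothetical file name; adjust to the actual pickle container
df = pd.read_pickle("lubricant_amount.pkl")

display(df.head())
df.info()
display(df.describe())
```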

Experiments showed that the quantity of 4 ml/hour/nozzle appears to be too low, occasionally leading to higher friction and wear on needles and cams. On the other hand, there are no significant differences in machine behavior at 20 and 40 ml/hour/nozzle. Therefore, we define all data points with 4 ml/hour/nozzle as True (lubricant amount too low) and all other data points as False. We arbitrarily set the threshold at 10 ml/hour/nozzle; note that specifying this threshold reliably would require further experiments.
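Assuming the lubricant amount is stored in a column named `lubricant_amount` (a hypothetical name), the labeling could read:

```python
# "lubricant_amount" in ml/hour/nozzle is an assumed column name;
# the 10 ml/hour/nozzle threshold is set arbitrarily
df["too_low"] = df["lubricant_amount"] < 10
```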

By performing PCA on the standardized data set and plotting the first three components, we get an impression of how well the two classes can be separated. Note that this serves visualization only: standardizing and fitting PCA on the full data set before the train/test split would introduce a data leak into any model built on top of it.
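A sketch of the visualization, assuming all columns apart from the lubricant amount and the label are features:

```python
# All remaining columns are assumed to be features
feature_cols = df.columns.drop(["lubricant_amount", "too_low"])
X_scaled = StandardScaler().fit_transform(df[feature_cols])
components = PCA(n_components=3).fit_transform(X_scaled)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for label in (False, True):
    mask = (df["too_low"] == label).to_numpy()
    ax.scatter(components[mask, 0], components[mask, 1], components[mask, 2],
               label=f"too low = {label}", alpha=0.5)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
ax.legend()
plt.show()
```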

The first three PCA components reveal some clustering; apart from outliers, the two classes lie close together but remain distinguishable. We first separate the data points from the labels and extract the column names.
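Reusing `feature_cols` from the PCA cell above:

```python
X = df[feature_cols]
y = df["too_low"]
feature_names = list(X.columns)
```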

To obtain a good classification result without scaling, we can use random forests. However, random forests are sensitive to class imbalance, which appears to be present here. A quick check confirms it.
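For example:

```python
# Relative class frequencies
print(y.value_counts(normalize=True))
```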

Only about 15 % of the data points are labeled True. To compensate for this, we oversample the minority class with the SMOTE technique and undersample the majority class.

First, we oversample the minority class to half the size of the majority class; then we undersample the majority class to obtain an equal class distribution.
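With imbalanced-learn, the two resampling steps could be configured as follows (`sampling_strategy` is the desired minority-to-majority ratio):

```python
# Oversample the minority class to half the majority size,
# then undersample the majority class to a 1:1 distribution
oversampler = SMOTE(sampling_strategy=0.5, random_state=42)
undersampler = RandomUnderSampler(sampling_strategy=1.0, random_state=42)
```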

Next, we separate the training and test sets, define the model and pack all steps into a Pipeline, which takes care of applying the operations in the correct order. Finally, we define a 10-fold cross-validation method.
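A sketch using the imbalanced-learn `Pipeline`, which applies the resampling steps only during fitting:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = Pipeline(steps=[
    ("oversample", oversampler),
    ("undersample", undersampler),
    ("forest", RandomForestClassifier(random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
```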

Finally, we train the model, predict the labels on the test set and display some performance metrics, including the confusion matrix. Training should finish within a few seconds.
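For example:

```python
# Cross-validated score on the training set, then final evaluation
scores = cross_val_score(model, X_train, y_train, cv=cv)
print(f"Cross-validation score: {scores.mean():.3f} +/- {scores.std():.3f}")

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()
```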

The model achieves excellent performance with only one false negative. As a peek into the background, we plot three of the decision trees in the random forest to see which features contributed to individual decisions.
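A sketch using scikit-learn's `plot_tree`, with the depth limited so the plots stay readable:

```python
# Inspect three of the trees in the fitted forest
forest = model.named_steps["forest"]
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
for ax, tree in zip(axes, forest.estimators_[:3]):
    plot_tree(tree, ax=ax, max_depth=2, feature_names=feature_names,
              class_names=["ok", "too low"], filled=True)
plt.show()
```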

By traversing each tree from the root to the leaves, the criteria for class separation can be examined. The prediction of the whole model is obtained by voting, i.e. the class assigned by the majority of trees is accepted.

We check which features contribute most to the model.
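Using the impurity-based feature importances of the fitted forest:

```python
# Bar chart of feature importances, sorted for readability
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances.sort_values().plot.barh()
plt.xlabel("feature importance")
plt.show()
```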

Apparently, in this case, the y features are the most relevant for the model.