Knowledge Extraction from Sensor Data


This section presents a simple example based on a dataset of several hundred thousand sensor readings gathered in an office environment. The sensor was deployed near an office desk and measured light level, PIR (passive infrared) motion, sound, and the energy consumption of the workstation at the desk.
The dataset can be downloaded from the Download section.

Data Import

The first step is the data import. KAT currently supports the CSV and MS Excel formats. To import data, click the Load Data button and select a comma-separated file or an MS Excel file. You can download the sample dataset via this link, save it as a CSV file, and import it into KAT.


Once the data is imported, the data labels appear on the screen. Select the "watts" check box from the data labels to show the data in a diagram (as shown below). If the data is not shown, select the check box and left-click on the empty diagram.


Data Pre-Processing

To highlight features of the dataset, a minmax filter is used in this example. The minmax filter divides the data into windows and, for each window, subtracts the minimum value from the maximum value. This filter can highlight outliers and reduce noise.
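
The windowed max-minus-min idea can be sketched in a few lines of Python. This is only an illustration, not KAT's implementation: the function name minmax_filter, the use of non-overlapping windows, and the sample values are assumptions made for the example.

```python
def minmax_filter(values, window):
    """Return (max - min) for each consecutive, non-overlapping window.

    Assumption: windows do not overlap; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        result.append(max(chunk) - min(chunk))
    return result


# Made-up power readings (watts) with a single spike in the second window.
watts = [60, 61, 60, 62, 60, 180, 61, 60, 59, 60, 61, 60]
print(minmax_filter(watts, 4))  # [2, 120, 2]
```

Windows with little variation collapse to small values, while the window containing the spike stands out clearly.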


Dimension Reduction

To reduce the number of samples, a simple averaging method called Piecewise Aggregate Approximation (PAA) is applied in this example. PAA takes the mean of the inputs in each window and produces a single aggregated value per window. In this example the data is reduced from 10,000 samples to 100 values.
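
As a rough illustration of the averaging step, the following Python sketch reduces 10,000 samples to 100 values. The function name paa and the synthetic input are assumptions for the example; KAT's own windowing may differ in detail.

```python
def paa(values, segments):
    """Reduce `values` to `segments` values by averaging each window.

    Window boundaries are computed with integer arithmetic so that the
    input length does not have to be an exact multiple of `segments`.
    """
    n = len(values)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(values)")
    reduced = []
    for i in range(segments):
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = values[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


# Synthetic input: 10,000 samples reduced to 100 values, as in the text.
samples = list(range(10000))
reduced = paa(samples, 100)
print(len(reduced))  # 100
print(reduced[0])    # 49.5 (mean of 0..99)
```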


Feature Extraction

To find interesting patterns that are likely to represent an event, a phenomenon, or an interesting observation, a k-means clustering algorithm is used. In this example, the k-means algorithm is run with three groups (k = 3). The algorithm clusters periods of low activity (low power usage), medium activity, and peaks (high power usage) into these groups, labelling them from 0 to 2.
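
The grouping can be sketched with a small one-dimensional k-means in plain Python. This is a minimal sketch under stated assumptions, not KAT's implementation: the function name kmeans_1d, the deterministic initialisation, the made-up readings, and the choice to order labels so that 0 is the lowest-power group are all assumptions made for the example.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster 1-D values into k groups; labels are ordered by centroid.

    Assumption: initial centroids are spread evenly over the sorted data,
    which keeps the example deterministic.
    """
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and len(values)")

    def assign(centroids):
        # Index of the nearest centroid for every value.
        return [min(range(k), key=lambda c: abs(v - centroids[c]))
                for v in values]

    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated

    # Relabel so that 0 is the lowest centroid and k - 1 the highest.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {old: new for new, old in enumerate(order)}
    labels = [rank[lab] for lab in assign(centroids)]
    return labels, sorted(centroids)


# Made-up readings: low, medium, and peak power usage.
readings = [2, 3, 2, 40, 42, 41, 120, 3, 118, 39]
labels, centroids = kmeans_1d(readings, k=3)
print(labels)     # [0, 0, 0, 1, 1, 1, 2, 0, 2, 1]
print(centroids)  # [2.5, 40.5, 119.0]
```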



To find relationships between the different groups produced in the previous step, a Markov-based statistical model is applied to the data. The model returns the likelihood of the temporal presence of the different groups.
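
The text does not specify which Markov-based model is used. One simple possibility, shown here purely as an assumption, is a first-order Markov chain whose transition probabilities are estimated from the sequence of group labels. The function name transition_matrix and the label sequence are hypothetical.

```python
def transition_matrix(labels, n_states):
    """Estimate first-order transition probabilities from a label sequence.

    Entry [i][j] is the fraction of times state i was followed by state j.
    Rows for states that are never left are all zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical sequence of group labels (0 = low, 1 = medium, 2 = peak).
sequence = [0, 0, 1, 2, 1, 0, 0, 0, 1]
for row in transition_matrix(sequence, 3):
    print(row)
# [0.6, 0.4, 0.0]
# [0.5, 0.0, 0.5]
# [0.0, 1.0, 0.0]
```

Read row by row, the matrix says, for instance, that in this made-up sequence a low-activity period is followed by another low-activity period 60% of the time.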


Representation (unstable, not usable yet)

Eventually, all gathered data can be represented in a semantic form using the tool.