## The Nodes that you need to run Time Series Analysis in KNIME

- 14/04/2014

**Published In**

- Big Data
- Analytics
- Business Intelligence

This post is an update to a previous post from 2 years ago about implementing time series analysis in KNIME: http://www.dataminingreporting.com/2/post/2012/07/time-series-prediction-in-knime.html

**The Time Series Analysis Problem**

The whole idea of that post was to move the time series past into the data table rows and to apply any traditional data analytics algorithm to predict the future.

For example, if I have daily data x(t) and I want to use the past 3 days x(t-1), x(t-2), x(t-3) to predict the current value, I need to create a data table with the following structure: x(t-3) x(t-2) x(t-1) x(t). That is, I need to move the past 3 values of the time series into the same row as the current value x(t).
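The table structure described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what KNIME does internally: the function name `lag_table` and the sample values are made up for the example, and the sketch simply drops the first rows that do not have a full 3-value history.

```python
def lag_table(series, n_lags):
    """Return rows [x(t-n), ..., x(t-1), x(t)] for every t with a full history.

    Hypothetical helper for illustration; rows without n_lags past values
    are skipped (an assumption of this sketch).
    """
    rows = []
    for t in range(n_lags, len(series)):
        # Slice out the n_lags past values followed by the current value.
        rows.append(list(series[t - n_lags:t + 1]))
    return rows


daily = [10, 12, 11, 13, 15, 14]  # made-up daily values
table = lag_table(daily, 3)
for row in table:
    print(row)  # columns: x(t-3), x(t-2), x(t-1), x(t)
```

Each printed row holds the three past values followed by the current one, which is exactly the layout a supervised learner needs: the first three columns are the inputs and the last column is the target.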

Once I have this structure, I can use, for example, a linear regression to model the target variable (current value) x(t) using its past 3 values x(t-3) x(t-2) x(t-1). It does not necessarily have to be a linear regression model: any supervised data analytics algorithm with numerical output will do as well.
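To make the regression step concrete, here is a minimal sketch of an ordinary least-squares fit on lagged rows, written with the Python standard library only. It is not the KNIME learner node: the helper names (`lagged_rows`, `solve`, `fit_linear`, `predict`) and the sample series are assumptions of this example, and the solver assumes the normal equations are non-singular.

```python
def lagged_rows(series, n_lags):
    """Rows [x(t-n), ..., x(t-1), x(t)] for every t with a full history."""
    return [list(series[t - n_lags:t + 1]) for t in range(n_lags, len(series))]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows):
    """Least-squares fit of x(t) on an intercept plus the lagged values."""
    X = [[1.0] + [float(v) for v in row[:-1]] for row in rows]
    y = [float(row[-1]) for row in rows]
    k = len(X[0])
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, lags):
    """Apply the fitted model to one list of lagged values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], lags))


# Made-up series, used only to exercise the code.
series = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
rows = lagged_rows(series, 3)
coeffs = fit_linear(rows)                     # [intercept, w(t-3), w(t-2), w(t-1)]
next_value = predict(coeffs, series[-3:])     # forecast for the next time step
print(coeffs, next_value)
```

The point of the sketch is that, once the past is laid out in columns, nothing in the model is specific to time series: the fit is a standard regression, and any other numeric predictor could be swapped in.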

The project then ended by calculating the distance between the predicted and the original time series as an error measure for the whole procedure. The distance measure we used was the root mean squared error, but any other distance would have worked as well.

**New nodes in KNIME 2.8**

A few things have changed in KNIME since version 2.8, making work on time series much easier.

First of all, the **Lag Column** node has been introduced precisely to perform the first step, i.e. to move the n past values of a time series onto the same row as the current value. You just need to specify the Lag (n=3 in this case) in the node configuration window to move from x(t) to x(t-3) x(t-2) x(t-1) x(t).

This node can also put the current value and any past value side by side. For example, if I have daily data and I want to compare each value with the previous week value, I can set the Lag Interval to 7 in the configuration window and, after execution, I would get the following data table: x(t-7) x(t). This is particularly useful for periodicity or seasonality correction.
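The weekly comparison can be illustrated outside KNIME with a short Python sketch. The helper names and the sample values are invented for this example, and taking the difference x(t) - x(t-7) is just one simple way to use the paired values for a seasonality correction, not necessarily the one used in the original workflow.

```python
def lag_pairs(series, lag):
    """Return (x(t-lag), x(t)) pairs for every t that has a value lag steps back."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def seasonal_differences(series, lag):
    """Difference between each value and the value one period earlier."""
    return [current - past for past, current in lag_pairs(series, lag)]


# Two made-up weeks of daily values with a weekend peak.
daily = [100, 102, 98, 101, 105, 130, 128,
         103, 104, 99, 103, 108, 133, 131]

pairs = lag_pairs(daily, 7)             # x(t-7) next to x(t)
diffs = seasonal_differences(daily, 7)  # weekly pattern removed
print(pairs)
print(diffs)
```

In this toy data the weekend peak disappears from the differences, leaving only the small week-over-week change.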

The whole complex metanode used in the old post, to transform the data and make them useful to feed a data analysis algorithm, is made obsolete by the Lag Column node (see figure below).

KNIME 2.8 has also made the error calculation for time series much easier.

The **Numeric Scorer** node calculates a few distances (R², mean absolute error, mean squared error, root mean squared error, mean signed difference) between two time series and makes them available in its View as well as at its output port.

**Conclusions**

You can now run a time series analysis in KNIME with only 3 nodes: a Lag Column node to define the past, a Training/Predictor node to build the model, and a Numeric Scorer to measure the prediction error.
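For reference, the error measures listed for the Numeric Scorer can be written out with their usual textbook definitions. The sketch below is not the node's implementation: the function name `score`, the dictionary keys, and the sample values are assumptions, as is the sign convention for the mean signed difference (predicted minus actual).

```python
import math


def score(actual, predicted):
    """Compute common error measures between two equally long numeric series."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # predicted minus actual
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "mean_signed_difference": sum(errors) / n,
    }


actual = [2.0, 4.0, 6.0, 8.0]      # made-up original series
predicted = [3.0, 4.0, 5.0, 10.0]  # made-up predictions
metrics = score(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

The mean signed difference is the only measure here that keeps the sign of the error, so it shows whether the model tends to over- or under-predict, while the others only measure the size of the error.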

#### Rosaria Silipo

Principal Data Scientist at KNIME

Opinions expressed by Gladwin Analytics members are their own.
