The Role of Advanced Models in Performance Boosting

Issue 08 Q1 2008
Automated Trader Magazine

In the second part of a two-part article, David Aronson, President of Hood River Research, examines the modelling techniques used in arriving at a valuable predictor set for boosting ‘raw’ trading model performance.

The development of a boosting model is a two-stage process. The first is discovering which, if any, of the large list of candidate indicators proposed for consideration are helpful in predicting signal returns. The second is establishing the shape of the surface that best describes the relationship between the selected candidate indicators and signal returns. One widely used modelling technique is multiple linear regression, which assumes that the shape of the surface is linear (flat with no hills and valleys). Only the slope of the surface with respect to each axis (i.e. the weight of each indicator) is left open to discovery. However, modern data modelling techniques have allowed the constraining assumption of a flat surface to be eliminated. This allows the modelling procedure to discover the most appropriate shape for the model’s hyper-surface.¹

Advanced modelling vs. linear regression

As powerful as multiple linear regression is, relative to the intuitive judgment of human experts, greater predictive power can be attained with more advanced modelling methods that are not constrained by the simplifying assumption of linearity. This creates the opportunity for more accurate predictions of signal outcomes. Advanced methods such as kernel regression can detect complex non-linear relationships. Figures 1 to 5 illustrate how more sophisticated non-linear modelling differs from traditional linear regression. For simplicity, the illustrations depict a single indicator (Xi) on the horizontal axis and the return earned by the signal on the vertical axis. Figure 1 shows the true functional relationship between signal returns and Xi. In any predictive modelling problem this true shape is by definition unknown and must be inferred from an observed sample of data. Note that the relationship is not linear.


In Figure 2, a sample of trading signals is shown, with each point representing a single signal. The position with respect to the vertical axis is the return earned by the signal and the position with respect to the horizontal axis is the value of indicator Xi at the time the signal was given. For purposes of illustration, the relationship between Xi and signal return has been made obvious; a relationship this clear is unlikely to exist in practice.

Figure 3 shows the model surface that results from modelling this data with linear regression. This linear model is too simple and wrong in a systematic sense in that it assumes that the model surface is flat throughout the range of the predictor variable Xi, thus causing the model to make systematic errors (i.e. the model is biased). In some ranges of Xi the model’s predictions of signal return are systematically too low, while in other ranges the predictions are systematically too high. Systematic errors are symptomatic of a model surface that does not accurately represent the underlying functional relationship between the indicators and signal return.
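The systematic errors described above can be made concrete with a small sketch. It is an illustration only, not the author's data or procedure: the hump-shaped "true" relationship, the noise level, the sample size and the helper names fit_line and mean_residual_by_range are all assumptions chosen for the example. A straight line is fitted to the sample and the average prediction error is then measured within separate ranges of the indicator.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_residual_by_range(xs, ys, intercept, slope, edges):
    """Average (actual - predicted) return inside each [lo, hi) range of the indicator."""
    averages = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        residuals = [y - (intercept + slope * x)
                     for x, y in zip(xs, ys) if lo <= x < hi]
        averages.append(sum(residuals) / len(residuals) if residuals else float("nan"))
    return averages

# Hypothetical sample: a hump-shaped relationship plus noise (assumed, not from the article)
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(600)]
ys = [math.sin(math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

intercept, slope = fit_line(xs, ys)
bias = mean_residual_by_range(xs, ys, intercept, slope, [0.0, 0.25, 0.75, 1.0])

for label, value in zip(["low Xi", "middle Xi", "high Xi"], bias):
    print(f"{label}: mean residual {value:+.3f}")
```

A negative mean residual in a range means the line predicts too high there, and a positive one means it predicts too low. If the errors were purely random, the mean residual in every range would sit close to zero.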

A reasonable solution would be to propose a more complex model by visual inspection, such as a parabola. However, in the case of a linear model, a quadratic (parabolic) model, a cubic model where Xi is raised to the third power, or any model where the functional relationship is assumed prior to analysis, the model is constrained to adopt the assumed form. This is perfectly legitimate when there is well established theory that suggests what the correct functional form should be.
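To illustrate what it means for a model to be constrained to an assumed form, the sketch below fits linear, quadratic and cubic models to the same sample by least squares. In each case only the coefficients are estimated; the functional form is fixed before the data are examined. The data-generating function, the noise level and the helper names (solve_linear_system, fit_polynomial, predict, mean_squared_error) are assumptions made for this example, and solving the normal equations directly is just one simple way to obtain a least-squares fit.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def mean_squared_error(xs, ys, coeffs):
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical sample (assumed for illustration, not the article's data)
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(400)]
ys = [math.sin(math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

errors = {}
for degree in (1, 2, 3):
    coeffs = fit_polynomial(xs, ys, degree)
    errors[degree] = mean_squared_error(xs, ys, coeffs)
    print(f"degree {degree}: in-sample MSE {errors[degree]:.4f}")
```

The in-sample error can only fall or stay level as the degree rises, because each form contains the previous one as a special case. That says nothing about whether the chosen form matches the true relationship, which is the limitation the text describes.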

However, for many phenomena characterised by high complexity and high randomness, such as financial markets, there is no well-established theory to support the choice of a particular functional form. In these situations, constraining the analysis to an assumed functional form is too limiting. Instead an approach is needed that does not require the assumption of the shape of the model’s surface – in other words, non-parametric modelling.

Adapting non-parametric models

One example of non-parametric modelling is kernel regression. Figure 4 shows the model surface that would be obtained by applying kernel regression to the sample data. Note that the discovered model surface conforms closely to the true relationship depicted in Figure 1. Kernel regression discovers the correct shape of the surface by estimating the value of the dependent variable Y (signal return) within small local ranges along the Xi axis. The simplest approach to kernel regression takes an average of the Y values in each local Xi neighbourhood. This becomes the altitude of the model surface in that region of Xi. A more sophisticated kernel method fits linear models to each small Xi neighbourhood. ...
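The two local estimators described above might be sketched as follows. This is a minimal illustration under stated assumptions rather than the author's implementation: the neighbourhood is taken to be a fixed-width window around each query point (a uniform, or "box", kernel), the half-width of 0.05 is arbitrary, and the sample data and the function names local_average and local_linear are invented for the example.

```python
import math
import random

def local_average(xs, ys, x0, half_width):
    """Mean of Y over sample points whose Xi lies within half_width of x0."""
    near = [y for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    return sum(near) / len(near) if near else float("nan")

def local_linear(xs, ys, x0, half_width):
    """Fit a straight line to the neighbourhood of x0 and evaluate it at x0."""
    pts = [(x, y) for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    if not pts:
        return float("nan")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0.0:
        return mean_y  # all neighbours share one Xi value; fall back to the average
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return mean_y + (sxy / sxx) * (x0 - mean_x)

# Hypothetical sample (assumed for illustration, not the article's data)
rng = random.Random(2)
xs = [rng.uniform(0.0, 1.0) for _ in range(600)]
ys = [math.sin(math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

# Trace out the model surface on a grid of indicator values
grid = [i / 10 for i in range(11)]
for x0 in grid:
    avg = local_average(xs, ys, x0, 0.05)
    lin = local_linear(xs, ys, x0, 0.05)
    print(f"Xi={x0:.1f}  local average={avg:.3f}  local linear={lin:.3f}")
```

Neither function assumes a global shape for the surface; the shape emerges from the local estimates. The window width controls the trade-off: a narrow window follows the data closely but is noisy, while a wide one is smoother but can blur genuine curvature.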
