Pretests for genetic-programming evolved trading programs: 'zero-intelligence' strategies and lottery trading - Part 2
Part 2 of Pretests for genetic-programming evolved trading programs: "zero-intelligence" strategies and lottery trading bootstrap paper. By Shu-Heng Chen and Nicolas Navet
3 What does the pretest tell us?
The outcomes of the pretests provide us with answers to the following two questions: is there something essential to learn from the training data that can be of interest for the out-of-sample period? Does the GP implementation show some evidence of effectiveness in that task? Clearly, before actually trading with GP-evolved programs, these two questions must be answered with reasonable certainty; the rest of this section explains how pretests may help in that regard.
3.1 Question 1: is there something to learn?
The null hypothesis H_{4,0} corresponding to pretest 4 has been presented in Section 2.2. We introduce pretest 5, which will be used in conjunction with pretest 4.
Pretest 5: equivalent intensity random search with training and validation versus lottery trading.
Here, we compare lottery trading to a random search with
training and validation, and a search intensity equivalent to
the one used for GP in pretest 4. The null hypothesis
H_{5,0} is that the equivalent-intensity random search
does not outperform lottery trading on the out-of-sample data.
Depending on the validity of H_{4,0} and H_{5,0}, we can draw
the conclusions that are summarized in Table 1.
          H_{4,0}   H_{5,0}   Interpretation
case 1    ¬R        ¬R        there is evidence that there is nothing to learn
case 2    R         ¬R        there may be something to learn (weak certainty)
case 3    R         R         there is evidence that there is something to learn
case 4    ¬R        R         there may be something to learn (weak certainty)
Table 1. Information drawn from the outcomes of pretest 4 and pretest 5 (¬R means that the null hypothesis H_{i,0} cannot be rejected, while R means that the hypothesis is rejected in favour of the alternative hypothesis).
In case 1, the best solutions on the training intervals, obtained with two different search algorithms, do not perform better than lottery trading on the out-of-sample period. This suggests that there is nothing to learn. In case 2, GP outperforms lottery trading but random search does not; it is possible that there is something to learn, but that the selected random rules do not have sufficient predictive ability. In any case, this leads us to a less certain conclusion than in case 3, where both search techniques outperform lottery trading. Finally, case 4 is a special case where random search performs better than lottery trading but GP does not. The whole evolutionary process of GP thus has a detrimental effect, and a possible explanation is that GP-induced solutions overfit the training data.
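The four cases above can be written as a small lookup. The following is only a sketch: the function name `interpret_pretests` and its boolean arguments are illustrative assumptions, not part of the original study, and the booleans stand for "the null hypothesis of pretest 4 (resp. pretest 5) is rejected".

```python
def interpret_pretests(reject_h40: bool, reject_h50: bool) -> tuple:
    """Map the outcomes of pretests 4 and 5 to a (case number, interpretation) pair.

    reject_h40: True if GP outperforms lottery trading (pretest 4 null rejected).
    reject_h50: True if equivalent-intensity random search outperforms
                lottery trading (pretest 5 null rejected).
    Both argument names are hypothetical, chosen for this illustration.
    """
    if reject_h40 and reject_h50:
        return 3, "there is evidence that there is something to learn"
    if reject_h40:
        # GP beats lottery trading, random search does not.
        return 2, "there may be something to learn (weak certainty)"
    if reject_h50:
        # Random search beats lottery trading, GP does not (possible overfitting).
        return 4, "there may be something to learn (weak certainty)"
    return 1, "there is evidence that there is nothing to learn"
```

For instance, `interpret_pretests(True, False)` corresponds to case 2, where only GP outperforms lottery trading.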
3.2 Question 2: is the GP machinery working properly?
The second question we ought to ask is whether GP is effective.
Of course, this cannot be answered with the data at hand if
pretests 4 and 5 have shown that there is nothing to be learned
(case 1 in Table 1). In addition, in case 4 of Table 1, we
already know that GP is not efficient since, by transitivity,
it is outperformed by the random search-based algorithm. Thus,
the only two cases where one really needs to proceed to further
examination are case 2 and case 3. The validity of the null
hypothesis H_{1,0}, which can be tested with pretest
1, gives a helpful insight into the answer: only if
H_{1,0} is rejected can we conclude that GP
shows some real effectiveness. We would like to stress that
rejecting H_{1,0} is far from implying profitability,
but beating a mere random search algorithm on a difficult
problem with an infinite search space is the bare minimum one
can expect from GP.
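The reasoning of this subsection can be summarized as a decision rule. This is a minimal sketch under stated assumptions: the function name `gp_is_effective`, the integer case encoding and the use of `None` for "cannot be answered" are all hypothetical choices made for illustration.

```python
from typing import Optional


def gp_is_effective(case: int, reject_h10: bool) -> Optional[bool]:
    """Combine the Table 1 case with the outcome of pretest 1.

    case:       1 to 4, the case of Table 1 derived from pretests 4 and 5.
    reject_h10: True if the null hypothesis of pretest 1 is rejected,
                i.e. GP outperforms the random search.
    Returns None when the data at hand cannot answer the question.
    """
    if case == 1:
        # Nothing to learn: effectiveness cannot be assessed on this data.
        return None
    if case == 4:
        # Random search beats lottery trading while GP does not.
        return False
    if case in (2, 3):
        # Beating random search is the bare minimum expected from GP.
        return reject_h10
    raise ValueError("case must be 1, 2, 3 or 4")
```

A `True` result only says that GP does better than random search; as stressed above, it does not imply profitability.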
4 Experiments
The aim of the experiments is to evaluate the extent to which the proposed pretests are reliable. The methodology adopted here is to check whether the outcomes of the pretests are consistent with results already published in the literature. We call GP2 the GP implementation developed for this study and GP1 the software^{14} used in [5], which will constitute our benchmark. The GP2 control parameters, as close as possible to the ones used in [5] for GP1, are summarized in Table 1 (Appendix A).
The traded instruments are the indexes of 3 stock exchanges: the TSE 300 (Canada), the Nikkei Dow Jones (Japan) and the Capitalization Weighted Stock Index (Taiwan). They have been chosen among the 8 markets studied in [5] because they exhibit the main evolution patterns that can be found in the set of 8 markets. The aim of GP is to induce the most profitable strategy, measured by the accumulated return, for trading the stock exchange index. The use of short selling is possible.
We follow common practice in the literature for data preprocessing and use normalized data obtained by dividing each day's price by a 250-day moving average^{15}.
As is usually done, we subdivide the whole dataset into three sections: the training, validation and out-of-sample test periods. For each stock index considered, 3 different out-of-sample test periods of 2 years each (i.e. 1999-2000, 2001-2002, 2003-2004) follow a 3-year validation period and a 3-year training period. In the following, the term market refers to a stock exchange during a specific out-of-sample period. For instance, market Canada1 (C1 for short) is the TSE 300 during the out-of-sample period 1999-2000.
Hypothesis testing is performed with Student's t-test at a 95% confidence level.
The samples used for the statistics consist of the results of 50 GP runs, 50 runs of equivalent search intensity random search with training and validation (ERS) and 100 runs of lottery trading (LT).
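The preprocessing and the test statistic described above can be sketched as follows. This is an illustration under assumptions that the text does not settle: whether the moving average includes the current day, and whether the t-test uses the pooled (equal-variance) form. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def normalize_by_moving_average(prices, window=250):
    """Divide each price by the mean of the `window` most recent prices.

    Assumption: the average includes the current day, so the first
    `window - 1` days have no normalized value and are dropped.
    """
    normalized = []
    for i in range(window - 1, len(prices)):
        average = mean(prices[i - window + 1 : i + 1])
        normalized.append(prices[i] / average)
    return normalized


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample Student's t statistic (equal-variance form) and its
    degrees of freedom, e.g. for GP returns versus lottery trading returns."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

With 50 runs in one sample and 100 in the other, the pooled test has 148 degrees of freedom; the statistic is then compared with the critical value of the t distribution for the chosen significance level.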
In 4 out of the 9 markets (i.e. C3, J2, T1, T3), there is evidence that there is something to learn from the training data (case 3 in Table 1). This is consistent with [5] where GP2 performs outstandingly on these 4 markets (respective total return: 0.34, 0.17, 0.52, 0.27 with GP1). In markets C1, J3 and T2, pretests 4 and 5 suggest that there is nothing to learn (case 1). Except for C1, GP2 also performs poorly (0.18 for J3 and 0.05 for T2). Finally, in the 3 markets where GP2 is shown to beat ERS (H_{1,0} is rejected in favor of H_{1,1} for J1, J2 and T1), the GP results are very good: both GP1 and GP2 produce positive returns and outperform the buy-and-hold strategy.
Although more comprehensive tests are needed, the experiments conducted here show some preliminary evidence that the proposed pretests possess some predictive ability. Indeed, when the outcome is "nothing to learn", the two GP implementations perform very poorly (except in one case). Conversely, when the pretests suggest that there is something to learn, at least one implementation does well, and when GP2 is more efficient than random search (i.e. ERS), GP1 from [5] is efficient too.
In the light of the pretests, we should also conclude that our GP implementation (i.e. GP2) is more efficient than ERS (GP2 outperforms ERS in 3 markets while ERS never beats GP2 with statistical significance).
However, in our experiments, searching for trading rules at random, with the same set of functions and terminals as used in GP, is usually enough to come up with trading systems that outperform lottery trading when GP does as well. This suggests that GP2 may only be able to take advantage of "simple" regularities in the data.
5 Conclusions
The main purpose of this paper is to enrich the earlier research on Genetic Programming (GP) induced market-timing decisions by proposing pretests that aim to shed light on the GP results. In the literature, the results of applying GP to market-timing decisions are typically not very convincing, but the investigators always suggest the possibility of further improvements. If investigators can first show convincingly that there is something to learn and that GP is suitable for that task, then their conclusions will be less vague and uncertain. We propose here a series of pretests, where GP is tested against a random behavior (lottery trading) and against strategies created at random (zero-intelligence strategies), that aim to answer these two crucial questions. Of course, there is the risk of getting a wrong pretest result, and the possible reasons why GP may have failed should be thoroughly investigated before drawing a conclusion. But, in the end, analyzing the results in the light of the pretests should help draw more fine-grained conclusions.
Acknowledgment 
This research was conducted when the second author (Nicolas Navet) was a visiting researcher at the AIECON Research Center, National Chengchi University (NCCU), Taipei, Taiwan. The financial support from the AIECON Research Center as well as NCCU and INRIA is gratefully acknowledged. The authors would also like to acknowledge the grant from the National Science Council #95-2415-H-004-002-MY3.
References 
1. J.A. Giles and D.E.A. Giles: Pre-test Estimation and Testing in Econometrics: Recent Developments. Journal of Economic Surveys 7(2) (1993) 145-197
2. D. Danilov and J.R. Magnus: Forecast Accuracy After Pretesting with an Application to the Stock Market. Journal of Forecasting 23 (2004) 251-274
3. R. Sullivan and A. Timmermann and H. White: Data-Snooping, Technical Trading Rule Performance, and the Bootstrap. Journal of Finance 54 (1999) 1647-1692
4. M.A. Kaboudan: A Measure of Time Series' Predictability Using Genetic Programming Applied to Stock Returns. Journal of Forecasting 18 (1999) 345-357
5. S.H. Chen and T.W. Kuo and K.M. Hoi: Genetic Programming and Financial Trading: How Much about "What we Know". In: Handbook of Financial Engineering. Kluwer Academic Publishers (2006) Forthcoming.
6. S.H. Chen and T.W. Kuo: Overfitting or Poor Learning: A Critique of Current Financial Applications of GP. In Springer-Verlag, ed.: Proceedings of the Sixth European Conference on Genetic Programming (EuroGP2003). Number 2610 in LNCS (2003) 34-46
7. F. Allen and R. Karjalainen: Using Genetic Algorithms to Find Technical Trading Rules. Journal of Financial Economics 51 (1999) 245-271
8. C. Neely and P. Weller and R. Dittmar: Is Technical Analysis in the Foreign Exchange Market Profitable? A Genetic Programming Approach. Journal of Financial and Quantitative Analysis 32(4) (1997) 405-427
Footnotes 
13 NLT has to be even since a "buy" transaction is
followed by a sell transaction and no positions are left
open.
14 Although both programs have been developed by members of the AIECON Research Center, they have not been written by the same persons and do not share a single line of code. Furthermore, GP2, which is based on the OpenBeagle library, implements strongly-typed GP.
