# Pretests for genetic-programming evolved trading programs: 'zero-intelligence' strategies and lottery trading - Part 2

## 3 What does the pretest tell us?

The outcomes of the pretests answer the following two questions: is there something essential to learn from the training data that is of interest for the out-of-sample period? Does the GP implementation show some evidence of effectiveness at that task? Clearly, before actually trading with GP-evolved programs, these two questions must be answered with reasonable certainty; the rest of this section explains how pretests may help in that regard.

## 3.1 Question 1: is there something to learn?

The null hypothesis $H_{4,0}$ corresponding to pretest 4 has been presented in Section 2.2. We now introduce pretest 5, which will be used in conjunction with pretest 4.

### Pretest 5: equivalent intensity random search with training and validation versus lottery trading.

Here, we compare lottery trading to a random search with training and validation whose search intensity is equivalent to the one used for GP in pretest 4. The null hypothesis $H_{5,0}$ is that the equivalent-intensity random search does not outperform lottery trading on the out-of-sample data. Depending on the validity of $H_{4,0}$ and $H_{5,0}$, we can draw the conclusions summarized in Table 1.

| | $H_{4,0}$ | $H_{5,0}$ | Interpretation |
|---|---|---|---|
| case 1 | ¬R | ¬R | there is evidence that there is nothing to learn |
| case 2 | R | ¬R | there may be something to learn (weak certainty) |
| case 3 | R | R | there is evidence that there is something to learn |
| case 4 | ¬R | R | there may be something to learn (weak certainty) |

Table 1. Information drawn from the outcomes of pretest 4 and pretest 5 (¬R means that the null hypothesis $H_{i,0}$ cannot be rejected, while R means that it is rejected in favour of the alternative hypothesis).
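Table 1 can be read as a simple lookup from the two test outcomes to an interpretation. The following is a minimal sketch of that lookup; the names `Outcome`, `TABLE_1` and `interpret_pretests` are illustrative and do not come from the paper.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    case: int
    interpretation: str


# Keys are (H4,0 rejected?, H5,0 rejected?), following the rows of Table 1.
TABLE_1 = {
    (False, False): Outcome(1, "there is evidence that there is nothing to learn"),
    (True, False): Outcome(2, "there may be something to learn (weak certainty)"),
    (True, True): Outcome(3, "there is evidence that there is something to learn"),
    (False, True): Outcome(4, "there may be something to learn (weak certainty)"),
}


def interpret_pretests(reject_h40: bool, reject_h50: bool) -> Outcome:
    """Map the outcomes of pretest 4 (GP vs. lottery trading) and pretest 5
    (equivalent-intensity random search vs. lottery trading) to a Table 1 row."""
    return TABLE_1[(reject_h40, reject_h50)]


# Example: both GP and random search beat lottery trading out of sample.
print(interpret_pretests(True, True))
```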

In case 1, the best solutions on the training intervals, obtained with 2 different search algorithms, do not perform better than lottery trading on the out-of-sample period. This suggests to us that there is nothing to learn. In case 2, GP outperforms lottery trading but random search does not; it is possible that there is something to learn, but that the selected random rules do not have sufficient predictive ability. In any event, this leads us to a less certain conclusion than in case 3, where both search techniques outperform lottery trading. Finally, case 4 is a special case where random search performs better than lottery trading but GP does not. The whole evolutionary process of GP thus has a detrimental effect, and a possible explanation is that GP-induced solutions overfit the training data.

## 3.2 Question 2: is the GP machinery working properly?

The second question we ought to ask is whether GP is effective. Of course, this cannot be answered with the data at hand if pretests 4 and 5 have shown that there is nothing to be learned (case 1 in Table 1). In addition, in case 4 of Table 1, we already know that GP is not efficient since, by transitivity, it is outperformed by the random search-based algorithm. Thus, the only two cases where one really needs to proceed to further examination are case 2 and case 3. The validity of the null hypothesis $H_{1,0}$, which can be tested with pretest 1, gives a helpful insight into the answer: only if $H_{1,0}$ is rejected can we conclude that GP shows some real effectiveness. We would like to stress that rejecting $H_{1,0}$ is far from implying profitability, but beating a mere random search algorithm on a difficult problem with an infinite search space is the bare minimum one can expect from GP.
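The reasoning of this subsection can be sketched as a small decision procedure. The function name, argument names and returned strings below are illustrative assumptions; only the branching logic is taken from the text.

```python
from typing import Optional


def gp_effectiveness(reject_h40: bool, reject_h50: bool,
                     reject_h10: Optional[bool] = None) -> str:
    """Combine pretests 4 and 5 (is there something to learn?) with
    pretest 1 (GP vs. equivalent-intensity random search)."""
    if not reject_h40 and not reject_h50:
        # Case 1: nothing to learn, so effectiveness cannot be assessed.
        return "undecidable: nothing to learn on this data"
    if not reject_h40 and reject_h50:
        # Case 4: random search beats lottery trading but GP does not.
        return "not effective: GP is outperformed by random search"
    # Cases 2 and 3: the outcome of pretest 1 is required.
    if reject_h10 is None:
        raise ValueError("pretest 1 outcome needed in cases 2 and 3")
    if reject_h10:
        return "effective: GP beats equivalent-intensity random search"
    return "no evidence that GP is more effective than random search"


# Example: case 3 with H1,0 rejected.
print(gp_effectiveness(True, True, reject_h10=True))
```

Note that a verdict of "effective" here only means that GP beats random search; as stressed above, it says nothing about profitability.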

## 4 Experiments

The aim of the experiments is to evaluate the extent to which the proposed pretests are reliable. The methodology adopted here is to check whether the outcomes of the pretests are consistent with results already published in the literature. We call GP2 the GP implementation developed for this study and GP1 the software used in [5] (see footnote 14), which constitutes our benchmark. The GP2 control parameters, as close as possible to the ones used in [5] for GP1, are summarized in Table 1 (Appendix A).

The traded instruments are the indexes of 3 stock exchanges: the TSE 300 (Canada), the Nikkei Dow Jones (Japan) and the Capitalization Weighted Stock Index (Taiwan). They have been chosen among the 8 markets studied in [5] because they exhibit the main evolution patterns that can be found in the set of 8 markets. The aim of GP is to induce the most profitable strategy, measured by the accumulated return, for trading the stock exchange index. Short selling is allowed.

We follow the usual practice in the literature for data preprocessing and use normalized data, obtained by dividing each day's price by a 250-day moving average (see footnote 15). As is also usual, we subdivide the whole dataset into three sections: the training, validation and out-of-sample test periods. For each stock index considered, 3 different out-of-sample test periods of 2 years each (i.e. 1999-2000, 2001-2002, 2003-2004) follow a 3-year validation period and a 3-year training period. In the following, the term market refers to a stock exchange during a specific out-of-sample period. For instance, market Canada-1 (C1 for short) is the TSE 300 during the out-of-sample period 1999-2000.

Hypothesis testing is performed with Student's t-test at a 95% confidence level. The samples used for the statistics consist of the results of 50 GP runs, 50 runs of equivalent search intensity random search with training and validation (ERS) and 100 runs of lottery trading (LT).
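As an illustration of how such a comparison might be carried out, here is a minimal sketch of a two-sample Student's t statistic with pooled variance. The text does not state whether the test is one-sided or whether equal variances are assumed; both are assumptions here, as are the function names. The critical value is supplied by the caller (for 50 + 100 - 2 = 148 degrees of freedom, the one-sided 5% critical value is roughly 1.66). The returns in the example are synthetic and are not results from the paper.

```python
import math
import random
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def outperforms(a, b, critical_value):
    """One-sided test: reject 'a does not outperform b' when t exceeds the critical value."""
    t, _ = pooled_t_statistic(a, b)
    return t > critical_value


# Synthetic out-of-sample returns, for illustration only.
rng = random.Random(0)
gp_returns = [rng.gauss(0.10, 0.20) for _ in range(50)]   # 50 GP runs
lt_returns = [rng.gauss(0.00, 0.20) for _ in range(100)]  # 100 lottery-trading runs

t, df = pooled_t_statistic(gp_returns, lt_returns)
print(f"t = {t:.3f}, df = {df}, reject H0: {outperforms(gp_returns, lt_returns, 1.66)}")
```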

In 4 out of the 9 markets (i.e. C3, J2, T1, T3), there is evidence that there is something to learn from the training data (case 3 in Table 1). This is consistent with [5] where GP2 performs outstandingly on these 4 markets (respective total return: 0.34, 0.17, 0.52, 0.27 with GP1). In markets C1, J3 and T2, pretests 4 and 5 suggest to us that there is nothing to learn (case 1). Except for C1, GP2 also performs poorly (-0.18 for J3 and -0.05 for T2). Finally, in the 3 markets where GP2 is shown to beat ERS ($H_{1,0}$ is rejected in favor of $H_{1,1}$ for J1, J2 and T1), the GP results are very good: both GP1 and GP2 produce positive returns and outperform the buy-and-hold strategy.

Although more comprehensive tests are needed, the experiments conducted here show some preliminary evidence that the proposed pretests possess some predictive ability. Indeed, when the outcome is "nothing to learn", the two GP implementations perform very poorly (except in one case). Conversely, when the pretests suggest that there is something to learn, at least one implementation does well, and when GP2 is more efficient than random search (i.e. ERS), GP1 from [5] is efficient too. In the light of the pretests, we should also conclude that our GP implementation (i.e. GP2) is more efficient than ERS (GP2 outperforms ERS in 3 markets while ERS never beats GP2 with statistical significance). However, in our experiments, searching trading rules at random, with the same set of functions and terminals as used in GP, is usually enough to come up with trading systems that outperform lottery trading when GP does as well. This suggests to us that GP2 may only be able to take advantage of "simple" regularities in the data.

## 5 Conclusions

The main purpose of this paper is to enrich the earlier research on Genetic Programming (GP) induced market-timing decisions by proposing pretests aiming to shed light on the GP results. In the literature, the results of applying GP to market-timing decisions are typically not very convincing, but the investigators always suggest the possibility of further improvements. If the investigators can first show convincingly that there is something to learn and that GP is suitable for that task, then their conclusions will be less vague and uncertain. We propose here a series of pretests, where GP is tested against a random behavior (lottery trading) and against strategies created at random (zero-intelligence strategies), that aim to answer these two crucial questions. Of course, there is the risk of getting a wrong pretest result, and the possible reasons why GP may have failed should be thoroughly investigated before drawing a conclusion. But, in the end, analyzing the results in the light of the pretests should help draw more fine-grained conclusions.

### Acknowledgment

This research was conducted when the second author (Nicolas Navet) was a visiting researcher at the AI-ECON Research Center, National Chengchi University (NCCU), Taipei, Taiwan. The financial support from the AI-ECON Research Center as well as NCCU and INRIA is gratefully acknowledged. The authors would also like to acknowledge the grant from the National Science Council #95-2415-H-004-002-MY3.

### References

1. J.A. Giles and D.E.A. Giles: Pre-test Estimation and Testing in Econometrics: Recent Developments. Journal of Economic Surveys 7(2) (1993) 145-97

2. D. Danilov and J.R. Magnus: Forecast Accuracy After Pretesting with an Application to the Stock Market. Journal of Forecasting 23 (2004) 251-274

3. R. Sullivan and A. Timmermann and H. White: Data-Snooping, Technical Trading Rule Performance, and the Bootstrap. Journal of Finance 54 (1999) 1647-1692

4. M.A. Kaboudan: A Measure of Time Series' Predictability Using Genetic Programming Applied to Stock Returns. Journal of Forecasting 18 (1999) 345-357

5. S.-H. Chen and T.-W. Kuo and K.-M. Hoi: Genetic Programming and Financial Trading: How Much about "What we Know". In: Handbook of Financial Engineering. Kluwer Academic Publishers (2006) Forthcoming.

6. S.-H. Chen and T.-W. Kuo: Overfitting or Poor Learning: A Critique of Current Financial Applications of GP. In: Proceedings of the Sixth European Conference on Genetic Programming (EuroGP-2003). Number 2610 in LNCS, Springer-Verlag (2003) 34-46

7. F. Allen and R. Karjalainen: Using Genetic Algorithms to Find Technical Trading Rules. Journal of Financial Economics 51 (1999) 245-271

8. C. Neely and P. Weller and R. Dittmar: Is Technical Analysis in the Foreign Exchange Market Profitable? A Genetic Programming Approach. Journal of Financial and Quantitative Analysis 32(4) (1997) 405-427

### Footnotes

13 NLT has to be even since a "buy" transaction is followed by a sell transaction and no positions are left open.

14 Although both programs have been developed by members of the AI-ECON Research Center, they have not been written by the same persons and do not share a single line of code. Furthermore, GP2, which is based on the Open-Beagle library, implements strongly-typed GP.

15 See [5] for a discussion about how non-normalized data affects the performance of GP.