
Hypothesis Tests and Confidence Intervals

David Aronson

The following excerpt is from Chapter 5 of David Aronson's recently published book "Evidence-Based Technical Analysis". Together with Chapters 4 and 6 of the book it addresses aspects of statistics that are particularly relevant to evidence-based (as opposed to subjective) technical analysis.


Statistical inference encompasses two procedures: hypothesis testing and parameter estimation. Both are concerned with the unknown value of a population parameter. A hypothesis test determines if a sample of data is consistent with or contradicts a hypothesis about the value of a population parameter, for example, the hypothesis that its value is less than or equal to zero. The other inference procedure, parameter estimation, uses the information in a sample to determine the approximate value of a population parameter.1 Thus, a hypothesis test tells us if an effect is present or not, whereas an estimate tells us about the size of an effect.

In some ways both forms of inference are similar. Both attempt to draw a conclusion about an entire population based only on what has been observed in a sample drawn from the population. In going beyond what is known, both hypothesis testing and parameter estimation take the inductive leap from the certain value of a sample statistic to the uncertain value of a population parameter. As such, both are subject to error.

However, important differences distinguish parameter estimation from hypothesis testing. Their goals are different. The hypothesis test evaluates the veracity of a conjecture about a population parameter, leading to an acceptance or rejection of that conjecture. In contrast, estimation is aimed at providing a plausible value or range of values for the population parameter. In this sense, estimation is a bolder endeavor and offers potentially more useful information. Rather than merely telling us whether we should accept or reject a specific claim, such as the claim that a rule's average return is less than or equal to zero, estimation approximates the average return and provides a range of values within which the rule's true rate of return should lie at a specified level of probability. For example, it may tell us that the rule's estimated return is 10 percent and there is a 95 percent probability that it falls within the range of 5 percent to 15 percent. This statement contains two kinds of estimates: a point estimate, that the rule's return is 10 percent, and an interval estimate, that the return lies in the range 5 percent to 15 percent. The rule studies discussed in Part Two use estimation as an adjunct to the hypothesis tests.
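The difference between a point estimate and an interval estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the returns are simulated (hypothetical, not results from the book), the helper name mean_with_interval is invented here, and the interval uses a simple normal approximation rather than whatever procedure the rule studies in Part Two actually employ.

```python
import random
import statistics

def mean_with_interval(returns, z=1.96):
    """Return (point_estimate, lower, upper) for the mean return.

    Assumption: a normal approximation with z = 1.96 for a 95 percent interval.
    """
    n = len(returns)
    point = statistics.mean(returns)                  # point estimate
    std_err = statistics.stdev(returns) / n ** 0.5    # standard error of the mean
    return point, point - z * std_err, point + z * std_err

# Hypothetical data: 250 simulated daily returns, NOT taken from the book.
rng = random.Random(0)
sample_returns = [rng.gauss(0.0004, 0.01) for _ in range(250)]

point, lower, upper = mean_with_interval(sample_returns)
print(f"point estimate:    {point:.5f}")
print(f"interval estimate: [{lower:.5f}, {upper:.5f}]")
```

The single number is the point estimate; the pair of bounds around it is the interval estimate.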


If a rule has been profitable in a sample of historical data, this sample statistic is an indisputable fact. However, from this fact, what can be inferred about the rule's future performance? Is it likely to be profitable because it possesses genuine predictive power or are profits unlikely because its past profits were due to chance? The hypothesis test is a formal and rigorous inference procedure for deciding which of these alternatives is more likely to be correct, and so can help us decide if it would be rational to use the rule for actual trading in the future.

Confirmatory Evidence: It's Nice, It's Necessary, but It Ain't Sufficient

Chapter 2 pointed out that informal inference is biased in favor of confirmatory evidence. That is to say, when we use common sense to test the validity of an idea, we tend to look for confirmatory evidence, that is, facts consistent with the idea's truth. At the same time, we tend to ignore or give too little weight to contradictory evidence. Common sense tells us, and rightly so, that if the idea is true, instances where the idea worked (confirmatory evidence) should exist. However, informal inference makes the mistake of assuming that confirmatory evidence is sufficient to establish the idea's truth. This is a logical error. Confirmatory evidence does not compel the conclusion that the idea is true. Because it is consistent with the idea's truth, it merely allows for the possibility that the idea is true.

The crucial distinction between necessary evidence and sufficient evidence was illustrated in Chapter 4 with the following example. Suppose we wish to test the truth of the assertion: The creature I observe is a dog. We observe that the creature has four legs (the evidence). This evidence is consistent with (i.e., confirmatory of) the creature being a dog. In other words, if the creature is a dog, then it will necessarily have four legs. However, four legs are not sufficient evidence to establish that the creature is a dog. It may very well be another four-legged creature (cat, rhino, and so forth).

Popular articles on TA will often try to argue that a pattern has predictive power by presenting instances where the pattern made successful predictions. It is true that, if the pattern has predictive power, then there will be historical cases where the pattern gave successful predictions. However, such confirmatory evidence, while necessary, is not sufficient to logically establish that the pattern has predictive power. It is no more able to compel the conclusion that the pattern has predictive power than the presence of four legs is able to compel the conclusion that the creature is a dog.

To argue that confirmatory instances are sufficient commits the fallacy of affirming the consequent.

If p is true, then q is true.
q is true.

Invalid Conclusion: Therefore, p is true.

If the pattern has predictive power, then past examples of success should exist.
Past examples of success exist, and here they are.
Therefore, the pattern has predictive power.

The logical basis of the hypothesis test is falsification of the consequent. As such, it is a potent antidote to the confirmation bias of informal inference and an effective preventative of erroneous belief.
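The contrast between the two argument forms can be checked mechanically with a truth table. The sketch below is only an illustration; the helper names implies and is_valid are invented here. It treats an argument form as valid when the conclusion is true in every case where all premises are true.

```python
from itertools import product

def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q

def is_valid(premises, conclusion):
    """True if the conclusion holds in every truth assignment
    where all of the premises hold."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )

# Affirming the consequent: if p then q; q; therefore p.
affirming_consequent = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
)

# Denying the consequent: if p then q; not q; therefore not p.
denying_consequent = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: not q],
    lambda p, q: not p,
)

print("affirming the consequent is valid:", affirming_consequent)
print("denying the consequent is valid:  ", denying_consequent)
```

The first form fails because of the case where p is false and q is true: a pattern with no predictive power can still show past successes. The second form has no such counterexample.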

What Is a Statistical Hypothesis?

A statistical hypothesis is a conjecture about the value of a population parameter. Often this is a numerical characteristic, such as the average return of a rule. The population parameter's value is unknown because it is unobservable. For reasons previously discussed, it is assumed to have a value equal to or less than zero.

What an observer does know is the value of a sample statistic for a sample that has been drawn from the population. Thus, the observer is faced with a question: Is the observed value of the sample statistic consistent with the hypothesized value of the population parameter? If the observed value is close to the hypothesized value, the reasonable inference would be that the hypothesis is correct. If, on the other hand, the value of the sample statistic is far away from the hypothesized value, the truth of the hypothesis is called into question.

Close and far are ambiguous terms. The hypothesis test quantifies these terms, making it possible to draw a conclusion about the veracity of the hypothesis. The test's conclusion is typically given as a number between 0 and 1.0. This number indicates the probability that the observed value of the sample statistic could have occurred by chance under the condition that (given that or assuming that) the hypothesized value is true. For example, suppose it is hypothesized that a rule's expected return is equal to zero, but the back test produced a return of +20 percent. The conclusion of the hypothesis test may say something like the following: If the rule's expected rate of return were truly equal to zero, there is a 0.03 probability that the back-tested return could be equal to or greater than +20 percent due to chance. Because there is only a 3 percent probability that a 20 percent return could have occurred by chance if the rule were truly devoid of predictive power, we can be quite confident that the rule was not simply lucky in the back test.
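One way to obtain a probability of this kind is to resample the rule's returns after shifting them so that their mean is zero, which imposes the hypothesized value on the data. The Python sketch below illustrates that idea under stated assumptions: the daily returns are simulated (hypothetical, not from the book), the function name bootstrap_p_value is invented here, and this simple bootstrap is offered as one common approach rather than as the specific procedure the book applies.

```python
import random

def bootstrap_p_value(returns, n_resamples=2000, seed=0):
    """Estimate P(mean return >= observed mean | true mean is zero)."""
    rng = random.Random(seed)
    n = len(returns)
    observed = sum(returns) / n
    # Impose the hypothesis: shift the returns so their mean is zero.
    centered = [r - observed for r in returns]
    hits = 0
    for _ in range(n_resamples):
        # Resample with replacement from the zero-mean returns.
        resample = rng.choices(centered, k=n)
        if sum(resample) / n >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical data: 250 simulated daily returns, NOT taken from the book.
data_rng = random.Random(1)
daily_returns = [data_rng.gauss(0.001, 0.01) for _ in range(250)]

p_value = bootstrap_p_value(daily_returns)
print(f"estimated probability under the hypothesis: {p_value:.3f}")
```

A small value means that a mean return as large as the observed one would rarely arise by chance if the rule's expected return were zero.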

Falsifying a Hypothesis with Improbable Evidence

A hypothesis test begins by assuming that the hypothesis being tested is true. Based on this assumption, predictions are deduced from the hypothesis about the likelihood of various new observations. In other words, if the hypothesis is true, then certain outcomes would be probable whereas other outcomes would be improbable. Armed with this set of expectations, an observer is in a position to compare the predictions with subsequent observations. If predictions and observations agree, there is no reason to question the hypothesis. However, if low-probability outcomes are observed (outcomes that would be inconsistent with the truth of the hypothesis), the hypothesis is deemed falsified. Thus, it is the occurrence of unexpected evidence that is the basis for refuting a hypothesis. Though this line of reasoning is counterintuitive, it is logically correct (denial of the consequent) and extremely powerful. It is the logical basis of scientific discovery.

To give a concrete example, suppose I view myself as an excellent social tennis player. My hypothesis is: David Aronson is an excellent social tennis player. I join a tennis club with members whose age and years of play are similar to mine. On the basis of my hypothesis, I confidently predict to other club members that I will win at least three-quarters of my games (predicted win rate = 0.75). This prediction is merely a deductive consequence of my hypothesis. I test the hypothesis by keeping track of my first 20 games. After 20 games I am shocked and disappointed. Not only have I not scored a single victory (observed win rate = 0), but most losses have been by wide margins. This outcome is clearly inconsistent with the prediction deduced from my hypothesis. Said differently, my hypothesis implied that this evidence had a very low probability of occurrence. Such surprising evidence forcefully calls for a revision (falsification) of my hypothesis. Unless I prefer feel-good delusions to observed evidence, it is time to abandon my delusions of tennis grandeur.2

In the preceding situation, the evidence was overwhelmingly clear. I lost every one of 20 games. However, what if the evidence had been ambiguous? Suppose I had won two-thirds of my games. An observed win rate of 0.66 is below the predicted win rate of 0.75 but not dramatically so. Was this merely a random negative deviation from the predicted win rate or was the deviation of sufficient magnitude to indicate the hypothesis about my tennis ability was faulty? This is where statistical analysis becomes necessary. It attempts to answer the question: Was the difference between the observed win rate (0.66) and the win rate predicted by my hypothesis (0.75) large enough to raise doubts about the veracity of the hypothesis? Or, alternatively: Was the difference between 0.66 and 0.75 merely random variation in that particular sample of tennis matches? The hypothesis test attempts to distinguish prediction errors that are small enough to be the result of random sampling from errors so large that they indicate a faulty hypothesis.
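The two tennis scenarios can be put on a numerical footing with the binomial distribution. The sketch below rests on assumptions not stated in the text: games are treated as independent, the true win rate is taken to be exactly 0.75, and the ambiguous case is represented as 13 wins in 20 games (0.65, the whole number of wins closest to two-thirds). The helper name prob_at_most is invented for this illustration.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer wins
    in n independent games when each game is won with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_games = 20
predicted_win_rate = 0.75

# Clear-cut case: no wins at all in 20 games.
p_zero_wins = prob_at_most(0, n_games, predicted_win_rate)

# Ambiguous case (assumed): 13 or fewer wins in 20 games.
p_13_or_fewer = prob_at_most(13, n_games, predicted_win_rate)

print(f"P(0 wins in 20 | win rate 0.75)     = {p_zero_wins:.2e}")
print(f"P(<= 13 wins in 20 | win rate 0.75) = {p_13_or_fewer:.3f}")
```

Under these assumptions, losing all 20 games has a probability on the order of 10^-12, while winning 13 or fewer has a probability of roughly 0.21. The first outcome is all but impossible if the hypothesis is true; the second is unremarkable and gives little reason to doubt it.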

Dueling Hypotheses: The Null Hypothesis versus the Alternative Hypothesis

A hypothesis test relies on the method of indirect proof. That is, it establishes the truth of something by showing that something else is false. Therefore, to prove the hypothesis that we would like to demonstrate as correct, we show that an opposing hypothesis is incorrect. To establish that hypothesis A is true, we show that the opposing hypothesis Not-A is false.

A hypothesis test, therefore, involves two hypotheses. One is called the null hypothesis and the other the alternative hypothesis. The names are strange, but they are so well entrenched that they will be used here. The alternative hypothesis, the one the scientist would like to prove, asserts the discovery of important new knowledge. The opposing or null hypothesis simply asserts that nothing new has been discovered. For example, Jonas Salk, inventor of the polio vaccine, put forward the alternative hypothesis that his new vaccine would prevent polio more effectively than a placebo. The null hypothesis asserted that the Salk vaccine would not prevent polio more effectively than a placebo. For the TA rules tested in this book, the alternative hypothesis asserts the rule has an expected return greater than zero. The null hypothesis asserts that the rule does not have an expected return greater than zero.

For purposes of brevity, I will adopt the conventional notation: HA for the alternative hypothesis, and H0 for the null hypothesis. A way to remember this is the null hypothesis asserts that zero new knowledge has been discovered, thus the symbol H0.

It is crucial to the logic of a hypothesis test that HA and H0 be defined as mutually exclusive and exhaustive propositions. What does this mean? Two propositions are said to be exhaustive if, when taken together, they cover all possibilities. HA and H0 cover all possibilities. Either the polio vaccine has a preventive effect or it does not. There is no other possibility. Either a TA rule generates returns greater than zero or it does not.

The two hypotheses must also be defined as mutually exclusive. Mutually exclusive propositions cannot both be true at the same time, so if H0 is shown to be false, then HA must be true and vice versa. By defining the hypotheses as exhaustive and mutually exclusive statements, if it can be shown that one hypothesis is false, then we are left with the inescapable conclusion that the other hypothesis must be true. Proving truth in this fashion is called the method of indirect proof. These concepts are illustrated in Figure 5.1.
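For a TA rule, the two hypotheses and the two properties just described can be written compactly. The symbol \mu, standing for the rule's expected return, is notation assumed here for illustration and does not appear in the text above.

```latex
\begin{aligned}
H_0 &: \mu \le 0 \qquad\qquad H_A : \mu > 0 \\
\{\mu \le 0\} \cap \{\mu > 0\} &= \varnothing \quad \text{(mutually exclusive)} \\
\{\mu \le 0\} \cup \{\mu > 0\} &= \mathbb{R} \quad \text{(exhaustive)}
\end{aligned}
```

Because every possible value of \mu falls under exactly one of the two hypotheses, rejecting H0 leaves HA as the only remaining possibility.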


Two aspects of the hypothesis test warrant explanation. First, why the test is focused on the null hypothesis. Second, why the null hypothesis is assumed to be true rather than the alternative hypothesis. This section explains the reasoning behind both aspects.


To buy a copy of "Evidence-Based Technical Analysis" at a 37% discount, please click here.