# Hypothesis Tests and Confidence Intervals

#### David Aronson

The following excerpt is from Chapter 5 of David Aronson's recently published book "Evidence-Based Technical Analysis". Together with Chapters 4 and 6 of the book, it addresses aspects of statistics that are particularly relevant to evidence-based (as opposed to subjective) technical analysis.

## TWO TYPES OF STATISTICAL INFERENCE

Statistical inference encompasses two procedures: hypothesis testing and parameter estimation. Both are concerned with the unknown value of a population parameter. A hypothesis test determines if a sample of data is consistent with or contradicts a hypothesis about the value of a population parameter, for example, the hypothesis
that its value is less than or equal to zero. The other inference
procedure, parameter estimation, uses the information in a sample
to determine the approximate value of a population
parameter.^{1} Thus, a hypothesis test tells us if an
effect is present or not, whereas an estimate tells us about the
size of an effect.

In some ways both forms of inference are similar. Both attempt to draw a conclusion about an entire population based only on what has been observed in a sample drawn from the population. In going beyond what is known, both hypothesis testing and parameter estimation take the inductive leap from the certain value of a sample statistic to the uncertain value of a population parameter. As such, both are subject to error.

However, important differences distinguish parameter estimation
from hypothesis testing. Their goals are different. The
hypothesis test evaluates the veracity of a conjecture about a
population parameter, leading to an acceptance or rejection of
that conjecture. In contrast, estimation is aimed at providing a
plausible value or range of values for the population parameter.
In this sense, estimation is a bolder endeavor and offers
potentially more useful information. Rather than merely telling
us whether we should accept or reject a specific claim, such as
the claim that a rule's average return is less than or equal to
zero, estimation approximates the average return and provides a
range of values within which the rule's true rate of return
should lie at a specified level of probability. For example, it
may tell us that the rule's estimated return is 10 percent and
there is a 95 percent probability that it falls within the range
of 5 percent to 15 percent. This statement contains two kinds of
estimates: a *point estimate*, that the rule's return is 10
percent, and an *interval estimate*, that the return lies in the
range 5 percent to 15 percent. The rule studies discussed in Part
Two use estimation as an adjunct to the hypothesis tests.
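As a rough illustration of the two kinds of estimate, the sketch below computes a point estimate (the sample mean) and an approximate 95 percent interval estimate from a list of per-period returns. It assumes the returns are independent and that the sample is large enough for a normal approximation to the sampling distribution of the mean. The function name `estimate_mean_return` and the sample data are hypothetical, not taken from the book.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def estimate_mean_return(returns, confidence=0.95):
    """Return a point estimate and a normal-approximation interval estimate
    of the mean return. Hypothetical helper: assumes independent returns
    and a sample large enough for the central limit theorem to apply."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns to estimate variability")
    point = mean(returns)                           # point estimate
    std_error = stdev(returns) / sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    return point, (point - z * std_error, point + z * std_error)


# Hypothetical per-period returns in percent (illustrative only).
sample = [1.2, -0.4, 0.8, 2.1, -1.0, 0.5, 1.7, 0.3, -0.2, 0.9]
point, (low, high) = estimate_mean_return(sample)
print(f"point estimate: {point:.2f}%, 95% interval: {low:.2f}% to {high:.2f}%")
```

With small samples a Student's t critical value would be more appropriate than the normal z value; the normal version is used here only to keep the sketch within the Python standard library.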

## HYPOTHESIS TESTS VERSUS INFORMAL INFERENCE

If a rule has been profitable in a sample of historical data, this sample statistic is an indisputable fact. However, from this fact, what can be inferred about the rule's future performance? Is it likely to be profitable because it possesses genuine predictive power, or are profits unlikely because its past profits were due to chance? The hypothesis test is a formal and rigorous inference procedure for deciding which of these alternatives is more likely to be correct, and so can help us decide if it would be rational to use the rule for actual trading in the future.

### Confirmatory Evidence: It's Nice, It's Necessary, but It Ain't Sufficient

Chapter 2 pointed out that informal inference is biased in favor of confirmatory evidence. That is to say, when we use common sense to test the validity of an idea, we tend to look for confirmatory evidence: facts consistent with the idea's truth. At the same time, we tend to ignore or give too little weight to contradictory evidence. Common sense tells us, and rightly so, that if the idea is true, instances where the idea worked (confirmatory evidence) should exist. However, informal inference makes the mistake of assuming that confirmatory evidence is sufficient to establish the idea's truth. This is a logical error. Confirmatory evidence does not compel the conclusion that the idea is true. Because it is consistent with the idea's truth, it merely allows for the possibility that the idea is true.

The crucial distinction between necessary evidence and sufficient
evidence was illustrated in Chapter 4 with the following
example. Suppose we wish to test the truth of the assertion:
*The creature I observe is a dog*. We observe that the
creature has four legs (the evidence). This evidence is
consistent with (i.e., confirmatory of) the creature being a dog.
In other words, if the creature is a dog, then it will
necessarily have four legs. However, four legs are not
*sufficient* evidence to establish that the creature is a
dog. It may very well be another four-legged creature (cat,
rhino, and so forth).

Popular articles on TA will often try to argue that a pattern has predictive power by presenting instances where the pattern made successful predictions. It is true that, if the pattern has predictive power, then there will be historical cases where the pattern gave successful predictions. However, such confirmatory evidence, while necessary, is not sufficient to logically establish that the pattern has predictive power. It is no more able to compel the conclusion that the pattern has predictive power than the presence of four legs is able to compel the conclusion that the creature is a dog.

To argue that confirmatory instances are sufficient commits the fallacy of affirming the consequent.

If p is true, then q is true.

q is true.

Invalid Conclusion: Therefore, p is true.

If the pattern has predictive power, then past examples of success should exist.

Past examples of success exist, and here they are.

Invalid Conclusion: Therefore, the pattern has predictive power.

The logical basis of the hypothesis test is falsification of the consequent. As such, it is a potent antidote to the confirmation bias of informal inference and an effective preventative of erroneous belief.
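The difference between the invalid form (affirming the consequent) and the valid form the hypothesis test relies on (denying the consequent, also called *modus tollens*) can be checked mechanically with a truth table. The short sketch below, whose helper names `implies` and `is_valid` are hypothetical, treats an argument form as valid only if its conclusion is true in every row where all of its premises are true.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b' is false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion):
    """An argument form is valid if the conclusion is true in every
    truth assignment (p, q) that makes all of the premises true."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False  # counterexample: premises true, conclusion false
    return True


# Affirming the consequent: if p then q; q; therefore p.
affirming = is_valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p)
# Denying the consequent (modus tollens): if p then q; not q; therefore not p.
denying = is_valid([lambda p, q: implies(p, q), lambda p, q: not q], lambda p, q: not p)
print(affirming, denying)  # False True
```

The counterexample that sinks affirming the consequent is the row where p is false and q is true: the pattern lacks predictive power, yet past successes exist anyway.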

### What Is a Statistical Hypothesis?

A statistical hypothesis is a conjecture about the value of a population parameter, often a numerical characteristic such as the average return of a rule. The population parameter's value is unknown because it is unobservable. For reasons previously discussed, the hypothesized value of a rule's average return is taken to be equal to or less than zero.

What an observer does know is the value of a sample statistic for
a sample that has been drawn from the population. Thus, the
observer is faced with a question: Is the observed value of the
sample statistic consistent with the hypothesized value of the
population parameter? If the observed value is *close* to
the hypothesized value, the reasonable inference would be that
the hypothesis is correct. If, on the other hand, the sample
value is far from the hypothesized value, the truth of the
hypothesis is called into question.

*Close* and *far* are ambiguous terms. The
hypothesis test quantifies these terms, making it possible to
draw a conclusion about the veracity of the hypothesis. The
test's conclusion is typically given as a number between 0 and
1.0, known as the p-value. This number indicates the probability
that a value of the sample statistic at least as extreme as the
one observed could have occurred by chance under the condition
that (given that or assuming that) the hypothesized value is
true. For example, suppose it is hypothesized that a
rule's expected return is equal to zero, but the back test
produced a return of +20 percent. The conclusion of the
hypothesis test may say something like the following: *If the
rule's expected rate of return were truly equal to zero, there is
a 0.03 probability that the back-tested return could be equal to
or greater than +20 percent due to chance.* Because there is
only a 3 percent probability that a return of 20 percent or more
could have occurred by chance if the rule were truly devoid of
predictive power, we can be quite confident that the rule was not
simply lucky in the back test.
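A minimal sketch of how such a probability might be computed, assuming the back-tested returns are independent and that, under the hypothesis, the sample mean is approximately normally distributed (a large-sample, one-sided z-test). The function name `one_sided_p_value` and the data are hypothetical; the book's own tests may use a different method, and this sketch does not attempt to reproduce the 0.03 in the example above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sided_p_value(returns, hypothesized_mean=0.0):
    """Probability, assuming the hypothesized mean is true, of a sample mean
    at least as large as the one observed (upper-tail z-test).
    Hypothetical helper; relies on a normal approximation."""
    n = len(returns)
    std_error = stdev(returns) / sqrt(n)
    z = (mean(returns) - hypothesized_mean) / std_error
    return 1.0 - NormalDist().cdf(z)


# Hypothetical per-trade returns in percent (illustrative only).
trades = [0.9, -0.3, 1.4, 0.2, -0.8, 1.1, 0.6, -0.1, 0.7, 1.3, -0.5, 0.4]
p = one_sided_p_value(trades)
print(f"p-value: {p:.3f}")
```

The smaller the result, the less plausible it is that the observed average return arose by chance from a rule whose true expected return is the hypothesized value.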

### Falsifying a Hypothesis with Improbable Evidence

A hypothesis test begins by assuming that the hypothesis being tested is true. Based on this assumption, predictions are deduced from the hypothesis about the likelihood of various new observations. In other words, if the hypothesis is true, then certain outcomes would be probable whereas other outcomes would be improbable. Armed with this set of expectations, an observer is in a position to compare the predictions with subsequent observations. If predictions and observations agree, there is no reason to question the hypothesis. However, if low-probability outcomes are observed (outcomes that would be inconsistent with the truth of the hypothesis), the hypothesis is deemed falsified. Thus, it is the occurrence of unexpected evidence that is the basis for refuting a hypothesis. Though this line of reasoning is counterintuitive, it is logically correct (denial of the consequent) and extremely powerful. It is the logical basis of scientific discovery.

To give a concrete example, suppose I view myself as an excellent
social tennis player. My hypothesis is *David Aronson is an
excellent social tennis player.* I join a tennis club with
members whose age and years of play are similar to mine. On the
basis of my hypothesis, I confidently predict to other club
members that I will win at least three-quarters of my games
(predicted win rate = 0.75). This prediction is merely a
deductive consequence of my hypothesis. I test the hypothesis by
keeping track of my first 20 games. After 20 games I am shocked
and disappointed. Not only have I not scored a single victory
(observed win rate = 0), but most losses have been by wide
margins. This outcome is clearly inconsistent with the prediction
deduced from my hypothesis. Said differently, my hypothesis
implied that this evidence had a very low probability of
occurrence. Such surprising evidence forcefully calls for a
revision (falsification) of my hypothesis. Unless I prefer
feel-good delusions to observed evidence, it is time to abandon
my delusions of tennis grandeur.^{2}
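To put a number on "very low probability," assume (an assumption not stated in the text) that the 20 games are independent and that, if the hypothesis were true, each game would be won with probability 0.75. The chance of losing all 20 games would then be:

$$
P(\text{0 wins in 20 games} \mid p_{\text{win}} = 0.75) = (1 - 0.75)^{20} = 0.25^{20} = 2^{-40} \approx 9.1 \times 10^{-13}
$$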

In the preceding situation, the evidence was overwhelmingly clear. I lost every one of 20 games. However, what if the evidence had been ambiguous? Suppose I had won two-thirds of my games. An observed win rate of 0.66 is below the predicted win rate of 0.75 but not dramatically so. Was this merely a random negative deviation from the predicted win rate or was the deviation of sufficient magnitude to indicate the hypothesis about my tennis ability was faulty? This is where statistical analysis becomes necessary. It attempts to answer the question: Was the difference between the observed win rate (0.66) and the win rate predicted by my hypothesis (0.75) large enough to raise doubts about the veracity of the hypothesis? Or, alternatively: Was the difference between 0.66 and 0.75 merely random variation in that particular sample of tennis matches? The hypothesis test attempts to distinguish prediction errors that are small enough to be the result of random sampling from errors so large that they indicate a faulty hypothesis.
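One common way to answer that question is an exact binomial test. The sketch below assumes the games are independent and that, if the hypothesis were true, each game would be won with probability 0.75; it computes the probability of winning k or fewer of n games. Because two-thirds of 20 games is not a whole number, it uses a hypothetical record of 13 wins in 20 games (0.65) as a stand-in. The function name `binomial_lower_tail` is also hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer wins in n games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical record: 13 wins in 20 games, tested against a predicted win rate of 0.75.
p_value = binomial_lower_tail(13, 20, 0.75)
print(f"P(13 or fewer wins | win rate 0.75) = {p_value:.3f}")
```

The result is about 0.21: a record this poor or worse would turn up roughly one time in five even if the hypothesis were true, so on its own it would not be strong evidence against the hypothesis.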

### Dueling Hypotheses: The Null Hypothesis versus the Alternative Hypothesis

A hypothesis test relies on the method of indirect proof. That
is, it establishes the truth of something by showing that
something else is false. Therefore, to prove the hypothesis that
we would like to demonstrate as correct, we show that an opposing
hypothesis is incorrect. To establish that hypothesis A is true,
we show that the opposing hypothesis *Not-A* is false.

A hypothesis test, therefore, involves two hypotheses. One is
called the *null hypothesis* and the other the
*alternative hypothesis*. The names are strange, but they
are so well entrenched that they will be used here. The
alternative hypothesis, the one the scientist would like to
prove, asserts the discovery of important new knowledge. The
opposing or null hypothesis simply asserts that nothing new has
been discovered. For example, Jonas Salk, inventor of the polio
vaccine, put forward the alternative hypothesis that his new
vaccine would prevent polio more effectively than a placebo. The
null hypothesis asserted that the Salk vaccine would not prevent
polio more effectively than a placebo. For the TA rules tested in
this book, the alternative hypothesis asserts the rule has an
expected return greater than zero. The null hypothesis asserts
that the rule does *not* have an expected return greater
than zero.

For purposes of brevity, I will adopt the conventional notation:
$H_A$ for the alternative hypothesis, and $H_0$
for the null hypothesis. A way to remember this is that the null
hypothesis asserts that zero new knowledge has been discovered,
thus the symbol $H_0$.

It is crucial to the logic of a hypothesis test that $H_A$ and $H_0$ be
defined as mutually exclusive and exhaustive propositions. What
does this mean? Two propositions are said to be
*exhaustive* if, when taken together, they cover all
possibilities. $H_A$ and $H_0$ cover all
possibilities. Either the polio vaccine has a preventive effect
or it does not. There is no other possibility. Either a TA rule
generates returns greater than zero or it does not.

The two hypotheses must also be defined as mutually exclusive.
Mutually exclusive propositions cannot both be true at the same
time. Because the two hypotheses are also exhaustive, at least one
of them must be true, so if $H_0$ is shown to be false, then
$H_A$ must be true and vice versa. By defining the
hypotheses as exhaustive and mutually exclusive statements, if it
can be shown that one hypothesis is false, then we are left with
the inescapable conclusion that the other hypothesis must be true. Proving truth in this fashion is called the method of indirect proof. These concepts are illustrated in Figure 5.1.
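For the TA rules discussed above, writing $\mu$ (a symbol introduced here only for illustration) for a rule's expected return, the two hypotheses can be stated compactly:

$$
H_0:\ \mu \le 0 \qquad\qquad H_A:\ \mu > 0
$$

Every possible value of $\mu$ satisfies exactly one of the two statements, which is precisely what makes them exhaustive and mutually exclusive.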

## RATIONALE OF THE HYPOTHESIS TEST

Two aspects of the hypothesis test warrant explanation: first, why the test is focused on the null hypothesis, and second, why the null hypothesis, rather than the alternative hypothesis, is assumed to be true. This section explains the reasoning behind both aspects.