16 Hypothesis tests
16.1 For two categorical variables
A common procedure is to test for an association between two categorical variables. We can illustrate this procedure using a tabulation with a \(\chi{}^2\) statistic. (Of course, the variable sex is not necessarily binary valued.)
tabulate sex class, expected chi2
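For reference, the Pearson \(\chi^2\) statistic compares the observed count in each cell of the table with the count expected if the two variables were independent. The symbols \(O_{ij}\), \(E_{ij}\) and \(n\) below are notation introduced here for illustration, not names used by Stata:

\[E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}, \qquad \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]

For a table with \(r\) rows and \(c\) columns, the statistic is compared with a \(\chi^2\) distribution on \((r-1)(c-1)\) degrees of freedom.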
16.2 Exercise
Create the table with \(\chi^2\) statistic and expected values as above. Should you reject the H0 that sex and class are not associated?
16.3 For one continuous and one categorical variable of two levels
If we have one continuous numeric variable and one two-level categorical variable (such as employed vs unemployed) that divides our data into two groups, we can ask whether the mean of the continuous variable differs between the groups (with H0 being that it does not).
If our two groups are independent, then we must first ask if the variance in the data is more or less equal between groups. The null hypothesis is that the variances are equal. This is tested by the comparison of variances using Stata’s robvar command. We can test the maths scores by sex in our data:
robvar maths, by(sex)
Knowing whether or not we are dealing with groups displaying (more or less) equal variance in the variable of interest, we can go on to conduct an independent samples t-test. The code is
ttest maths, by(sex)
(Assuming that we have interpreted the results of robvar to mean the variance in maths for the two groups is equal.)
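If robvar instead suggests that the variances differ, one option is a t-test that does not assume equal variances. A minimal sketch, assuming the same maths and sex variables as above:

```stata
* t-test without the equal-variance assumption
* (with unequal, Stata uses Satterthwaite's degrees of freedom by default)
ttest maths, by(sex) unequal

* the welch option switches to Welch's degrees-of-freedom formula instead
ttest maths, by(sex) unequal welch
```

The two versions usually give very similar results. They differ only in how the approximate degrees of freedom are computed.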
16.4 Exercise
Run the robvar procedure above but for the history and sex variables. What are the three W statistics produced? Which of them tests that the variances are equal for a comparison of means? Is there strong enough evidence in this case to reject the null hypothesis?
Use the ttest command to test the null hypothesis that
\[\mu_{\text{english, female students}} = \mu_{\text{english, male students}}\]
What conclusion do you draw?
16.5 The paired samples t-test
We can also compare the same group of subjects on two measures to see if the means differ. In this case there is no need to check the variances before conducting the test. For example, we could test whether or not mean scores in English and History differ (with the null hypothesis that they do not):
ttest english == history
Using this procedure, how do English scores compare to History scores and how do English scores compare to Mathematics scores?
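One way to see what the paired test does is to note that it is equivalent to a one-sample t-test on the within-student differences. A sketch, where the variable name eng_hist_diff is made up for illustration:

```stata
* difference between the two scores for each student
generate eng_hist_diff = english - history

* one-sample t-test of whether the mean difference is zero
ttest eng_hist_diff == 0
```

The t statistic and p value should match those reported by ttest english == history.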
16.6 One continuous and one categorical variable of more than two levels
We can compare the level of avxm by teacher, that is to say, test the null hypothesis
\[\mu_{\text{avxm, teacher one}} = \mu_{\text{avxm, teacher two}} = \mu_{\text{avxm, teacher three}}\]
16.6.1 One way ANOVA and post-hoc testing
The Stata command to test the null hypothesis above is
oneway avxm teacher, bonferroni tabulate
This command produces summary statistics, the ANOVA statistic F, its associated probability, and other quantities calculated as part of the ANOVA. In the version given above, the tabulate option adds the table of summary statistics and the bonferroni option adds a tabulation of pairwise comparisons using the Bonferroni correction. We can examine the pairwise comparisons separately, if we wish, with
pwmean avxm, over(teacher) mcompare(bonferroni) effects
This method does not display the ANOVA table itself and the mcompare() option gives us access to a slightly different range of correction options.
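As an illustration of those other options, Tukey's method can be requested in the same way. This is a sketch of one alternative, not a recommendation of a particular correction:

```stata
* pairwise comparisons of mean avxm across teachers, with Tukey's adjustment
pwmean avxm, over(teacher) mcompare(tukey) effects
```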
16.8 Correlation
Analysis of two continuous variables begins with calculating the Pearson correlation coefficient, R. This statistic ranges from -1 to +1, with
- -1 indicating a perfect inverse or negative correlation
- 0 indicating no linear correlation
- +1 indicating a perfect positive correlation
We should note that a correlation has not only magnitude and direction but also an associated hypothesis test, with the null hypothesis that the true correlation is 0. This test gives a p value associated with R.
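For reference, the usual form of this test converts R into a t statistic. Here \(n\) is the number of complete pairs of observations, a symbol introduced for illustration:

\[t = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}\]

Under the null hypothesis, this statistic is compared with a t distribution on \(n-2\) degrees of freedom.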
The code to compute R in Stata is
correlate var1 var2
This computes R for var1 and var2. If you do not specify a variable list, Stata computes correlations between all non-string variables in your data set.
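The correlate command reports the coefficients only. For p values, Stata's pwcorr command with the sig option is one route. A sketch using the same placeholder names var1 and var2:

```stata
* pairwise correlation, with its significance level and number of observations
pwcorr var1 var2, sig obs
```

The sig option prints the p value beneath each coefficient, and obs prints the number of observations used for that pair.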
16.9 Exercise
Compute Pearson correlations with significance values for the pairs
- english-maths
- english-history
Explain to your learning partner what the results mean to you.
16.9.1 Simple visualisation of correlation
The simplest way to visualise a correlation is with a scatter plot. You may wish to consider, based on your plans for further analysis, which variable to assign to which axis (scatter takes the y variable first, then the x variable). To create a scatter plot you can start with
scatter english history
To add the trend line:
scatter english history || lfit english history
And add a confidence interval:
scatter english history || lfitci english history
Now you can add labels, titles and so on. Here history is listed first so that it is plotted on the y axis, matching the axis titles:
twoway lfitci history english || scatter history english, jitter(5) ///
title("English as a predictor of History scores") ///
legend(off) ///
mcolor(red) ///
msymbol(Oh) ///
subtitle("For all students") ///
xtitle("English exam scores") ///
ytitle("History exam scores") ///
scheme(sj)
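Once the graph looks right, it can be saved to a file. A sketch, where the file name english_history.png is an arbitrary choice made for illustration:

```stata
* save the graph currently displayed in the Graph window as a PNG file
graph export "english_history.png", replace
```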
Stata has a very large range of graphing commands and options. While they are reasonably complicated, a good way to explore them is through this gallery.