Rename the variable wbgi_pse to pol.stability.
We need to do two things. First, we load the Stata data set, which requires the foreign library. Second, we load the dplyr library in order to use rename().
library(foreign)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
df <- read.dta("http://uclspp.github.io/PUBLG100/data/QoG2012.dta")
df <- rename(df, pol.stability = wbgi_pse)
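If dplyr is not available, the same rename can be done in base R. The sketch below uses a small made-up data frame (toy and its values are hypothetical, not taken from the QoG data) so that it runs on its own.

```r
# Hypothetical stand-in for the real data set
toy <- data.frame(wbgi_pse = c(0.5, -1.2, 0.3),
                  former_col = c(0, 1, 1))

# Replace the column name in place; all other columns are untouched
names(toy)[names(toy) == "wbgi_pse"] <- "pol.stability"

names(toy)
```

Note the argument order in dplyr's rename(): the new name goes on the left and the old name on the right, as in pol.stability = wbgi_pse.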
Check whether political stability is different in countries that were former colonies (former_col == 1).
The variable former_col is binary, while pol.stability is continuous. Therefore, we use the t-test. Before we do this, we declare former_col to be a factor variable. This makes it easier to see which group has the larger and which has the smaller mean.
df$former_col <- factor(df$former_col, labels = c("not ex colony", "ex colony"))
t.test(df$pol.stability ~ df$former_col, mu=0, alt="two.sided", conf=0.95)
Welch Two Sample t-test
data: df$pol.stability by df$former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2224004 0.8125053
sample estimates:
mean in group not ex colony mean in group ex colony
0.2858409 -0.2316120
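To see how factor() attaches the labels used in the output above, here is a small sketch on made-up 0/1 values (the vector x is hypothetical). The labels are matched to the sorted unique values, so 0 receives the first label and 1 the second.

```r
# Hypothetical binary indicator, coded like former_col
x <- c(1, 0, 0, 1)

# Labels are assigned in the order of the sorted values: 0 first, then 1
f <- factor(x, labels = c("not ex colony", "ex colony"))

levels(f)
table(x, f)  # cross-check that each code maps to the intended label
```

Cross-tabulating the original codes against the new factor is a quick way to confirm that the labels were not swapped.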
Choose an alpha level of .01.
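The alpha level is a threshold we choose in advance: it is the largest probability of wrongly rejecting a true null hypothesis (a Type I error) that we are willing to accept. The p-value is the probability of seeing data at least as extreme as ours if the null hypothesis were true. If the p-value is smaller than the alpha level, we reject the null hypothesis. The confidence level that corresponds to a given alpha level is 1 - alpha level. This means that with an alpha level of 0.01, you would set the argument conf = 0.99 (R matches conf to the full argument name conf.level).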
t.test(df$pol.stability ~ df$former_col, mu=0, alt="two.sided", conf=0.99)
Welch Two Sample t-test
data: df$pol.stability by df$former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
0.1277220 0.9071837
sample estimates:
mean in group not ex colony mean in group ex colony
0.2858409 -0.2316120
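The decision rule can be written out with the numbers printed above. This is only a sketch that copies the reported p-value and 99 percent confidence interval by hand; the object names are chosen here for illustration.

```r
alpha      <- 0.01
conf_level <- 1 - alpha               # 0.99, the value passed to t.test()

# Values copied from the t-test output above
p_value <- 0.0006992
ci_99   <- c(0.1277220, 0.9071837)

# Two equivalent ways to decide for a two-sided test
reject_by_p  <- p_value < alpha                   # p-value below alpha
reject_by_ci <- !(ci_99[1] <= 0 & 0 <= ci_99[2])  # 0 lies outside the interval

c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
```

Both checks agree: at the 0.01 level we reject the null hypothesis that the difference in means is zero.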
We claim the difference in means is 0.3. Check this hypothesis.
Faced with the claim that the difference is 0.3, our alternative hypothesis is that it is not 0.3. The null hypothesis therefore becomes that the difference in means is 0.3, and we test against that. To do this we set the argument mu = 0.3. If you run the code below, you will see that the 99 percent confidence interval includes 0.3 and that the p-value (0.1473) is larger than our alpha level of 0.01. This means we cannot reject the null hypothesis that the difference in means is 0.3.
t.test(df$pol.stability ~ df$former_col, mu=0.3, alt="two.sided", conf=0.99)
Welch Two Sample t-test
data: df$pol.stability by df$former_col
t = 1.4571, df = 139.35, p-value = 0.1473
alternative hypothesis: true difference in means is not equal to 0.3
99 percent confidence interval:
0.1277220 0.9071837
sample estimates:
mean in group not ex colony mean in group ex colony
0.2858409 -0.2316120
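Changing mu only shifts the numerator of the t statistic, which is why the confidence interval and the degrees of freedom are the same as before. As a rough check, the sketch below recovers the standard error from the earlier output (where mu was 0) and recomputes t for mu = 0.3. The inputs are the rounded numbers printed above, so the result only agrees up to rounding.

```r
# Group means copied from the output above
mean_diff <- 0.2858409 - (-0.2316120)   # observed difference in means

# With mu = 0 the output reported t = 3.4674, so se = difference / t
se <- mean_diff / 3.4674

# The t statistic under the null hypothesis that the difference is 0.3
t_mu <- (mean_diff - 0.3) / se
round(t_mu, 3)
```

The recomputed value matches the t statistic reported for mu = 0.3 up to rounding.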
Rename the variable lp_lat_abst to latitude.
df <- rename(df, latitude = lp_lat_abst)
Check whether latitude and political stability are correlated.
To do this, we first find out how the variables are scaled. We already know that political stability is continuous. Our new variable latitude measures the distance to the equator, so a good guess is that it is interval scaled. To be sure, let's check using a frequency table.
table(df$latitude)
0 0.011111100204289 0.0135556003078818
1 5 1
0.0138889001682401 0.0222222004085779 0.0255555994808674
1 3 1
0.0350000001490116 0.0366666987538338 0.0444444008171558
1 1 2
0.0477778017520905 0.0483332984149456 0.0555556006729603
1 1 2
0.0666666999459267 0.0700000002980232 0.0727778002619743
3 1 1
0.0777778029441833 0.0888888984918594 0.092222198843956
2 6 1
0.100000001490116 0.103333301842213 0.111111097037792
2 1 5
0.122222200036049 0.125555604696274 0.133333295583725
2 1 1
0.134111106395721 0.134444400668144 0.13666670024395
1 1 1
0.144444495439529 0.145555600523949 0.146111100912094
4 1 1
0.147555604577065 0.147777795791626 0.148333296179771
1 2 1
0.150000005960464 0.150333300232887 0.155555605888367
1 1 1
0.166666701436043 0.170000001788139 0.177777796983719
7 1 4
0.188888907432556 0.189222201704979 0.190555602312088
2 1 1
0.191111102700233 0.200000002980232 0.201666697859764
1 2 2
0.211111098527908 0.222222194075584 0.224111095070839
2 5 1
0.233333304524422 0.236666694283485 0.244444400072098
1 1 3
0.255555599927902 0.258888900279999 0.26666671037674
2 1 2
0.268333286046982 0.277777791023254 0.281111091375351
1 2 1
0.288888901472092 0.292222201824188 0.300000011920929
1 1 2
0.303333312273026 0.311111092567444 0.322222203016281
1 2 1
0.325555503368378 0.333333313465118 0.344444513320923
2 2 1
0.347777813673019 0.355555593967438 0.366666704416275
1 2 3
0.372222185134888 0.377777814865112 0.388888895511627
1 2 3
0.394444406032562 0.400000005960464 0.411111086606979
1 1 1
0.422222197055817 0.433333307504654 0.436666697263718
1 3 1
0.444444388151169 0.447777807712555 0.455555588006973
4 1 4
0.461111098527908 0.466666698455811 0.469999998807907
1 1 1
0.472222208976746 0.477777808904648 0.482666611671448
1 1 1
0.488888889551163 0.501111090183258 0.511111080646515
2 1 4
0.522222220897675 0.523333311080933 0.52444452047348
3 1 1
0.533333420753479 0.537777781486511 0.544444382190704
1 1 1
0.549444377422333 0.561111092567444 0.566666722297668
2 1 1
0.577777802944183 0.581111073493958 0.588888883590698
1 1 2
0.600000023841858 0.622222185134888 0.633333325386047
1 2 1
0.655555486679077 0.666666686534882 0.688888907432556
1 2 2
0.711111128330231 0.722222208976746
1 1
We see that the variable takes many distinct values between zero and one (the largest observed value is about 0.72), so it is at least interval scaled. In case you wonder why it is bounded by zero and one: latitudes have been divided by 90. We now know that both variables are at least interval scaled. We check for a relationship visually first.
plot(df$latitude, df$pol.stability)
You may have spotted a positive correlation, positive in the sense that larger distances from the equator are associated with more political stability. We now apply the appropriate test statistic, Pearson's r. Note that cor.test() drops incomplete pairs of observations on its own, so no extra argument for missing values is needed:
r <- cor.test(df$latitude, df$pol.stability, conf.level = 0.99)
r
Pearson's product-moment correlation
data: df$latitude and df$pol.stability
t = 5.8247, df = 185, p-value = 2.492e-08
alternative hypothesis: true correlation is not equal to 0
99 percent confidence interval:
0.2224487 0.5413167
sample estimates:
cor
0.3936597
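The pieces of this output are linked by two standard formulas: the t statistic is r * sqrt(df) / sqrt(1 - r^2), and the confidence interval comes from the Fisher z-transformation of r. The sketch below recomputes both from the reported correlation and degrees of freedom. The inputs are copied by hand from the output above, and the object names are chosen here for illustration.

```r
# Values copied from the cor.test output above
r_hat <- 0.3936597
dof   <- 185
n     <- dof + 2          # number of complete pairs

# t statistic for the null hypothesis of zero correlation
t_stat <- r_hat * sqrt(dof) / sqrt(1 - r_hat^2)

# 99 percent confidence interval via the Fisher z-transformation
z  <- atanh(r_hat)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.995) * se)

round(t_stat, 3)
round(ci, 3)
```

Both results agree with the printed t value and interval up to rounding. Because the interval excludes zero and the p-value is far below 0.01, we reject the null hypothesis of no correlation at the 0.01 level.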