Using our model z.out
, where we included the square of GDP/capita, what is the effect of:
We will re-estimate the same model we had done in class. To do that we need to load foreign
, texreg
and Zelig
libraries. We also clear our workspace and load the data set. We create a subset of the loaded data set that includes only the two variables that we need for this exercise. We do na.omit()
over that subset to drop missing values but only from the variables that we include in our model.
library(foreign) # to load .dta files
library(texreg) # to print nice regression tables
Warning: package 'texreg' was built under R version 3.2.3
library(Zelig) # to model & illustrate uncertainty
# clear workspace
rm(list = ls())
# load data set from seminar page
data.set <- read.dta("http://uclspp.github.io/PUBLG100/data/QoG2012.dta")
# exercise subset
exercise.subset <- data.set[, c("undp_hdi", "wdi_gdpc")]
# removing NA's on our two variables of interest
exercise.subset <- na.omit(exercise.subset)
# running regression with squared GDP/captia
exercise1.reg <- zelig(undp_hdi ~ wdi_gdpc + I(wdi_gdpc^2), model = "ls",
data = exercise.subset, cite = FALSE)
screenreg(exercise1.reg)
===========================
Model 1
---------------------------
(Intercept) 0.53 ***
(0.01)
wdi_gdpc 0.00 ***
(0.00)
I(wdi_gdpc^2) -0.00 ***
(0.00)
---------------------------
AIC -282.52
BIC -269.93
Log Likelihood 145.26
Num. obs. 172
R^2 0.67
Adj. R^2 0.66
F statistic 170.11
===========================
*** p < 0.001, ** p < 0.01, * p < 0.05
Having done these preliminary tasks, we get to crux of the question. The model that we estimated is not linear because we included a quadratic term. The effect of a $10,000 increase on the quality of life, thus, depends on the initial level of wealth. We proceed like this:
## effect of an increase from $5,000 to $15,000
# set.seed() makes your results replicable, it does not
# matter which values you put inside
set.seed(123)
# setting covariates
set.gdp5000 <- setx(exercise1.reg, wdi_gdpc = 5000)
set.gdp15000 <- setx(exercise1.reg, wdi_gdpc = 15000)
# simulating
sim.5000.to.15000 <- sim(exercise1.reg, x = set.gdp5000, x1 = set.gdp15000)
# checking for the first differences
summary(sim.5000.to.15000)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 5000 2.5e+07
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 15000 2.25e+08
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.644 0.008 0.644 0.628 0.661
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
2 0.827 0.012 0.826 0.804 0.851
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
2 0.644 0.008 0.644 0.628 0.661
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
2 0.827 0.012 0.826 0.804 0.851
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.182 0.011 0.182 0.161 0.204
## effect of an increase from 25,000 to 35,000
# setting covariates
set.gdp25000 <- setx(exercise1.reg, wdi_gdpc = 25000)
set.gdp35000 <- setx(exercise1.reg, wdi_gdpc = 35000)
# simulating
sim.25000.to.35000 <- sim(exercise1.reg, x = set.gdp25000, x1 = set.gdp35000)
# checking for the first differences
summary(sim.25000.to.35000)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 25000 6.25e+08
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 35000 1.225e+09
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.94 0.015 0.94 0.909 0.97
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
2 0.981 0.018 0.981 0.947 1.014
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
2 0.94 0.015 0.94 0.909 0.97
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
2 0.981 0.018 0.981 0.947 1.014
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.041 0.009 0.042 0.024 0.059
# checking summary statistics of wealth
summary(exercise.subset$wdi_gdpc)
Min. 1st Qu. Median Mean 3rd Qu. Max.
226.2 1776.0 5493.0 10420.0 13840.0 63690.0
# checking 25th percentile and 75th percentile
quantile(exercise.subset$wdi_gdpc, probs = c(0.25, 0.75))
25% 75%
1776.097 13837.955
# looking at the distribution of wealth
hist(exercise.subset$wdi_gdpc)
# checking summary statistic of the dependent variable
summary(exercise.subset$undp_hdi)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2730 0.5355 0.7505 0.6978 0.8350 0.9560
diff(range(exercise.subset$undp_hdi))
[1] 0.683
# the effect of increasing from $5000 to $15000
summary(sim.5000.to.15000)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 5000 2.5e+07
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 15000 2.25e+08
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.644 0.008 0.644 0.628 0.661
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
2 0.827 0.012 0.826 0.804 0.851
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
2 0.644 0.008 0.644 0.628 0.661
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
2 0.827 0.012 0.826 0.804 0.851
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.182 0.011 0.182 0.161 0.204
# calculating the effect in percentage terms (compared to the range of DV)
0.182 / diff(range(exercise.subset$undp_hdi))
[1] 0.2664715
# the effect of increasing from $25000 to $35000
summary(sim.25000.to.35000)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 25000 6.25e+08
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) wdi_gdpc I(wdi_gdpc^2)
2 1 35000 1.225e+09
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.94 0.015 0.94 0.909 0.97
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
2 0.981 0.018 0.981 0.947 1.014
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
2 0.94 0.015 0.94 0.909 0.97
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
2 0.981 0.018 0.981 0.947 1.014
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
2 0.041 0.009 0.042 0.024 0.059
# calculating the effect in percentage (compared to range of DV)
0.041 / diff(range(exercise.subset$undp_hdi))
[1] 0.06002929
We have all we need for statistical and substantial interpretation. Suppose, someone claims that the quality of life depends on how wealthy a society is. What is more, (s)he also claims that added wealth increases quality of life independent of how wealthy a society is.
To check - with the goal of disproving - we look at the relationship between a societies wealth (GDP/captia) and the quality of life in a scatter plot. From that we already got a very clear idea that the relationship is in fact not linear. To be sure, we estimated a regression model including a quadratic term of GDP/captia. We find that the interaction (GDP/captia times GDP/captia) is statistically significant, therefore disproving the claim that the effect of added wealth is independent of initial wealth.
Our friend looks at our results and says that the coefficients are extremely small and that therefore our findings have no substantial meaning in the real world. We know that the effect is larger for poorer societies than for richer societies but how big is it? We estimate the effect of a large leap in GDP/capita of $10,000 for different initial wealth levels.
Our results show that quality of life increases by 0.182 when increasing GDP/capita from 5,000 to 15,000. That is a change of 27% of the entire range of the quality of life measure. The effect is huge. Wealth matters a lot in poorer societies. A move from $5,000 of GDP/captia to $15,000 is no small feat however. Half the countries in our sample have a GDP/capita that is lower than $5,500 and a wealth level of $15,000 means that more than 75% of the countries are poorer.
To show that the effect of wealth levels out we consider what happens when moving from being a rich country to a super rich country (GDP/captia from $25,000 to $35,000). We predict a mean increase in the quality of life index of 0.041 (on a 95% level between 0.024 and 0.059). That still corresponds to an increase of 6% in of the range of quality of life. That is still a very large effect. However, we must admit that an increase of $10,000 in GDP/captia is huge as well.
Note: This way of interpreting the effect relies on comparing the effect for a change of X to the range of Y in the sample.
Estimate a model where wbgi_pse
(political stability) is the response variable and h_j
(1 if country has an independent judiciary) and former_col
are the explanatory variables. Include an interaction between your explanatory variables. What is the marginal effect of:
h_j
and former_col improve model fit?We clear our workspace again. Then we reload the data set and create a subset that only includes the variables that we care about. We then remove missing values from that subset. We estimate the two models and compare the results.
# clear workspace
rm(list = ls())
# load data set
data.set <- read.dta("http://uclspp.github.io/PUBLG100/data/QoG2012.dta")
# subset for analysis
exercise2.subset <- data.set[, c("wbgi_pse", "h_j", "former_col")]
# remove missings from subset
exercise2.subset <- na.omit(exercise2.subset)
# estimate model without interaction
base.model <- zelig(wbgi_pse ~ h_j + former_col, model = "ls",
data = exercise2.subset, cite = FALSE)
# estimate model including the interaction
interaction.model <- zelig(wbgi_pse ~ h_j + former_col + h_j*former_col,
model = "ls", data = exercise2.subset, cite = FALSE)
# comparing the models
screenreg(list(base.model, interaction.model))
========================================
Model 1 Model 2
----------------------------------------
(Intercept) 0.39 *** 0.47 ***
(0.11) (0.11)
h_j 0.17 *** 0.24 ***
(0.03) (0.04)
former_col -0.16 -0.49 *
(0.15) (0.19)
h_j:former_col -0.14 **
(0.05)
----------------------------------------
AIC 432.32 426.69
BIC 444.83 442.34
Log Likelihood -212.16 -208.34
Num. obs. 169 169
R^2 0.28 0.31
Adj. R^2 0.27 0.30
F statistic 32.53 25.09
========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
# F-test
anova(base.model$result, interaction.model$result)
Analysis of Variance Table
Model 1: wbgi_pse ~ h_j + former_col
Model 2: wbgi_pse ~ h_j + former_col + h_j * former_col
Res.Df RSS Df Sum of Sq F Pr(>F)
1 166 121.85
2 165 116.47 1 5.3762 7.6163 0.006438 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Policy experts are interested in the political stability of a country. They conjecture that the existence of an independent judiciary is important, i.e. countries that have one are more stable than those that do not. They also claim that whether a country was colonised or not does not make a difference. A new theory, however, claims that the effect of an independent judiciary depends on whether a country was colonised or not. We test this new theory against the old one.
Model 1 represents the conventional theory. Indeed, we find that countries with an independent judiciary are more stable than those without. And in addition, there is no difference in political stability between countries that have been colonised and those that have not been.
Model 2 represents the more refined theory. The interaction is significant and the F-test is significant as well, indicating that our new model does better at explaining political stability. And the new theory is right, the effect of an independent judiciary is conditional on whether a country was colonised or not.
The model is linear, we get the effect of an independent judiciary when a country was not colonised directly from model 2 in our table. The difference between countries with an independent judiciary and without conditional on not having been colonised is 0.24 on the political stability scale (the effect of the dummy h_j
). The difference that having an independent judiciary makes for countries that have been colonised is smaller. It is 0.10 on the political stability scale (the effect of the dummy h_j
plus the effect of the interaction).
Are those differences large or small?
# summary statistics of the dependent variable
summary(exercise2.subset$wbgi_pse)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.4670 -0.8114 -0.1550 -0.1655 0.6775 1.6760
# range of the dependnet variable
diff(range(exercise2.subset$wbgi_pse))
[1] 4.14307
# percentage change in pol. stability of having indep. judiciary if NOT colonised
0.24 / diff(range(exercise2.subset$wbgi_pse))
[1] 0.05792806
# percentage change in pol. stability of having indep. judiciary if colonised
0.10 / diff(range(exercise2.subset$wbgi_pse))
[1] 0.02413669
Those countries that have an independent judiciary and that were not colonised are 6% more stable (compared to the range of the political stability scale as observed in the sample) than those without an independent judiciary. Having an independent judiciary makes a difference of 2% if the country was colonised. Those differences are not very large. However, given that political stability is paramount, having an independent judiciary is a good idea. It makes less of a difference in countries that have been colonised. We should be aware that these theories are quite rudimentary and that we may have omitted causes.
Note: The interpretation, relies on finding the differences between the groups as we only have dummies in the model. The size of the differences again relies on comparing the difference that X makes to the range of Y.
Run a model on the human development index (undp_hdi
), interacting an independent judiciary (h_j
) and control of corruption (wbgi_cce
). What is the effect of control of corruption:
We clear our workspace, reload the data set and create a subset that includes only the variables that we care about. We drop missings from this subset. We estimate the two models. We then compare the regression tables and we do an F-test.
# clear workspace
rm(list = ls())
# load data set
data.set <- read.dta("http://uclspp.github.io/PUBLG100/data/QoG2012.dta")
# subset for analysis
exercise3.subset <- data.set[, c("undp_hdi", "h_j", "wbgi_cce")]
# remove missings
exercise3.subset <- na.omit(exercise3.subset)
# estimate model without interaction
base.model <- zelig(undp_hdi ~ wbgi_cce + h_j, model = "ls", data = exercise3.subset,
cite = FALSE)
# estimate interaction model
interaction.model <- zelig(undp_hdi ~ wbgi_cce + h_j + wbgi_cce:h_j,
model = "ls", data = exercise3.subset, cite = FALSE)
# output table
screenreg(list(base.model, interaction.model))
========================================
Model 1 Model 2
----------------------------------------
(Intercept) 0.72 *** 0.72 ***
(0.02) (0.02)
wbgi_cce 0.11 *** 0.11 ***
(0.01) (0.01)
h_j 0.01 0.01
(0.00) (0.00)
wbgi_cce:h_j 0.00
(0.00)
----------------------------------------
AIC -183.41 -181.61
BIC -171.11 -166.23
Log Likelihood 95.70 95.80
Num. obs. 160 160
R^2 0.48 0.48
Adj. R^2 0.47 0.47
F statistic 71.90 47.75
========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
# F-Test
anova(base.model$result, interaction.model$result)
Analysis of Variance Table
Model 1: undp_hdi ~ wbgi_cce + h_j
Model 2: undp_hdi ~ wbgi_cce + h_j + wbgi_cce:h_j
Res.Df RSS Df Sum of Sq F Pr(>F)
1 157 2.8320
2 156 2.8285 1 0.0035607 0.1964 0.6583
Comparing the two models we see that there is an effect of control of corruption on the quality of life. The better the institutions (more control of corruption), the higher the quality of life. We test the theory that the effect of control of corruption is conditional on having an independent judiciary, i.e. the magnitude of the effect is different in the two sub-groups. Our interaction term is insignificant. The effect of control of corruption seems to be independent of whether an independent judiciary exists or not. Our F-test also clearly rejects that our model with the interaction is doing better at explaining quality of life (as do AIC and BIC).
Normally, we would stop here. There is no point in illustrating a conditional effect that does not exist. However, below we illustrate the effect of control of corruption conditional on having an independent judiciary. The effects overlap and we cannot tell a difference.
Note: The interpretation relies on statistical significance. We do not find support for the conditional effect of X on Y.
# illustration of results
# range of control of corruption
summary(exercise3.subset$wbgi_cce)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.70000 -0.83960 -0.32230 -0.02757 0.57480 2.44600
# setting the two scenarios
no.indep.jud.setx <- setx(interaction.model, h_j = 0, wbgi_cce = seq(-1.7, 2.5, 0.1) )
an.indep.jud.setx <- setx(interaction.model, h_j = 1, wbgi_cce = seq(-1.7, 2.5, 0.1) )
# simulation
simulation.result <- sim(interaction.model, x = no.indep.jud.setx, x1 = an.indep.jud.setx)
# plot results
plot(simulation.result,
main = "Effect of Control of Corruption Conditional on Indep. Judiciary",
xlab = "Control of Corruption",
ylab = "Quality of Life")
Clear your workspace and download the California Test Score Data used in Chapters 4-9 by Stock and Watson.
# clear workspace
rm(list = ls())
# load data set from your local folder where you saved the data set
california.schools <- read.dta("caschool.dta")
# check variable names
names(california.schools)
[1] "observat" "dist_cod" "county" "district" "gr_span" "enrl_tot"
[7] "teachers" "calw_pct" "meal_pct" "computer" "testscr" "comp_stu"
[13] "expn_stu" "str" "avginc" "el_pct" "read_scr" "math_scr"
# summary stats of variables of interest
summary(california.schools[, c("testscr", "avginc")])
testscr avginc
Min. :605.5 Min. : 5.335
1st Qu.:640.0 1st Qu.:10.639
Median :654.5 Median :13.728
Mean :654.2 Mean :15.317
3rd Qu.:666.7 3rd Qu.:17.629
Max. :706.8 Max. :55.328
# checking for missings in data set
summary(california.schools)
observat dist_cod county district
Min. : 1.0 Min. :61382 Length:420 Length:420
1st Qu.:105.8 1st Qu.:64308 Class :character Class :character
Median :210.5 Median :67760 Mode :character Mode :character
Mean :210.5 Mean :67473
3rd Qu.:315.2 3rd Qu.:70419
Max. :420.0 Max. :75440
gr_span enrl_tot teachers calw_pct
Length:420 Min. : 81.0 Min. : 4.85 Min. : 0.000
Class :character 1st Qu.: 379.0 1st Qu.: 19.66 1st Qu.: 4.395
Mode :character Median : 950.5 Median : 48.56 Median :10.520
Mean : 2628.8 Mean : 129.07 Mean :13.246
3rd Qu.: 3008.0 3rd Qu.: 146.35 3rd Qu.:18.981
Max. :27176.0 Max. :1429.00 Max. :78.994
meal_pct computer testscr comp_stu
Min. : 0.00 Min. : 0.0 Min. :605.5 Min. :0.00000
1st Qu.: 23.28 1st Qu.: 46.0 1st Qu.:640.0 1st Qu.:0.09377
Median : 41.75 Median : 117.5 Median :654.5 Median :0.12546
Mean : 44.71 Mean : 303.4 Mean :654.2 Mean :0.13593
3rd Qu.: 66.86 3rd Qu.: 375.2 3rd Qu.:666.7 3rd Qu.:0.16447
Max. :100.00 Max. :3324.0 Max. :706.8 Max. :0.42083
expn_stu str avginc el_pct
Min. :3926 Min. :14.00 Min. : 5.335 Min. : 0.000
1st Qu.:4906 1st Qu.:18.58 1st Qu.:10.639 1st Qu.: 1.941
Median :5215 Median :19.72 Median :13.728 Median : 8.778
Mean :5312 Mean :19.64 Mean :15.317 Mean :15.768
3rd Qu.:5601 3rd Qu.:20.87 3rd Qu.:17.629 3rd Qu.:22.970
Max. :7712 Max. :25.80 Max. :55.328 Max. :85.540
read_scr math_scr
Min. :604.5 Min. :605.4
1st Qu.:640.4 1st Qu.:639.4
Median :655.8 Median :652.5
Mean :655.0 Mean :653.3
3rd Qu.:668.7 3rd Qu.:665.9
Max. :704.0 Max. :709.5
# scatter plot
plot( testscr ~ avginc, data = california.schools,
main = "Relationship b/w Income and Test Scores",
xlab = "Average District Income in $1000's",
ylab = "Average Test Score of Reading and Math")
# regression exluding squared income
base.reg <- zelig( testscr ~ avginc, model = "ls", data = california.schools,
cite = FALSE)
# regression including squared income
square.reg <- zelig( testscr ~ avginc + I(avginc^2), model = "ls",
data = california.schools, cite = FALSE)
# regression tables
screenreg(list(base.reg, square.reg))
==========================================
Model 1 Model 2
------------------------------------------
(Intercept) 625.38 *** 607.30 ***
(1.53) (3.05)
avginc 1.88 *** 3.85 ***
(0.09) (0.30)
I(avginc^2) -0.04 ***
(0.01)
------------------------------------------
AIC 3375.07 3333.42
BIC 3387.19 3349.58
Log Likelihood -1684.54 -1662.71
Num. obs. 420 420
R^2 0.51 0.56
Adj. R^2 0.51 0.55
F statistic 430.83 261.28
==========================================
*** p < 0.001, ** p < 0.01, * p < 0.05
# F-test
anova(base.reg$result, square.reg$result)
Analysis of Variance Table
Model 1: testscr ~ avginc
Model 2: testscr ~ avginc + I(avginc^2)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 418 74905
2 417 67510 1 7394.9 45.677 4.713e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An activist claims that inequality of opportunities determines test scores of pupils. Pupils from rich districts do better than pupils from poor districts. In addition, (s)he claims that one should concentrate on improving conditions in poor districts because there the effect of wealth on test scores is largest.
To test these claims, we must find out whether there is any effect of wealth on test scores at all. Second, whether that effect is non-linear, i.e. the effect of added wealth is larger for poorer districts and then levels off. And third, assuming these effects are statistically significant, we must show whether they are substantial.
To test for the effect of wealth and the non-linearity, we estimate our two models. Average district income is statistically significant. Pupils from richer districts do better. Our quadratic term is significant and has the expected sign. Thus, the effect is stronger for poorer districts and smaller for richer districts. The F-test confirms that our non-linear model does better at explaining test scores.
# descriptive summary statistics of district income
summary(california.schools$avginc)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.335 10.640 13.730 15.320 17.630 55.330
# what is the income in the 10th percentile?
low1 <- quantile(california.schools$avginc, 0.10)
low1
10%
8.929933
# what is the income in the 15th percentile
low2 <- quantile(california.schools$avginc, 0.15)
low2
15%
9.7702
# how big is difference in terms of money
low2 - low1
15%
0.840267
# illustrating effect of a move from the 10th to the 15th percentile
x.out.low1 <- setx(square.reg, avginc = low1)
x.out.low2 <- setx(square.reg, avginc = low2)
# simulation
s.out.low <- sim(square.reg, x = x.out.low1, x1 = x.out.low2)
# check first difference
summary(s.out.low)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) avginc I(avginc^2)
1 1 8.929933 79.7437
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) avginc I(avginc^2)
1 1 9.7702 95.45681
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
1 638.306 1.009 638.309 636.406 640.402
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
1 640.874 0.892 640.873 639.149 642.682
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
1 638.306 1.009 638.309 636.406 640.402
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
1 640.874 0.892 640.873 639.149 642.682
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
1 2.568 0.163 2.567 2.231 2.878
# dependent variable summary
summary(california.schools$testscr)
Min. 1st Qu. Median Mean 3rd Qu. Max.
605.6 640.0 654.4 654.2 666.7 706.8
# range of the depdendent variable
diff(range(california.schools$testscr))
[1] 101.2
# percent change of DV (percent change based on range of DV as observed in the sample) caused by IV
2.6 / diff(range(california.schools$testscr))
[1] 0.02569171
Following the advice of our activist and the statistically significant results of our model we check for the substantial effect of moving from the 10th percentile in district wealth to the 15th percentile. This corresponds to an increase of $840 average district income. We can show that test scores would increase by 2.6 points on average. That difference is statistically significant, we can say that on a 95% confidence level test scores would increase between 2.3 and 2.9 points. Such an increase in test scores corresponds roughly to a 3% increase in test scores (percent of range of DV as observed in the sample). That difference is quite small and may not warrant state intervention.
We also check for the effect that becoming wealthier by the same amount ($840) would have on a district from the 75th percentile.
# what is the avg. income at the 75th percentile?
high1 <- quantile(california.schools$avginc, 0.75)
high1
75%
17.629
# we add the same amount of wealth as we did for the poor districts
high2 <- high1 + (low2 - low1)
high2
75%
18.46927
# illustrating effect of adding $840 avg. district income to a district at 75th percentile
x.out.high1 <- setx(square.reg, avginc = high1)
x.out.high2 <- setx(square.reg, avginc = high2)
# simulation
s.out.low <- sim(square.reg, x = x.out.high1, x1 = x.out.high2)
# check first difference
summary(s.out.low)
Model: ls
Number of simulations: 1000
Values of X
(Intercept) avginc I(avginc^2)
1 1 17.629 310.7817
attr(,"assign")
[1] 0 1 2
Values of X1
(Intercept) avginc I(avginc^2)
1 1 18.46927 341.1138
attr(,"assign")
[1] 0 1 2
Expected Values: E(Y|X)
mean sd 50% 2.5% 97.5%
1 661.998 0.814 661.99 660.484 663.545
Expected Values: E(Y|X1)
mean sd 50% 2.5% 97.5%
1 663.95 0.867 663.943 662.29 665.595
Predicted Values: Y|X
mean sd 50% 2.5% 97.5%
1 661.998 0.814 661.99 660.484 663.545
Predicted Values: Y|X1
mean sd 50% 2.5% 97.5%
1 663.95 0.867 663.943 662.29 665.595
First Differences: E(Y|X1) - E(Y|X)
mean sd 50% 2.5% 97.5%
1 1.952 0.091 1.958 1.772 2.118
# percent change of DV (percent change based on range of DV as observed in the sample) caused by IV
2 / diff(range(california.schools$testscr))
[1] 0.01976286
When a district at the 75th percentile would become even richer by $840 average income, that would increase test scores by 2 points on average. On a 95% confidence level the increase would be in an interval from 1.8 points to 2.1 points. The increase corresponds to 2% increase of the entire range of observed test scores in our sample. That effect is smaller than in poor districts, and supports the idea that if policy intervention was taken, one should concentrate on poor districts as we would get more bang for a buck there.
Note: Here, the effect of X on Y was presented in the absolute change (point change of test scores) as well as in terms of percentage changes compared to the range of the dependent variable. When the absolute increase of the dependent variable is easy to understand it is good to do that.