library(texreg)
Warning: package 'texreg' was built under R version 3.2.3
library(Zelig)
library(dplyr)
# Change your working directory
setwd("N:/PUBLG100")
# Check your working directory
getwd()
# clear the environment
rm(list = ls())
Load the immigration dataset (http://uclspp.github.io/PUBLG100/data/communities_immig.csv) and merge it with the communities dataset.
We know that we need data from all three files for comparing different regression models, so let's just merge them all at once.
communities <- read.csv("http://uclspp.github.io/PUBLG100/data/communities.csv")
communities_employment <- read.csv("http://uclspp.github.io/PUBLG100/data/communities_employment.csv")
communities_immigration <- read.csv("http://uclspp.github.io/PUBLG100/data/communities_immig.csv")
communities <- merge(communities, communities_employment, by = c("state", "communityname"))
communities <- merge(communities, communities_immigration, by = c("state", "communityname"))
Rename the PctImmigRec5
variable to RecentImmigration5
.
Use the dplyr select()
function to rename variables.
communities <- select(communities,
state,
Community = communityname,
UnemploymentRate = PctUnemployed,
NoHighSchool = PctNotHSGrad,
White = racePctWhite,
RecentImmigration5 = PctImmigRec5)
Estimate a model explaining the relationship between unemployment rate and recent immigration over the past 5 years using the variable RecentImmigration5
.
Use lm()
to fit a linear model.
model_immigration <- lm(UnemploymentRate ~ RecentImmigration5, data = communities)
Plot the regression line of the model.
Plot the observations and a regression line.
plot(communities$RecentImmigration5, communities$UnemploymentRate,
xlab = "Recent Immigration",
ylab = "Unemployment Rate")
abline(model_immigration, col = "red")
How does this model compare to the models we estimated in the seminar with NoHighSchool
and Minority
as independent variables? Present your findings by comparing the output of all three models in side-by-side tables using the texreg
package.
Fit the two linear models from the seminar and use screenreg()
to display side-by-side tables for comparison.
model_education <- lm(UnemploymentRate ~ NoHighSchool, data = communities)
communities$Minority <- 1 - communities$White
model_minority <- lm(UnemploymentRate ~ Minority, data = communities)
screenreg(list(model_education, model_minority, model_immigration))
=========================================================
Model 1 Model 2 Model 3
---------------------------------------------------------
(Intercept) 0.08 *** 0.26 *** 0.33 ***
(0.01) (0.01) (0.01)
NoHighSchool 0.74 ***
(0.01)
Minority 0.43 ***
(0.02)
RecentImmigration5 0.11 ***
(0.02)
---------------------------------------------------------
R^2 0.55 0.27 0.01
Adj. R^2 0.55 0.27 0.01
Num. obs. 1994 1994 1994
RMSE 0.14 0.17 0.20
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Save your model comparison table to a Microsoft Word document (or another format if you don't use Word).
Save the output to a document using htmlreg()
htmlreg(list(model_education, model_minority, model_immigration), file = "solutions4_model_comparison.doc")
The table was written to the file 'solutions4_model_comparison.doc'.
Generate predicted values from the fitted model with RecentImmigration5
and plot the confidence interval using Zelig.
This follows the example exactly as we covered in the seminar. We first estimate the model with zelig()
, then generate predicted values with setx()
and sim()
.
z.out <- zelig(UnemploymentRate ~ RecentImmigration5, data = communities, model = "ls")
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Olivia Lau. 2015.
"ls: Least Squares Regression for Continuous Dependent Variables"
in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
http://gking.harvard.edu/zelig
x.out <- setx(z.out, RecentImmigration5 = seq(0, 1, 0.1))
s.out <- sim(z.out, x = x.out)
plot(s.out, qi = "pv", xlab = "Recent Immigration (last 5 years)")
Save all the plots as graphics files.
Use png()
to send the plot to a file. When we're done plotting, we must close the file with dev.off()
png(filename = "solutions4_model_immigration.png")
plot(communities$RecentImmigration5, communities$UnemploymentRate,
xlab = "Recent Immigration",
ylab = "Unemployment Rate")
abline(model_immigration, col = "red")
dev.off()
Now let's do the same for the confidence interval plot.
png(filename = "solutions4_conf_interval.png")
plot(s.out, qi = "pv", xlab = "Recent Immigration")
dev.off()