caschool.dta
from your working directory, do the following:
library(foreign)
california_dataset <- read.dta("caschool.dta")
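If read.dta() complains that it cannot find the file, the working directory is the usual culprit. The following is a minimal sketch of a defensive version of the load step; the variable name dataset_file is only illustrative, and the check simply reports where R is looking when the file is not there.

```r
library(foreign)

# Name of the Stata file we expect to find in the working directory
dataset_file <- "caschool.dta"

if (file.exists(dataset_file)) {
  california_dataset <- read.dta(dataset_file)
  # Quick look at the variables and their types
  str(california_dataset)
} else {
  # Show where R is currently looking so the path can be fixed with setwd()
  message("Cannot find ", dataset_file, " in ", getwd())
}
```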
Follow the instructions here and read more about it here.
Ideally you want to keep as much data as possible, but sometimes removing NAs becomes necessary. If you must drop some observations because of missing values, do so only when the variable(s) you're interested in are missing. Follow the example in Seminar 5, where we drop observations when latitude, globalization and inst_quality are missing.
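As a rough illustration of the difference, the sketch below uses a small made-up data frame (world_data and its values are invented, not the Seminar 5 dataset) and drops rows only when one of the three variables of interest is missing. It uses complete.cases() on those columns, which may differ from the exact code used in the seminar.

```r
# Toy data standing in for the seminar dataset (all values are made up)
world_data <- data.frame(
  country       = c("A", "B", "C", "D", "E"),
  latitude      = c(0.1, NA, 0.5, 0.3, 0.7),
  globalization = c(55, 60, NA, 70, 48),
  inst_quality  = c(0.4, 0.6, 0.8, 0.5, 0.9),
  population    = c(NA, 12, 30, 8, NA)  # not in our model, so NAs here should not cost us rows
)

# Only these variables matter for the model we want to estimate
vars_of_interest <- c("latitude", "globalization", "inst_quality")

# TRUE for rows where none of the variables of interest is missing
keep_rows <- complete.cases(world_data[, vars_of_interest])
world_clean <- world_data[keep_rows, ]

nrow(world_clean)          # 3 rows survive (countries A, D and E)
nrow(na.omit(world_data))  # 1 row survives if we drop on ANY missing value
```

Dropping on every column with na.omit() would throw away two usable observations here, which is why it pays to restrict the check to the variables in your model.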
Just reload the dataset and start over. There's no other way to get the observations back once you've thrown them away.
For binary variables, factors make it easy to interpret your results in a regression table. There are plenty of examples in the seminars where we did this. Try it out on your own: run a regression with a binary independent variable such as male/female or public-school/private-school and look at the results. Then do the same after creating a factor.
NOTE: This ONLY applies to binary variables (i.e. variables with only two levels, 0 or 1). For categorical variables with more than two levels (such as ethnicity, religion, etc.), you MUST create a factor variable.
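To see what the factor buys you, here is a small sketch with simulated data (school_data, private and score are invented names and values). It fits the same regression twice, once with the 0/1 variable and once with a labelled factor.

```r
set.seed(1)

# Simulated data: 20 public schools (0) and 20 private schools (1)
school_data <- data.frame(private = rep(c(0, 1), each = 20))
school_data$score <- 600 + 15 * school_data$private + rnorm(40, sd = 10)

# Regression with the raw 0/1 variable
model_numeric <- lm(score ~ private, data = school_data)

# Same variable as a factor with readable labels; "public" is the baseline
school_data$school_type <- factor(school_data$private,
                                  levels = c(0, 1),
                                  labels = c("public", "private"))
model_factor <- lm(score ~ school_type, data = school_data)

coef(model_numeric)  # coefficient is named "private"
coef(model_factor)   # coefficient is named "school_typeprivate"
```

The estimates are identical; only the coefficient name changes, and the factor version tells you which category is being compared against the baseline.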
There are a number of ways to do this, but one simple way is to use the summary() function, which gives you the minimum and maximum values of a variable. The summary() function also gives you other useful statistics such as the mean, the median, the 1st and 3rd quartiles (25th and 75th percentiles), and, when there are any, the number of missing values (NAs).
Example: To get the min/max of avginc, you could do the following:
summary(california_dataset$avginc)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  5.335  10.640  13.730  15.320  17.630  55.330
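If you only need the two end points, or the variable contains missing values, range() with na.rm = TRUE is another option. The sketch below uses a short made-up vector (income is not the real avginc column) to show both.

```r
# Made-up income values in thousands of dollars, with one missing value
income <- c(5.3, 10.6, NA, 13.7, 55.3)

# summary() adds an "NA's" entry because one value is missing
summary(income)

# range() returns just the minimum and maximum; na.rm = TRUE skips the NA
income_range <- range(income, na.rm = TRUE)
income_range

# Count the missing values directly
sum(is.na(income))
```

Without na.rm = TRUE, range(), min() and max() all return NA as soon as a single value is missing.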
What does the seq() function do? How do I know what values to put inside seq()?
The seq() function simply generates a sequence. We often use it to set the range of X values for Zelig's simulation. For example, in the California test score dataset, income is measured in thousands of dollars. To predict the effect of income from $20,000 to $50,000, you'd use seq(20, 50, 1), where the last argument, 1, is just the size of each increment.
If you're unsure, take a look at what seq() actually does before using it in other functions like setx().
seq(20, 50, 1)
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
[24] 43 44 45 46 47 48 49 50
If you were dealing with proportions between 0 and 1, then you'd use something like this:
seq(0, 1, 0.1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Notice how we changed the increment from 1 to 0.1.
Try different arguments for seq() to see what it does:
seq(0, 100)
seq(0, 100, 20)
seq(0, 100, by = 20)
seq(0, 100, length.out = 5)
seq(0, 100, length.out = 6)
seq(0, 100, length.out = 8)
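To make the link between seq() and prediction concrete, here is a sketch that feeds seq(20, 50, 1) into a model. It deliberately uses base R's predict() rather than Zelig's setx() and sim(), so it illustrates the idea without being the seminar's workflow, and toy_scores is simulated data rather than the real California dataset.

```r
set.seed(42)

# Simulated stand-in for the California data: income in thousands of dollars
toy_scores <- data.frame(avginc = runif(100, min = 5, max = 55))
toy_scores$testscr <- 625 + 1.9 * toy_scores$avginc + rnorm(100, sd = 10)

model <- lm(testscr ~ avginc, data = toy_scores)

# The X values we want predictions for: $20,000 to $50,000 in steps of $1,000
income_range <- seq(20, 50, 1)

# One predicted test score (with a 95% confidence interval) per income value
predicted <- predict(model,
                     newdata = data.frame(avginc = income_range),
                     interval = "confidence")

head(cbind(avginc = income_range, predicted))
```

Each value produced by seq() becomes one row of predictions, so the start, end and increment decide how wide and how fine-grained the resulting curve is.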
Take a look at the explanation here
Take a look at Interpreting Zelig Simulation.
The short answer is NO: you cannot just use SCC all the time without doing the tests first. SCC is NOT a more general form of HAC; they tackle slightly different issues.
Just as with the choice between robust and classical SEs, each correction has its own assumptions, and if those assumptions are violated the standard errors will not have the desired coverage properties (e.g. incorrect 95% confidence intervals). If we have autocorrelation and panel heteroskedasticity but no cross-sectional dependence, then SCC standard errors will have incorrect coverage (they’ll be wrong). On the other hand, if we have autocorrelation and cross-sectional dependence (correlated heteroskedasticity across panels), then HAC standard errors would have incorrect coverage and we should use SCC.
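The "test first, then choose" logic can be sketched as follows. This is only an illustration under several assumptions: it uses the plm and lmtest packages and the Grunfeld data that ships with plm, it picks pbgtest() and pcdtest() as the diagnostic tests, and it uses a 0.05 cut-off. The tests and thresholds used in the seminars may differ.

```r
library(plm)
library(lmtest)

# Example panel data bundled with plm (10 firms, 20 years)
data("Grunfeld", package = "plm")

# Fixed-effects model
fe_model <- plm(inv ~ value + capital, data = Grunfeld,
                index = c("firm", "year"), model = "within")

# Diagnostic tests: serial correlation and cross-sectional dependence
serial_test <- pbgtest(fe_model)              # Breusch-Godfrey test for panels
cd_test     <- pcdtest(fe_model, test = "cd") # Pesaran CD test

# Choose the correction based on the cross-sectional dependence test
# (0.05 is an assumed significance level)
if (cd_test$p.value < 0.05) {
  chosen_vcov <- vcovSCC(fe_model)                      # Driscoll-Kraay (SCC)
} else {
  chosen_vcov <- vcovHC(fe_model, method = "arellano")  # robust to heteroskedasticity and serial correlation
}

# Coefficient table with the chosen standard errors
coeftest(fe_model, vcov. = chosen_vcov)
```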
Don't just take a screenshot. In seminar 4 we created a Word-compatible file using the htmlreg() function from the texreg package, which you can copy/paste into your answers or essay. If you're having problems using this method, try one of the following:
Save your table as a .html file instead and open it with your browser (Firefox, Chrome, Safari, etc.). Then copy it to a Word document.
htmlreg(list(model1, model2), file="modelcomparison.html")
Create a table by hand or copy/paste the output from the console in RStudio. If you're using the copy/paste method, use a fixed-width (monospaced) font like "Courier" or "Courier New" to ensure that columns align properly.
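Both options can be tried in a few lines. The sketch below assumes the texreg package is installed and uses two throwaway models fitted on R's built-in mtcars data instead of your own models. screenreg() prints a plain-text table to the console for the copy/paste route, and htmlreg() writes the .html file.

```r
library(texreg)

# Two example models on a built-in dataset (stand-ins for model1 and model2)
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)

# Plain-text table in the console, suitable for copy/paste in a monospaced font
screenreg(list(model1, model2))

# HTML table written to a temporary folder; use a path of your choice instead
output_file <- file.path(tempdir(), "modelcomparison.html")
htmlreg(list(model1, model2), file = output_file)

# Confirm the file was written
file.exists(output_file)
```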
Again, don't just take a screenshot. At the end of seminar 9 there are instructions for saving a plot to a file.
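The seminar 9 instructions are not reproduced here. As a reminder of the general idea, one common base-R approach is to open a graphics device, draw the plot, and close the device. The file name and the example plot below are placeholders.

```r
# Where to save the plot; replace with a path in your working directory
plot_file <- file.path(tempdir(), "my_plot.png")

# Open a PNG device, draw the plot, then close the device to write the file
png(plot_file, width = 800, height = 600)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Example scatter plot")
dev.off()

# Confirm the file was written
file.exists(plot_file)
```

The resulting .png can be inserted into a Word document as a picture. Swapping png() for pdf() gives a PDF instead (note that pdf() takes its width and height in inches, not pixels).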