Introduction to Quantitative Methods

Why Zelig?

Why do we need to run simulations?

When we estimate our statistical model, the output gives us values for our parameters β0, ..., βk. Knowing these, we can say what our dependent variable would be for a specific value of an independent variable of interest. For example, with 13 years of education our monthly wage would be £3000. This is a meaningful interpretation that everyone can understand. However, because we do not have an infinite number of observations, our parameters β0, ..., βk are estimated with some uncertainty, represented by their standard errors. Because our parameters are uncertain, so are our predictions. Hence an honest interpretation must quantify that uncertainty: for example, 13 years of education would yield a monthly wage of £3000 plus or minus £500.
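To make this concrete, here is a minimal Python sketch of simulating estimation uncertainty. It follows the spirit of King, Tomz and Wittenberg (2000) and is not Zelig's own code. The wage and education data are hypothetical and generated inside the script. The coefficient draws come from a multivariate normal distribution centred on the OLS estimates with their estimated covariance matrix. Each draw yields one plausible wage for someone with 13 years of education.

```python
import numpy as np

# Hypothetical data: years of education and monthly wage in pounds.
# Simulated for illustration only; not real survey data.
rng = np.random.default_rng(seed=42)
n = 200
educ = rng.uniform(8, 20, size=n)
wage = 500 + 190 * educ + rng.normal(0, 600, size=n)

# Fit OLS: wage = beta0 + beta1 * educ + error
X = np.column_stack([np.ones(n), educ])
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
resid = wage - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
vcov = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated covariance of beta_hat

# Estimation uncertainty: draw many plausible coefficient vectors
# from their approximate sampling distribution N(beta_hat, vcov).
n_sims = 10_000
beta_draws = rng.multivariate_normal(beta_hat, vcov, size=n_sims)

# One simulated wage per coefficient draw, for 13 years of education.
x_c = np.array([1.0, 13.0])
wage_sims = beta_draws @ x_c

point = wage_sims.mean()
lower, upper = np.percentile(wage_sims, [2.5, 97.5])
print(f"Wage at 13 years: £{point:.0f} (95% interval £{lower:.0f} to £{upper:.0f})")
```

The spread of `wage_sims` is the estimation uncertainty alone. With more observations, `vcov` shrinks and the interval narrows.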

Even if our estimates were extremely precise, so that the standard errors of our coefficients were effectively zero, there would still be a stochastic component to our model. This covers things that we do not include in the model but that do influence the outcome. For example, the economy could go bust. This fundamental uncertainty prevents us from predicting the dependent variable without error, even with extremely precise estimates.
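As a worked illustration, assume a linear-normal model with a single explanatory variable x (e.g. years of education). The outcome splits into a systematic and a stochastic component:

$$
Y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{systematic}} + \underbrace{\varepsilon_i}_{\text{stochastic}}, \qquad \varepsilon_i \sim N(0, \sigma^2)
$$

Even if β0 and β1 were known exactly, $Y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$. Any single prediction would therefore still be off by a typical amount of σ.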

Simulation is a technique that lets us make predictions and quantify both fundamental and estimation uncertainty without having to master the underlying mathematics. This is great!

What is the difference between predicted and expected values in Zelig?

This draws on the distinction between fundamental and estimation uncertainty.

Predicted values are simulations that take the estimation uncertainty and the fundamental uncertainty into account. They are in the same metric as the dependent variable.

Expected values average over the fundamental uncertainty (which has mean zero and therefore averages out) and thus represent only the estimation uncertainty.

Thus, predicted values always have a larger variance than expected values, although in practice their point estimates (the averages) should be nearly identical. As a result, the confidence interval of the predicted values can be much wider than that of the expected values.
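The difference can be illustrated with a small hand-rolled simulation. This is again a Python sketch on hypothetical data, not Zelig itself. Expected values use only the simulated coefficients. Predicted values add one draw of the error term per simulation. For simplicity the sketch holds σ fixed at its estimate, whereas a fuller treatment would also simulate σ.

```python
import numpy as np

# Hypothetical wage/education data, simulated for illustration only.
rng = np.random.default_rng(seed=1)
n = 200
educ = rng.uniform(8, 20, size=n)
wage = 500 + 190 * educ + rng.normal(0, 600, size=n)

# OLS fit, residual standard deviation and coefficient covariance matrix.
X = np.column_stack([np.ones(n), educ])
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
resid = wage - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - X.shape[1]))
vcov = sigma_hat**2 * np.linalg.inv(X.T @ X)

n_sims = 10_000
x_c = np.array([1.0, 13.0])  # intercept term and 13 years of education

# Estimation uncertainty: draw coefficient vectors.
beta_draws = rng.multivariate_normal(beta_hat, vcov, size=n_sims)

# Expected values: systematic component only; the mean-zero error term
# averages out, so only estimation uncertainty remains.
ev = beta_draws @ x_c

# Predicted values: add one draw of the stochastic component per simulation,
# so both estimation and fundamental uncertainty are included.
# Simplification (assumption): sigma is held fixed at its estimate.
pv = ev + rng.normal(0, sigma_hat, size=n_sims)

for name, sims in [("Expected value", ev), ("Predicted value", pv)]:
    lo, hi = np.percentile(sims, [2.5, 97.5])
    print(f"{name}: mean £{sims.mean():.0f}, 95% interval £{lo:.0f} to £{hi:.0f}")
```

Both sets of simulations should centre on roughly the same wage. The predicted-value interval is much wider because it also carries the error variance σ².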

Now, which one to choose?

If you want to predict the future, e.g. the next election outcome, you should include fundamental uncertainty, because you are interested in the actual outcome. The predicted value is therefore appropriate. If you just want to illustrate the effect of an explanatory variable (e.g. the effect of education), the expected value is the appropriate choice.

For further reading (an excellent explanation of this): King, G., M. Tomz, and J. Wittenberg. 2000. "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44 (2): 347-361. http://gking.harvard.edu/files/making.pdf