Estimating Logistic Regression Models in Stata.
Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.

Say you are interested in predicting whether somebody is a fan of Justin Bieber based on the number of beers they have consumed and their gender. Bieber fever is coded 1 if the respondent is a fan and 0 otherwise. Because the dependent variable is dichotomous, the appropriate method is logistic regression.

The logistic regression (or logit) model is linear in the log odds of the dependent variable. Most people don’t think in terms of log odds, so it’s common to interpret the results either by exponentiating coefficients to yield odds ratios or by computing predicted probabilities. An odds ratio greater than one means that an increase in X increases the odds that the dependent variable equals one; an odds ratio less than one means that an increase in X decreases those odds. The predicted probabilities can be calculated using the formula for the cdf of the standard logistic distribution:

Pr(Y = 1) = e^(Xβ) / (1 + e^(Xβ))

In Stata, there are two commands for fitting a logistic regression model. (Actually, there are more, but we won’t discuss the .glm and .ml commands here.) The two commands are .logit and .logistic. They estimate exactly the same model, but they report different output. The .logit command reports the untransformed beta coefficients. The .logistic command reports odds ratios, equal to e^β.

The syntax for the logit command is the following:

. logit bieber beer gender

In the output this produces, the coefficient on beer is positive: each additional beer increases the log odds of having Bieber fever by 1.885. In addition, because gender is coded such that males = 1 and females = 0 and its coefficient is also positive, the log odds of having Bieber fever are higher for males. These coefficients are the untransformed betas from the linear model of the log odds.
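The link between the untransformed coefficients and odds ratios can be checked by hand. The following is a minimal sketch, assuming a dataset with the variables bieber, beer, and gender is already loaded in memory; it relies only on Stata's stored coefficients (_b[]) and the built-in exp() function.

```stata
* Assumption: a dataset containing bieber, beer, and gender is in memory
logit bieber beer gender

* Exponentiate the stored coefficient on beer to obtain its odds ratio
display exp(_b[beer])

* The same arithmetic, using the beer coefficient reported in the text
display exp(1.885)
```

With a coefficient of 1.885, the second display command returns roughly 6.59, meaning each additional beer multiplies the odds of Bieber fever by about that factor.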
It is possible to return the predicted probability of, say, a male who has consumed 4 beers by evaluating the logistic cdf at the estimated coefficients with beer = 4 and gender = 1. Doing so shows that the probability that a male who has consumed four beers has Bieber fever is .307.

It is possible to recover predicted probabilities for each person in the sample using the .predict command following model estimation. This command has several options, but the default is to calculate predicted probabilities:

. predict p

This creates a new variable, p, containing the predicted probabilities.

To get odds ratios, use the .logistic command:

. logistic bieber beer gender

According to these results, each additional beer leads to a more than 6-fold increase in the odds of having Bieber fever. Because the gender variable is coded such that males = 1 and females = 0, the odds of having Bieber fever are substantially higher for males.

It is also possible to get odds ratios with the .logit command by adding the or option:

. logit bieber beer gender, or

Likewise, it is possible to recover the untransformed coefficients by adding the coef option to the .logistic command. In either case, following up model estimation with the .predict command yields predicted probabilities.
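One way to obtain the predicted probability for a specific profile, such as a male who has consumed four beers, is sketched below. It assumes the same dataset (bieber, beer, gender) is in memory and is not necessarily the command originally used to produce the .307 figure. It uses Stata's invlogit() function, which evaluates the logistic cdf, together with the stored coefficients, and then shows margins as an alternative route to the same quantity.

```stata
* Assumption: a dataset containing bieber, beer, and gender is in memory
logit bieber beer gender

* Linear prediction for beer = 4, gender = 1 (male), passed through the logistic cdf
display invlogit(_b[_cons] + _b[beer]*4 + _b[gender]*1)

* Alternative: have margins compute the predicted probability at the same values
margins, at(beer=4 gender=1)

* The coef option makes .logistic report the same untransformed coefficients
logistic bieber beer gender, coef
```

Both the display and margins lines should report the same probability, since each applies the logistic cdf to the same linear prediction.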