Thursday, September 5, 2019

Multinomial Logistic Regression, SAS Data Analysis Examples

Version info: Code for this page was tested in SAS 9.3.

Multinomial logistic regression is used to model nominal outcome variables, in which the log odds of the outcomes are modeled as a linear combination of the predictor variables.

Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics, and potential follow-up analyses.

Examples of multinomial logistic regression

Example 1. People's occupational choices might be influenced by their parents' occupations and their own education level. We can study the relationship of one's occupational choice with education level and father's occupation. The occupational choice will be the outcome variable, which consists of categories of occupations.

Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here will be the type of food, and the predictor variables might be the length of the alligators and other environmental variables.

Example 3. Entering high school students make program choices among a general program, a vocational program, and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.

Description of the data

For our data analysis example, we will expand on the third example using the hsbdemo data set. You can download the data here. The data set contains variables on 200 students. The outcome variable is prog, program type. The predictor variables are socioeconomic status, ses, a three-level categorical variable, and writing score, write, a continuous variable. Let's start by getting some descriptive statistics for the variables of interest.
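One way to get those descriptive statistics is a crosstab of the two categorical variables and summary statistics for the continuous one; the following is a sketch, assuming the downloaded data set is available in the session as hsbdemo:

```sas
/* Cross-tabulate program type with ses (both categorical) */
proc freq data = hsbdemo;
  tables prog*ses;
run;

/* Mean and standard deviation of the continuous predictor write */
proc means data = hsbdemo mean std;
  var write;
run;
```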
Analysis methods you might consider

- Multinomial logistic regression: the focus of this page.
- Multinomial probit regression: similar to multinomial logistic regression but with independent normal error terms.
- Multiple-group discriminant function analysis: a multivariate method for multinomial outcome variables.
- Multiple logistic regression analyses, one for each pair of outcomes: one problem with this approach is that each analysis is potentially run on a different sample. Another problem is that, without constraining the logistic models, the estimated probabilities of the outcome categories can sum to more than 1.
- Collapsing the categories to two and then running a logistic regression: this approach suffers from loss of information and changes the original research questions to very different ones.
- Ordinal logistic regression: if the outcome variable is truly ordered and also satisfies the proportional odds assumption, switching to ordinal logistic regression will make the model more parsimonious.
- Alternative-specific multinomial probit regression: allows different error structures and therefore relaxes the independence of irrelevant alternatives assumption (IIA; see "Things to consider" below). This requires that the data be structured in choice-specific form.
- Nested logit model: also relaxes the IIA assumption, and likewise requires a choice-specific data structure.

Multinomial logistic regression

Below we use proc logistic to estimate a multinomial logistic regression model. The outcome prog and the predictor ses are both categorical variables and should be indicated as such on the class statement. We can specify the baseline category for prog using (ref = "2") and the reference group for ses using (ref = "1"). The param=ref option on the class statement tells SAS to use dummy coding rather than effect coding for the variable ses.
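The call described above can be sketched as follows; link = glogit requests the generalized logit (multinomial) model in proc logistic:

```sas
/* Multinomial (generalized logit) model of prog on ses and write.
   ref = "2" makes academic the baseline outcome category,
   ref = "1" makes ses = 1 the reference group, and param = ref
   requests dummy (reference) coding for the class variables. */
proc logistic data = hsbdemo;
  class prog (ref = "2") ses (ref = "1") / param = ref;
  model prog = ses write / link = glogit;
run;
```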
Note that the levels of prog are defined as: 1 = general, 2 = academic (the reference group), 3 = vocational.

In the output above, the likelihood ratio chi-square of 48.23, together with its p-value, tells us that our model as a whole fits significantly better than an intercept-only model.

Here we see the same parameters as in the output above, but with their unique SAS-given names. We are interested in testing whether SES3_general is equal to SES3_vocational, which we can now do with the test statement. The code preceding the ":" on the test statement is a label identifying the test in the output, and it must conform to SAS variable-naming rules (i.e., 32 characters or fewer, consisting of letters, numerals, and underscores). The test indicates that the effect of ses=3 for predicting general versus academic is not different from the effect of ses=3 for predicting vocational versus academic.

You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities using the lsmeans statement and the ilink option. For multinomial data, lsmeans requires glm rather than reference (dummy) coding; the two are essentially the same, but be sure to respecify the coding on the class statement. However, glm coding only allows the last category to be the reference group (prog = vocational and ses = 3) and will ignore any other reference-group specifications. Below we use lsmeans to calculate the predicted probability of choosing the academic or general program type at each level of ses, holding write at its mean. The predicted probabilities are in the "Mean" column. Thus, for ses = 3 and write = 52.775, we see that the probability of being in the academic program (program type 2) is 0.7009; for the general program (program type 1), the probability is 0.1785.

To obtain predicted probabilities for the vocational program type, we can reverse the ordering of the categories using the descending option on the proc logistic statement. This will make academic the reference group for prog and 3 the reference group for ses.
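The steps described above can be collected into three sketched calls. The parameter names SES3_general and SES3_vocational are the SAS-generated names shown in the parameter estimates output; treat the exact spelling as an assumption and verify it against your own output before running the test:

```sas
/* 1. Refit the model and test equality of the two ses = 3
      coefficients; "ses3" is the label identifying the test. */
proc logistic data = hsbdemo;
  class prog (ref = "2") ses (ref = "1") / param = ref;
  model prog = ses write / link = glogit;
  ses3: test SES3_general = SES3_vocational;
run;

/* 2. Predicted probabilities at each level of ses, with write held
      at its mean. lsmeans requires glm coding, which forces the last
      categories (prog = vocational, ses = 3) to be the references. */
proc logistic data = hsbdemo;
  class prog ses / param = glm;
  model prog = ses write / link = glogit;
  lsmeans ses / ilink;
run;

/* 3. Reverse the category ordering with the descending option, so
      the predicted probabilities for vocational appear in the output. */
proc logistic data = hsbdemo descending;
  class prog ses / param = glm;
  model prog = ses write / link = glogit;
  lsmeans ses / ilink;
run;
```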
Here we see that the probability of being in the vocational program when ses = 3 and write = 52.775 is 0.1206, which is what we would expect, since 1 - 0.7009 - 0.1785 = 0.1206, where 0.7009 and 0.1785 are the probabilities of being in the academic and general programs under the same conditions.

Things to consider

- The Independence of Irrelevant Alternatives (IIA) assumption: roughly, the IIA assumption means that adding or deleting alternative outcome categories does not affect the odds among the remaining outcomes.
- Diagnostics and model fit: unlike binary logistic regression, where there are many statistics for performing model diagnostics, diagnostics are not as straightforward for multinomial logistic regression models. Some model fit statistics are listed in the output.
- Pseudo-R-squared: the R-squared offered in the output is basically the change in log-likelihood from the intercept-only model to the current model. It does not convey the same information as the R-squared for linear regression, even though it is still "the higher, the better".
- Sample size: multinomial regression uses maximum likelihood estimation and therefore requires a large sample size. It also estimates multiple equations, so it requires an even larger sample size than ordinal or binary logistic regression.
- Complete or quasi-complete separation: complete separation means that one value of a predictor variable is associated with only one value of the response variable. You can usually tell from the regression coefficients in the output that something is wrong. You can then do a two-way tabulation of the outcome variable with the problematic variable to confirm this, and rerun the model without that variable.
- Empty cells or small cells: you should check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or might not run at all.
Sometimes observations are clustered into groups (e.g., people within families, students within classrooms). In such cases, you may want to see our page on non-independence within clusters.
