Logistic regression spss.

I have to do binary logistic regression with a lot of independent variables. Most of them are binary, but a few of the categorical variables have more than two levels. What is the best way to deal with such variables? For example, for a variable with three possible values, I suppose that two dummy variables have to be created. Then, in a stepwise regression procedure, is it better to test both of the dummy variables at the same time, or to test them separately? I will use SPSS, but I do not remember it very well, so: how does SPSS deal with this situation? Moreover, for an ordinal categorical variable, is it a good idea to use dummy variables that recreate the ordinal scale? (For example, using three dummy variables for a 4-state ordinal variable, put 0-0-0 for level $1$, 1-0-0 for level $2$, 1-1-0 for level $3$ and 1-1-1 for level $4$, instead of 0-0-0, 1-0-0, 0-1-0 and 0-0-1 for the 4 levels.)

The UCLA website has a bunch of great tutorials for every procedure, broken down by the software you're familiar with. Check out Annotated SPSS Output: Logistic Regression; the SES variable they mention is categorical (and not binary). SPSS will automatically create the indicator variables for you once the variable is declared as categorical. There's also a page dedicated to Categorical Predictors in Regression with SPSS, which has specific information on how to change the default codings, and a page specific to Logistic Regression.

Logistic regression is a pretty flexible method. It can readily use categorical variables as independent variables, and most software that fits logistic regression should let you do so. As an example, let's say one of your categorical variables is temperature, defined in three categories: cold/mild/hot. As you suggest, you could represent it with separate dummy variables, each with a value of 1 or 0 (two of them when the model has a constant, since the third category then serves as the reference). But the software should let you use a single categorical variable instead, with text values cold/mild/hot.
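As a rough illustration of what the software does behind the scenes with a single categorical column, here is a minimal Python sketch of indicator (dummy) coding for the cold/mild/hot example. The function name `indicator_code` and the choice of cold as the reference category are assumptions made for illustration; this is not SPSS syntax.

```python
def indicator_code(values, levels, reference):
    """Expand one categorical column into 0/1 dummy columns.

    Returns the names of the dummy columns (every level except the
    reference) and one row of 0/1 values per observation.
    """
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    columns = [lev for lev in levels if lev != reference]
    rows = []
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level: {value!r}")
        # The reference level gets 0 in every column.
        rows.append([1 if value == lev else 0 for lev in columns])
    return columns, rows


# Hypothetical data: one text column with three categories.
temperature = ["cold", "hot", "mild", "cold", "hot"]
columns, rows = indicator_code(
    temperature, levels=["cold", "mild", "hot"], reference="cold"
)

print(columns)
for value, row in zip(temperature, rows):
    print(value, row)
```

Three categories produce two dummy columns; an observation with 0 in both columns is in the reference category.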
The logit regression would then derive a coefficient for each non-reference temperature condition, with the reference condition absorbed into the constant. If one is not significant, the software or the user could take it out (after looking at the Wald statistic and p value), although dropping a single dummy effectively merges that category with the reference category. The main benefit of keeping the categories in a single categorical variable is convenience. A single column in your data can hold as many categories as needed, and the software builds the indicator columns internally. If instead you create a dummy variable by hand for each category of every categorical variable, your data set can quickly grow to have numerous columns that are superfluous given the mentioned alternative.

As far as my understanding goes, it is good to use dummy variables for categorical/nominal data, while for ordinal data we can use a coding of 1, 2, 3 for the different levels (which treats the levels as equally spaced). A dummy variable is coded 1 if it is true for a particular observation and 0 otherwise. Also, the number of dummy variables is one less than the number of levels; for example, a binary variable needs 1. An observation with all dummies equal to 0 automatically belongs to the level that was not coded.
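The two codings of a 4-level ordinal variable mentioned in the question can be compared with a small Python sketch. The function names `indicator_row` and `cumulative_row` are hypothetical. Both codings use three columns, and when all three are entered together they describe the same model; only the meaning of the coefficients changes (difference from level 1 for the indicator coding, difference from the previous level for the cumulative coding).

```python
def indicator_row(level, n_levels):
    """Indicator coding: column k is 1 only when the observation is at level k.

    Level 1 is the reference, so columns exist for levels 2..n_levels.
    """
    return [1 if level == k else 0 for k in range(2, n_levels + 1)]


def cumulative_row(level, n_levels):
    """Cumulative coding: column k is 1 when the observation is at level k or higher."""
    return [1 if level >= k else 0 for k in range(2, n_levels + 1)]


n_levels = 4
for level in range(1, n_levels + 1):
    print(level, indicator_row(level, n_levels), cumulative_row(level, n_levels))

# Each cumulative column is the sum of the indicator columns for that
# level and all higher levels, so the two codings carry the same information.
for level in range(1, n_levels + 1):
    ind = indicator_row(level, n_levels)
    cum = cumulative_row(level, n_levels)
    assert cum == [sum(ind[j:]) for j in range(len(ind))]
```

In a stepwise procedure that enters or removes the columns one at a time, the two codings can lead to different selected models, because each individual column answers a different question.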