Wednesday, 18 September 2019

The SPSS Logistic Regression Output

Using Statistical Regression Methods in Education Research. SPSS will present you with a number of tables of statistics. Let’s work through and interpret them together. Again, you can follow this process using our video demonstration if you like.

First of all we get these two tables (Figure 4.12.1):

Figure 4.12.1: Case Processing Summary and Variable Encoding for Model.

The Case Processing Summary simply tells us how many cases are included in our analysis. The second row tells us that 3423 participants are missing data on some of the variables included in our analysis (they are missing either ethnicity, gender or fiveem; remember we have included all cases with missing SEC), but this still leaves us with 12347 cases to analyse. The Dependent Variable Encoding reminds us how our outcome variable is encoded: ‘0’ for ‘no’ (not getting 5 or more A*-C grades including Maths and English) and ‘1’ for ‘yes’ (making the grade!).

Next up is the Categorical Variables Encoding Table (Figure 4.12.2, slightly truncated here). It acts as an important reminder of which categories were coded as the reference (baseline) for each of your categorical explanatory variables. You might be thinking ‘I can remember what I coded as the reference category!’ but it is easy to get lost in the output because SPSS has a delightful tendency to rename things just as you are becoming familiar with them… In this case ‘parameter coding’ is used in the SPSS logistic regression output rather than the value labels, so you will need to refer to this table later on.

Let’s consider the example of ethnicity. White British is the reference category because it does not have a parameter coding. Mixed heritage students will be labelled “ethnic(1)” in the SPSS logistic regression output, Indian students will be labelled “ethnic(2)”, Pakistani students “ethnic(3)” and so on.
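The labelling scheme described above can be sketched in a few lines of Python. This is only an illustration of indicator (dummy) coding with a reference category, not SPSS's own code: the function name `parameter_coding` is hypothetical, and the list of ethnic groups is cut off after the four named in the text.

```python
def parameter_coding(categories, reference, var_name):
    """Return indicator codes and output labels for a categorical variable.

    The reference category gets all zeros and no label; every other
    category gets a single 1 and a label such as 'ethnic(1)'.
    """
    others = [c for c in categories if c != reference]
    coding = {reference: (0,) * len(others)}
    labels = {}
    for i, category in enumerate(others, start=1):
        coding[category] = tuple(
            1 if j == i else 0 for j in range(1, len(others) + 1)
        )
        labels[category] = f"{var_name}({i})"
    return coding, labels


# Only the groups named in the text; the real variable has more categories.
ethnic_groups = ["White British", "Mixed heritage", "Indian", "Pakistani"]
coding, labels = parameter_coding(ethnic_groups, "White British", "ethnic")

for group in ethnic_groups:
    print(group, coding[group], labels.get(group, "(reference)"))
```

The reference group never appears as its own row in the later output, because its effect is absorbed into the constant and every other category is compared against it.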
You will also see that ‘Never worked/long term unemployed’ is the base category for SEC, and that each of the other SEC categories has a ‘parameter coding’ of 1-7 reflecting each of the seven dummy SEC variables that SPSS has created. This only matters for how the output is labelled, but you will need to refer to it later to make sense of the output.

Figure 4.12.2: Categorical Variables Coding Table.

The next set of output is under the heading of Block 0: Beginning Block (Figure 4.12.3):

Figure 4.12.3: Classification Table and Variables in the Equation.

This set of tables describes the baseline model, that is, a model that does not include our explanatory variables! As we mentioned previously, the predictions of this baseline model are made purely on whichever category occurred most often in our dataset. In this example the model always guesses ‘no’ because more participants did not achieve 5 or more A*-C grades than did (6422 compared to 5925 according to our first column). The overall percentage row tells us that this approach to prediction is correct 52.0% of the time, so it is only a little better than tossing a coin!

The Variables in the Equation table shows us the coefficient for the constant (B0). This table is not particularly important but we’ve highlighted the significance level to illustrate a cautionary tale! According to this table the model with just the constant is a statistically significant predictor of the outcome (p […]

[…] R² values for the full model. The -2LL value for this model (15529.8) is what was compared to the -2LL for the previous null model in the ‘omnibus test of model coefficients’, which told us there was a significant decrease in the -2LL, i.e. that our new model (with explanatory variables) is a significantly better fit than the null model. The R² values tell us approximately how much variation in the outcome is explained by the model (like in linear regression analysis).
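The baseline figures and the two pseudo-R² measures can be checked by hand. The sketch below, in Python rather than SPSS, recomputes the baseline accuracy from the two outcome counts and then derives Cox & Snell and Nagelkerke values from the -2LL of the null and full models. One assumption is flagged in the code: the null-model -2LL is not quoted in this text, so it is reconstructed from the outcome counts (an intercept-only model predicts the observed proportion for everyone). All variable names are illustrative.

```python
import math

# Outcome counts from the Block 0 classification table.
n_no, n_yes = 6422, 5925
n = n_no + n_yes  # 12347 cases analysed

# Baseline model: always predict the more frequent category ('no').
baseline_accuracy = max(n_no, n_yes) / n

# ASSUMPTION: the null model's -2LL is reconstructed from the counts,
# because an intercept-only model fits the observed proportion of 'yes'.
p_yes = n_yes / n
neg2ll_null = -2 * (n_yes * math.log(p_yes) + n_no * math.log(1 - p_yes))

# -2LL of the full model, as quoted in the text.
neg2ll_full = 15529.8

# Cox & Snell: 1 - (L_null / L_full)^(2/n), written in terms of -2LL.
cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_full) / n)

# Nagelkerke: Cox & Snell rescaled by its maximum attainable value.
nagelkerke = cox_snell / (1 - math.exp(-neg2ll_null / n))

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"Cox & Snell R^2:   {cox_snell:.3f}")
print(f"Nagelkerke R^2:    {nagelkerke:.3f}")
```

Under these assumptions the Nagelkerke value comes out close to 0.16, and the Cox & Snell value is smaller because its upper bound is below 1. That rescaling is the whole difference between the two measures.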
We prefer to use Nagelkerke’s R² (circled), which suggests that the model explains roughly 16% of the variation in the outcome. Notice how the two versions (Cox & Snell and Nagelkerke) do vary! This just goes to show that these R² values are approximations and should not be overly emphasized.

Moving on, the Hosmer & Lemeshow test (Figure 4.12.5) of the goodness of fit suggests the model is a good fit to the data as p=0.792 (>.05). However, the chi-squared statistic on which it is based is very dependent on sample size, so the value cannot be interpreted in isolation from the size of the sample. As it happens, this p value may change when we allow for interactions in our data, but that will be explained in a subsequent model on Page 4.13. You will notice that the output also includes a contingency table, but we do not study this in any detail so we have not included it here.

Figure 4.12.5: Hosmer and Lemeshow Test.

More useful is the Classification Table (Figure 4.12.6). This table is equivalent to that in Block 0 (Figure 4.12.3) but is now based on the model that includes our explanatory variables. As you can see, our model is now correctly classifying the outcome for 64.5% of the cases compared to 52.0% in the null model. A marked improvement!

Figure 4.12.6: Classification Table for Block 1.

However, the most important of all output is the Variables in the Equation table (Figure 4.12.7). We need to study this table extremely closely because it is at the heart of answering our questions about the joint association of ethnicity, SEC and gender with exam achievement.

Figure 4.12.7: Variables in the Equation Table Block 1.

This table provides the regression coefficient (B), the Wald statistic (to test the statistical significance) and the all important Odds Ratio (Exp(B)) for each variable category.
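To make the link between these columns concrete, here is a minimal Python sketch of how Exp(B) and a single-coefficient Wald statistic follow from B and its standard error. The values of `b` and `se` are hypothetical and are not taken from Figure 4.12.7. The overall Wald test for a whole categorical variable (several degrees of freedom) is a multivariate calculation that this sketch does not cover.

```python
import math


def odds_ratio(b):
    """Exp(B): the multiplicative change in the odds for this category
    relative to the reference category."""
    return math.exp(b)


def wald_statistic(b, se):
    """Wald chi-square for one coefficient (1 degree of freedom)."""
    return (b / se) ** 2


# HYPOTHETICAL coefficient and standard error, for illustration only.
b, se = 0.5, 0.1

print(f"Exp(B) = {odds_ratio(b):.3f}")
print(f"Wald   = {wald_statistic(b, se):.1f}")
```

A coefficient of 0 gives an odds ratio of exactly 1 (no difference from the reference category), positive coefficients give odds ratios above 1, and negative coefficients give odds ratios between 0 and 1.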
Looking first at the results for SEC, there is a highly significant overall effect (Wald=1283, df=7, p […]

[…] R²), they do not predict the outcome for individual students very well. This is important because it indicates that social class, ethnicity and gender do not determine students’ outcomes (although they are significantly associated with them). There is substantial individual variability that cannot be explained by social class, ethnicity or gender, and we might expect this reflects individual factors like prior attainment, student effort, teaching quality, etc.

Let’s move on to discuss interaction terms for now; we will save explaining how to test the assumptions of the model for a little later. Something to look forward to!
