Friday, August 9, 2019

Interpretation of coefficients in logistic regression output - Cross Validated

Logistic regression coefficient interpretation. I am doing logistic regression in R on a binary dependent variable with only one independent variable. I found the odds ratio to be $0.99$ for the outcome. The odds are defined as $$\mathrm{odds}(H) = \frac{P(H)}{1 - P(H)}.$$ Taking $\mathrm{odds}(H) = 0.99$ implies $P(H) = 0.497$, which is close to a 50% probability. This would mean that the probability of an observation being an H case (or a non-H case) is about 50% under the given values of the independent variable. That does not seem realistic, because only 20% of the data are H cases. Please give clarifications and a proper explanation of this kind of case in logistic regression.

Here are the details of my model output: I have 1738 observations in total, with H as the binomial dependent variable. 19.95% fall in the (H=0) category and the rest are in the (H=1) category. This binomial dependent variable is modeled against the covariate X, whose minimum value is 82.23, mean value is 223.8, and maximum value is 391.6. Of the 1738 observations, 667 have missing values for the covariate X.

The question misinterprets the coefficients. The software output shows that the log odds of the response do not depend appreciably on $X$, because its coefficient is small and not significant ($p = 0.138$). Therefore the proportion of positive results in the data, equal to $100\% - 19.95\% \approx 80\%$, ought to have a log odds close to the intercept of $1.64$. Indeed, $$\log\left(\frac{0.80}{0.20}\right) = \log(4) \approx 1.4$$ is only about one standard error ($0.22$) away from the intercept. Everything looks consistent.

Detailed analysis.
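The two conversions above (odds to probability, and proportion to log odds) can be checked numerically; here is a quick sketch in Python, not part of the original thread:

```python
import math

# The questioner's (mis)reading: treating the reported odds ratio of 0.99
# as if it were the odds of the outcome H itself, then converting to a probability.
odds = 0.99
p = odds / (1 + odds)          # probability = odds / (1 + odds)
print(round(p, 3))             # 0.497, i.e. close to 50%

# The answer's consistency check: the log odds of the ~80% positive rate
# should lie close to the fitted intercept of 1.64.
log_odds_80 = math.log(0.80 / 0.20)
print(round(log_odds_80, 2))   # 1.39, about one standard error (0.22) below 1.64
```

The point is that $0.99$ is the multiplicative change in odds per unit of $X$, not the odds of $H$ itself, so converting it to "$P = 0.497$" answers the wrong question.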
This generalized linear model supposes that the log odds of the response $H$ being $1$, when the independent variable $X$ has a particular value $x$, is a linear function of $x$: $$\log\left(\frac{\Pr(H=1\mid X=x)}{\Pr(H=0\mid X=x)}\right) = \beta_0 + \beta_1 x.\tag{1}$$ The glm command in R estimated these unknown coefficients with values $$\hat\beta_0 = 1.641666\pm 0.2290133$$ and $$\hat\beta_1 = -0.0014039\pm 0.0009466.$$ The dataset contains a large number $n$ of observations with various values of $x$, written $x_i$ for $i=1, 2, \ldots, n$, which range from $82.3$ to $391.6$ and average $\bar x = 223.8$. Formula $(1)$ enables us to compute the estimated probabilities of each outcome, $\Pr(H=1\,|\,X=x_i)$. If the model is any good, the average of those probabilities ought to be close to the average of the outcomes. Since the odds are, by definition, the ratio of a probability to its complement, simple algebra gives the estimated probabilities in terms of the log odds: $$\Pr(H=1\mid X=x) = \frac{\exp(\hat\beta_0 + \hat\beta_1 x)}{1 + \exp(\hat\beta_0 + \hat\beta_1 x)}.$$ As a nonlinear function of $x$, that is difficult to average. However, provided $\hat\beta_1 x$ is small (much less than $1$ in size) and $1+\exp(\hat\beta_0)$ is not small (it exceeds $6$ in this case), the function is nearly linear in $x$ over the observed range, so the average of the probabilities can safely be replaced by the probability at the average. Since the $x_i$ never exceed $391.6$, $|\hat\beta_1 x_i|$ never exceeds $391.6\times 0.0014039 \approx 0.55$, so we are OK. Consequently, the average of the outcomes may be approximated as $$\frac{1}{n}\sum_{i=1}^n \Pr(H=1\mid X=x_i) \approx \frac{\exp(\hat\beta_0 + \hat\beta_1 \bar x)}{1 + \exp(\hat\beta_0 + \hat\beta_1 \bar x)} \approx 79\%,$$ leaving about $21\%$ in the $(H=0)$ category. Although that is not exactly equal to the $19.95\%$ observed in the data, it is more than close enough, because $\hat\beta_1$ has a relatively large standard error. For example, if $\hat\beta_1$ were increased by only $0.3$ of its standard error, to $-0.0011271$, then the previous calculation would produce $19.95\%$ exactly.
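The closing computation can be reproduced directly from the quoted coefficients; this is an illustrative sketch in Python (the variable names are mine, the numbers are those from the R output above):

```python
import math

b0, b1 = 1.641666, -0.0014039   # fitted intercept and slope from the glm output
x_bar = 223.8                   # mean of the covariate X

def p_hat(x, b=b1):
    """Estimated Pr(H = 1 | X = x) implied by the fitted log odds."""
    return math.exp(b0 + b * x) / (1 + math.exp(b0 + b * x))

# Linearity check: |b1 * x| stays well below 1 over the observed range of X.
print(round(abs(b1 * 391.6), 2))              # 0.55

# Average outcome approximated by the probability at the mean of X:
# about 79% positive, i.e. roughly 21% in the (H=0) category.
print(round(1 - p_hat(x_bar), 4))             # 0.2096, vs. 0.1995 observed

# Nudging b1 up by ~0.3 standard errors reproduces 19.95% exactly.
print(round(1 - p_hat(x_bar, -0.0011271), 4)) # 0.1995
```

This makes concrete why the fit is "more than close enough": a shift of a fraction of one standard error in $\hat\beta_1$ is all it takes to match the observed proportion.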
