Logistic regression: is there an error term, and what is its distribution?

On whether an error term exists in logistic regression (and, if so, what distribution it is assumed to follow), I have read in various places that:

1. no error term exists;
2. the error term has a binomial distribution (in accordance with the distribution of the response variable);
3. the error term has a logistic distribution.

Can someone please clarify?

In linear regression, observations are assumed to follow a Gaussian distribution with a mean parameter conditional on the predictor values. If you subtract the mean from the observations you get the error: a Gaussian distribution with mean zero, & independent of predictor values—that is, errors at any set of predictor values follow the same distribution.

In logistic regression, observations $y\in\{0,1\}$ are assumed to follow a Bernoulli distribution† with a mean parameter (a probability) conditional on the predictor values. So for any given predictor values determining a mean $\pi$, there are only two possible errors: $1-\pi$ occurring with probability $\pi$, & $0-\pi$ occurring with probability $1-\pi$. For other predictor values the errors will be $1-\pi'$ occurring with probability $\pi'$, & $0-\pi'$ occurring with probability $1-\pi'$. So there's no common error distribution independent of predictor values, which is why people say "no error term exists" (1).

"The error term has a binomial distribution" (2) is just sloppiness—"Gaussian models have Gaussian errors, ergo binomial models have binomial errors". (Or, as @whuber points out, it could be taken to mean "the difference between an observation and its expectation has a binomial distribution translated by the expectation".)

"The error term has a logistic distribution" (3) arises from the derivation of logistic regression from a model in which you observe whether or not a latent variable, with errors following a logistic distribution, exceeds some threshold. So it's not the same error as defined above.
(It would seem an odd thing to say IMO outside that context, or without explicit reference to the latent variable.)

† If you have $k$ observations with the same predictor values, giving the same probability $\pi$ for each, then their sum $\sum y$ follows a binomial distribution with probability $\pi$ and number of trials $k$. Considering $\sum y - k\pi$ as the error leads to the same conclusions.
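To make the "only two possible errors" point concrete, here is a minimal simulation sketch (plain Python, names like `bernoulli_errors` are my own for illustration): for each value of $\pi$, the residuals $y-\pi$ take only the two values $1-\pi$ and $-\pi$, with mean zero and variance $\pi(1-\pi)$ that changes with the predictor values, so there is no single error distribution shared across them.

```python
import random

random.seed(0)

def bernoulli_errors(pi, n=100_000):
    """Simulate y ~ Bernoulli(pi) and return the errors y - pi."""
    return [(1 if random.random() < pi else 0) - pi for _ in range(n)]

for pi in (0.2, 0.5, 0.8):
    errs = bernoulli_errors(pi)
    # Only two distinct error values can occur: 1 - pi and -pi.
    assert set(round(e, 10) for e in errs) <= {round(1 - pi, 10), round(-pi, 10)}
    mean = sum(errs) / len(errs)
    var = sum(e * e for e in errs) / len(errs)
    print(f"pi={pi}: mean error {mean:.3f}, variance {var:.3f} "
          f"(theory: 0 and {pi * (1 - pi):.3f})")
```

The variance $\pi(1-\pi)$ depends on $\pi$, which is exactly why no common, predictor-independent error distribution exists.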
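The latent-variable reading of (3) can also be sketched in a short simulation (the coefficients `beta0`, `beta1` are arbitrary values chosen for illustration): if $y^* = \beta_0 + \beta_1 x + e$ with $e$ following a standard logistic distribution, and we observe $y = 1$ whenever $y^* > 0$, then $P(y=1\mid x)$ is the logistic function of $\beta_0 + \beta_1 x$, which is the logistic regression model.

```python
import math
import random

random.seed(1)

def logistic_cdf(z):
    return 1 / (1 + math.exp(-z))

# Arbitrary coefficients chosen for illustration.
beta0, beta1 = -0.5, 2.0

def p_observed(x, n=200_000):
    """Latent model: y* = beta0 + beta1*x + e, e ~ standard logistic;
    we observe y = 1 iff y* > 0. Returns the empirical P(y = 1)."""
    hits = 0
    for _ in range(n):
        u = random.random()
        e = math.log(u / (1 - u))  # inverse-CDF draw from the standard logistic
        hits += (beta0 + beta1 * x + e > 0)
    return hits / n

for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}: empirical P(y=1) = {p_observed(x):.3f}, "
          f"logistic model gives {logistic_cdf(beta0 + beta1 * x):.3f}")
```

By symmetry of the logistic distribution, $P(e > -(\beta_0+\beta_1 x)) = F(\beta_0+\beta_1 x)$, so the empirical frequencies match the logistic curve; the logistic-distributed $e$ here is an error on the latent scale, not the observation-level error $y-\pi$ discussed above.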