Logistic regression and the binomial distribution.

I'm trying to understand how logistic regression uses the binomial distribution. Let's say I'm studying nest success in birds. The probability of a nest being successful is 0.6. Using the binomial distribution, I can calculate the probability of r successes given n trials (the number of nests studied). But how is the binomial distribution used in a modelling context? Say I want to know how mean daily temperature affects nest success, and I use logistic regression to explore this question. Within the context I've described, how does logistic regression use the binomial distribution? I'm looking for an intuitive answer, therefore an answer without equations! I think equations are only useful once understanding has been achieved at an intuitive level.

Suppose you observe several nests at different mean daily temperatures $t$. How does the probability $\pi(t)$ of nest success depend on the temperature $t$? (If nests are independent, the number of successful nests at temperature $t$ is binomially distributed, with $n$ equal to the number of nests observed and success probability $\pi(t)$.) Logistic regression is one approach (using the logistic function) to specifying the success probability as a function of temperature: stretch and shift the logistic curve, with the amount of stretching and shifting estimated from the data.

Without equations? Yikes. Let's see: the logistic regression model is literally a model for the $p$ parameter of a binomial distribution; with a continuous predictor, each point can have its own distribution. (In the case where the observations are 0-1, we deal with the Bernoulli special case; this is a common situation.) The $n$ is given, not modelled. So the result is that, with a model relating the $p_i$'s and known $n_i$'s, we can model binomial data in terms of a predictor, which describes the mean (and variance) via its model for $p$.
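The setup in the question and in the first answer can be sketched numerically (the original asked for no equations, so consider this optional). Here `binom_pmf` is the textbook binomial probability for r successes in n nests, and `pi` is a hypothetical logistic curve whose shift `a` and stretch `b` are made-up values standing in for what a real analysis would estimate from data:

```python
from math import comb, exp

def binom_pmf(r, n, p):
    """Probability of exactly r successful nests out of n, success prob p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Fixed success probability, as in the question (p = 0.6).
print(binom_pmf(6, 10, 0.6))  # chance of exactly 6 successes in 10 nests

def pi(t, a=-8.0, b=0.5):
    """Hypothetical success probability as a function of mean daily
    temperature t. The shift a and stretch b are invented for
    illustration; logistic regression estimates them from the data."""
    return 1 / (1 + exp(-(a + b * t)))

# Each temperature gets its own success probability, hence its own
# binomial distribution for the number of successful nests observed there.
for t in (10, 16, 22):
    print(t, round(pi(t), 3))
```

Note that `pi(t)` can never leave the interval (0, 1), no matter how extreme `t` gets, which is exactly the bending behaviour the second answer describes below.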
The model may be fit via maximum likelihood estimation, but because of its special form (exponential family), ML is relatively "nice". Because the logistic link is canonical for the binomial family, it's even nicer, since the sufficient statistics are of very simple form - this makes it convenient for dealing with large samples, or even for developing 'online' algorithms.

Of course, $p$, being a probability, lies between 0 and 1. This, naturally, means that when we write a model for it in terms of some other variable, that model can't crash through those limits, so as the independent variable gets sufficiently large or small, the relationship must bend to stay inside the bounds. With logistic regression, that curve (the link function) is a logistic function. Other functions are possible, and many packages implement several (R has three suitable ones built into its glm functionality, if I recall right). No equality symbols were harmed in the making of this post.
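As a toy illustration of the ML fitting described above (a sketch, not the internals of any real package), the following simulates 0-1 nest outcomes from a made-up "true" curve and recovers the coefficients by Newton-Raphson on the Bernoulli log-likelihood. The temperatures, sample size, and "true" values -8 and 0.5 are all invented for the example; the point is that the gradient involves only the observed totals compared with their fitted expectations, which is the simple-sufficient-statistics property mentioned above:

```python
from math import exp
import random

random.seed(1)

def logistic(z):
    return 1 / (1 + exp(-z))

# Simulated nest data: made-up temperatures, outcomes drawn from a
# made-up "true" curve with shift -8 and stretch 0.5.
temps = [random.uniform(8, 24) for _ in range(500)]
ys = [1 if random.random() < logistic(-8 + 0.5 * t) else 0 for t in temps]

# Maximum likelihood via Newton-Raphson. The gradient compares the
# sufficient statistics sum(y_i) and sum(y_i * t_i) with their fitted
# expectations; the 2x2 Hessian is solved by hand.
a, b = 0.0, 0.0
for _ in range(25):
    ps = [logistic(a + b * t) for t in temps]
    g0 = sum(y - p for y, p in zip(ys, ps))                       # d/da
    g1 = sum((y - p) * t for y, p, t in zip(ys, ps, temps))       # d/db
    w = [p * (1 - p) for p in ps]                                 # weights
    h00 = sum(w)
    h01 = sum(wi * t for wi, t in zip(w, temps))
    h11 = sum(wi * t * t for wi, t in zip(w, temps))
    det = h00 * h11 - h01 * h01
    a += (h11 * g0 - h01 * g1) / det
    b += (h00 * g1 - h01 * g0) / det

print(round(a, 2), round(b, 2))  # estimates should land near -8 and 0.5
```

Every fitted probability `logistic(a + b*t)` stays strictly inside (0, 1), however extreme the temperature, which is the bending-at-the-bounds behaviour described in the answer.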