Logistic regression vs. neural networks

I assume you're thinking of what used to be, and perhaps still are, referred to as 'multilayer perceptrons' in your question about neural networks. If so, I'd explain the whole thing in terms of flexibility about the form of the decision boundary as a function of the explanatory variables. In particular, for this audience, I wouldn't mention link functions or log odds; just stick with the idea that the probability of an event is being predicted on the basis of some observations. Here's a possible sequence:

1. Make sure they know what a predicted probability is, conceptually speaking. Show it as a function of one variable in the context of some familiar data.

2. Explain the decision context that will be shared by logistic regression and neural networks.

3. Start with logistic regression. State that it is the linear case, but show the linearity of the resulting decision boundary using a heat or contour plot of the output probabilities with two explanatory variables.

4. Note that the two classes may not be well separated by the boundary they see, and motivate a more flexible model that makes a curvier boundary. If necessary, show some data that would be well distinguished this way. (This is why you start with two variables.)

5. Note that you could start complicating the original linear model with extra terms, e.g. squares or other transformations, and maybe show the boundaries these generate (see the first sketch after this list). But then discard them, observing that you don't know in advance what the functional form ought to be, and you'd prefer to learn it from the data. Just as they get enthusiastic about this, note the impossibility of doing it in complete generality, and suggest that you are happy to assume the boundary should at least be 'smooth' rather than 'choppy', but otherwise determined by the data. (Assert that they were probably already thinking of only smooth boundaries, in the same way they'd been speaking prose all their lives.)

6. Show the output of a generalized additive model where the output probability is a joint function of the pair of original variables, rather than a true additive combination; this is just for demonstration purposes. Importantly, call it a smoother, because that's nice and general and describes things intuitively. Demonstrate the non-linear decision boundary in the picture as before.

7. Note that this (currently anonymous) smoother has a smoothness parameter that controls how smooth it actually is. Refer to this in passing as being like a prior belief about the smoothness of the function turning the explanatory variables into the predicted probability. Maybe show the consequences of different smoothness settings on the decision boundary (see the second sketch after this list).

8. Now introduce the neural net as a diagram. Point out that the second layer is just a logistic regression model (on the hidden-unit outputs), but also point out the non-linear transformation that happens in the hidden units. Remind the audience that this is just another function from input to output that will be non-linear in its decision boundary.

9. Note that it has a lot of parameters, and that some of them need to be constrained to yield a smooth decision boundary. Reintroduce the number that controls smoothness as the same (conceptually speaking) number that keeps the parameters tied together and away from extreme values. Also note that the more hidden units it has, the more different types of functional forms it can realise.
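A first sketch for the pictures in steps 3-5. Everything below is my own illustration, not part of the original answer: it assumes scikit-learn and matplotlib are available and uses a synthetic two-class data set (make_moons) chosen so that the linear boundary visibly fails and squared terms help.

```python
# Linear logistic regression vs. logistic regression with squared terms,
# shown as heat maps of the predicted probability with two explanatory
# variables. The data set and plotting choices are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

models = {
    "linear logistic regression": LogisticRegression(),
    "logistic regression + squared terms": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LogisticRegression()
    ),
}

# Grid over which to evaluate the predicted probabilities.
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X, y)
    p = model.predict_proba(grid)[:, 1].reshape(xx.shape)
    ax.contourf(xx, yy, p, levels=20, cmap="RdBu_r", alpha=0.6)  # heat map of P(class 1)
    ax.contour(xx, yy, p, levels=[0.5], colors="k")              # the decision boundary
    ax.scatter(*X.T, c=y, cmap="RdBu_r", edgecolor="k", s=15)
    ax.set_title(name)
plt.show()
```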
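A second sketch for steps 6-7. As a stand-in for the 'anonymous smoother' I use a Gaussian process classifier; that choice is mine (it also happens to foreshadow the limit argument below), and any GAM-style smoother would do. The RBF length_scale plays the role of the smoothness parameter.

```python
# One smoother, three smoothness settings, three decision boundaries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 150), np.linspace(-1.5, 2, 150))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 3, figsize=(13, 4))
for ax, length_scale in zip(axes, [0.1, 0.5, 2.0]):
    # optimizer=None keeps the smoothness parameter fixed at our chosen value.
    gp = GaussianProcessClassifier(kernel=RBF(length_scale), optimizer=None)
    gp.fit(X, y)
    p = gp.predict_proba(grid)[:, 1].reshape(xx.shape)
    ax.contourf(xx, yy, p, levels=20, cmap="RdBu_r", alpha=0.6)
    ax.contour(xx, yy, p, levels=[0.5], colors="k")
    ax.scatter(*X.T, c=y, cmap="RdBu_r", edgecolor="k", s=15)
    ax.set_title(f"length_scale = {length_scale} (smoothness)")
plt.show()
```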
Throughout, to maintain intuition, talk about hidden units in terms of flexibility and parameter constraint in terms of smoothness (despite the mathematical sloppiness of this characterisation; a sketch of the trade-off follows below). Then surprise them by claiming that, since you still don't know the functional form, you want to be infinitely flexible by adding an infinite number of hidden units. Let the practical impossibility of this sink in a bit.
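A sketch of that flexibility/smoothness trade-off, again my own illustration rather than the original answer's: scikit-learn's MLPClassifier, where hidden_layer_sizes controls the flexibility and the weight-decay penalty alpha is the 'number that controls smoothness'.

```python
# Few hidden units: limited flexibility. Many hidden units with small alpha:
# flexible, potentially choppy. Many hidden units with large alpha: flexible
# but constrained toward a smooth boundary. Data and settings are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 150), np.linspace(-1.5, 2, 150))
grid = np.c_[xx.ravel(), yy.ravel()]

settings = [(2, 1e-4), (50, 1e-4), (50, 1.0)]  # (hidden units, weight decay)
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
for ax, (hidden, alpha) in zip(axes, settings):
    net = MLPClassifier(hidden_layer_sizes=(hidden,), alpha=alpha,
                        solver="lbfgs", max_iter=2000, random_state=0)
    net.fit(X, y)
    p = net.predict_proba(grid)[:, 1].reshape(xx.shape)
    ax.contourf(xx, yy, p, levels=20, cmap="RdBu_r", alpha=0.6)
    ax.contour(xx, yy, p, levels=[0.5], colors="k")
    ax.scatter(*X.T, c=y, cmap="RdBu_r", edgecolor="k", s=15)
    ax.set_title(f"{hidden} hidden units, alpha={alpha}")
plt.show()
```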

Then observe that this limit can be taken in the mathematics, and ask (rhetorically) what such a thing would look like. Answer that it would be a smoother again (a Gaussian process, as it happens; Neal, 1996, though this detail is not important), like the one they saw before. Observe that there is again a quantity that controls smoothness, but no other particular parameters (they are integrated out, for those who care about this sort of thing). Conclude that neural networks are particular, implicitly limited implementations of ordinary smoothers, which are the non-linear, not necessarily additive, extensions of the logistic regression model. Then do it the other way around, concluding that logistic regression is equivalent to a neural network model, or a smoother, with the smoothing parameter set to 'extra extra smooth', i.e. linear. The advantage of this approach is that you don't have to get into any mathematical detail to give the correct idea. In fact, the audience doesn't have to understand either logistic regression or neural networks already in order to understand the similarities and differences. The disadvantage is that you have to make a lot of pictures, and strongly resist the temptation to drop down into the algebra to explain things.

I am going to take the question literally: someone with no background in statistics. And I'm not going to try to give that person a background in statistics. Suppose, for instance, that you have to explain the difference to the CEO of a company. So: logistic regression is a tool for modeling a categorical variable in terms of other variables. It gives you ways to find out how changes in each of the "other" variables affect the odds of different outcomes in the first variable. The output is fairly easy to interpret. Neural networks are a set of methods that let a computer try to learn from examples in ways that vaguely resemble how humans learn about things. They may result in models that are good predictors, but these models are usually much more opaque than those from logistic regression.

I was taught that you can think of neural networks (with logistic activation functions) as a weighted average of logit functions, with the weights themselves estimated. By choosing a large number of logits, you can fit any functional form. There's some graphical intuition for this in the Econometric Sense blog post.

The other answers are great. I would simply add some pictures showing that you can think of logistic regression, and of multi-class logistic regression (a.k.a. maxent, multinomial logistic regression, softmax regression, maximum entropy classifier), as a special architecture of neural networks. A few more illustrations for multi-class logistic regression are sketched below.
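A small numpy sketch of that last point, with illustrative names of my own choosing (softmax_regression, W, b): multi-class logistic regression is exactly a neural network with no hidden layer, i.e. one linear layer followed by a softmax.

```python
# Softmax regression written as a neural network forward pass.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_regression(X, W, b):
    """One linear layer followed by softmax: the whole 'network'."""
    return softmax(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # 5 examples, 2 explanatory variables
W = rng.normal(size=(2, 3))   # weights for 3 classes
b = np.zeros(3)               # one bias per class
print(softmax_regression(X, W, b))  # each row: predicted class probabilities
```

Adding a hidden layer of sigmoid units in front of this output layer gives exactly the 'weighted average of logit functions' described in the previous answer.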
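And, circling back to the first answer's limit argument (Neal, 1996), a hedged numerical sketch that is entirely my own construction: draw random one-hidden-layer networks with standard-normal priors on all weights and 1/sqrt(H) output scaling, and watch the distribution of the output at a fixed input become Gaussian as the number of hidden units H grows.

```python
# As H grows, the prior over f(x) at a fixed input approaches a Gaussian
# (the finger-print of the Gaussian process limit). Priors and the probe
# input x are arbitrary illustrative choices.
import numpy as np

def random_network_outputs(H, n_draws=5000, x=np.array([0.3, -0.7]), seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_draws, H, 2))          # input-to-hidden weights
    b = rng.normal(size=(n_draws, H))             # hidden biases
    v = rng.normal(size=(n_draws, H))             # hidden-to-output weights
    hidden = np.tanh(W @ x + b)                   # hidden-unit activations
    return (v * hidden).sum(axis=1) / np.sqrt(H)  # scaled network output

for H in [1, 10, 1000]:
    f = random_network_outputs(H)
    # kurtosis drifts toward the Gaussian value of 3 as H grows
    kurt = ((f - f.mean()) ** 4).mean() / f.var() ** 2
    print(f"H={H:5d}  mean={f.mean():+.3f}  var={f.var():.3f}  kurtosis={kurt:.2f}")
```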