Thursday, September 26, 2019

What is logistic regression?

Logistic regression for dummies. I really like answering "layman's terms" questions. Though it takes more time to answer, I think it is worth my while, as I sometimes understand concepts more clearly when I am explaining them at a high-school level. I'll try to make this article as non-technical as possible by not using any complex equations, which is a challenge for a math junkie such as myself. But rest assured, this won't be a one-liner.

You may have heard about logistic regression from a blog, a newspaper article, etc. You'll only understand what it is when you understand what it can solve.

Problem: Let us examine a simple and very hypothetical prediction problem. You have data from past years about students in your class: say, the math, science, history and physical education scores from their final board exams. Also, when they came back for the school reunion 5 years later, you collected data on whether or not they were successful in life. You have about 20 years' worth of data. Now you want to see how the students graduating this year will be doing 5 years from now (we'll keep it simple by only considering whether they are successful or not). I know it is debatable whether high-school scores can predict whether a person will be successful, but for now let's assume that in our perfect world these things are related.

Now we'll add one more character to our example. Say you know that Sarah scored 94 in History, 82 in Math, and so on, and now you want to predict how successful she will be in 5 years. This type of problem is called a "classification problem", as you classify an object as either belonging to a group (successful) or not. Logistic regression is particularly good at solving these.

Side note: Your data might look something like this: Sarah, Ben and Rock (from the spreadsheet screenshot) will stay with us until the end of our problem.

OK, we know what logistic regression solves. Now I'll explain how it solves it. Logistic regression makes predictions using probability (there is substantial debate about what probability exactly means; for our purposes it is enough to know this much):

0 = you are absolutely sure that the person is not going to be successful in her life.
1 = you are absolutely sure that the person is going to be successful 5 years from now.
Any value above 0.5 = you are fairly sure that the person will succeed. Say you predict 0.8; then you are 80% confident that the person will succeed. Likewise, for any value below 0.5 you can say with a corresponding degree of confidence that the person will not succeed.

How does it make this prediction? By developing a model using training data. You have the scores (independent variables), and you also know whether each person succeeded or not (dependent variable). You then somehow [1] come up with predictions and look at how well your predictions align with your recorded data. Say you predicted 0.9 on Ben, and you are similarly close in all your other predictions; then you have developed a pretty good model. On the contrary, if you predicted 0.2 on Ben, then your model is way off in predicting whether Ben succeeded or not. We look at various models [2] (of course not randomly) and find the model that fits most closely with our recorded data. The step by which we arrive at a model is called "model selection".
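To make that training step a little more concrete, here is a minimal sketch of what it could look like in Python with scikit-learn (my choice of library, not something the original example specifies); every score and outcome in it is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical past-student data: one row per former student,
# columns = [math, science, history, physical education] final scores.
X_past = np.array([
    [82, 75, 94, 60],
    [55, 63, 48, 90],
    [91, 88, 79, 70],
    [40, 52, 61, 85],
    [77, 81, 90, 65],
    [35, 45, 50, 95],
])
# 1 = reported being "successful" at the 5-year reunion, 0 = not (labels invented here).
y_past = np.array([1, 0, 1, 0, 1, 0])

# "Model selection" in its simplest form: fit a logistic model to the recorded data.
model = LogisticRegression()
model.fit(X_past, y_past)

# The fitted model boils down to one weight per subject plus an intercept.
print("Weights per subject:", model.coef_[0])
print("Intercept:", model.intercept_[0])
```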
Then you plug Sarah's scores (and those of everyone in your current class, if you wish) into this model and it spits out a number between 0 and 1. If that number is greater than 0.5, you predict that the person will be successful. If it is less than 0.5, you say they might not be successful.

Note to readers: If you think things can be simplified even further, leave a comment. If you think you can explain it in even simpler terms, please don't hesitate to write another answer; I'd love to see diverse answers.

Advanced note (gobbledygook, you can safely ignore): Logistic regression can also solve multi-class classification problems, such as deciding whether something belongs to category A, B, C or D.

[1] You could refer to the answer written by Alaka Halder, as she does a fine job of explaining the mathematics behind logistic regression: Alaka Halder's answer.

[2] We try to do something called minimizing the error: we look at the total error across the predictions of each candidate model and try to see whether we can go even lower. (For logistic regression this error is usually the log loss, i.e. the model is fit by maximum likelihood, rather than the squared error used in ordinary linear regression.)
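For readers who want a small peek behind footnote [1]: the number between 0 and 1 comes from taking a weighted sum of the scores and squashing it through the logistic (sigmoid) function. Here is a tiny standalone sketch; the weights and intercept are made up, and only the History (94) and Math (82) scores come from the example above:

```python
import math

def sigmoid(z):
    """The logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights for [math, science, history, PE] and an invented intercept;
# in a real model these come out of the fitting step sketched earlier.
weights = [0.03, 0.02, 0.04, -0.01]
intercept = -5.0

def predict_probability(scores):
    # Weighted sum of the scores, then squashed into (0, 1).
    z = intercept + sum(w * s for w, s in zip(weights, scores))
    return sigmoid(z)

# 82 (Math) and 94 (History) come from the example; the other two scores are placeholders.
p = predict_probability([82, 70, 94, 68])
print(f"Predicted probability of success: {p:.2f}")
print("Prediction:", "successful" if p > 0.5 else "might not be successful")
```

A real model would, of course, learn those weights from the 20 years of recorded data, exactly as in the fitting sketch earlier, rather than having them typed in by hand.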
