Friday, August 16, 2019

Logistic Regression - Machine Learning Tutorial

If you are hired as a statistical consultant and asked to quantify the relationship between advertising budgets and sales of a particular product, that is an ordinary regression problem, because the dependent variable, sales, is continuous in nature. In many research and applied areas, however, the dependent variable is categorical: whether a customer will convert or not, whether a patient is prone to cancer or not, whether some event occurred or not. In that setting the dependent variable is discrete, taking values such as "Yes"/"No" or "High"/"Medium"/"Low". Handling this kind of data requires a different technique from regression, called classification, since we classify the outcomes of the dependent variable into predefined classes ("Yes"/"No"; "Cancer"/"Not Cancer"). The methods used to classify them actually predict the probability that a particular observation belongs to a particular category, usually a value between 0.0 and 1.0. Based on a cutoff (commonly 0.5) we then assign the outcome to one of the modeled groups.

Like any statistical technique, logistic regression has a few assumptions to be followed:

- The dependent variable is categorical in nature.
- The independent variables can be continuous or categorical by nature; categorical variables need to be dummy coded, depending on the software.
- As a guideline, there should be at least 10 cases per independent variable; preferred ratios of 20 or even 50 are sometimes recommended, depending on the computation technique used to converge the logistic equation.
- Unlike linear discriminant analysis, logistic regression does not make any assumptions of normality, linearity, or homogeneity of variance for the independent variables.
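The probability-then-cutoff workflow described above can be sketched in a few lines. This is a minimal illustration using scikit-learn's LogisticRegression on made-up conversion data (the feature, labels, and cutoff are all assumptions for the example, not part of the original text):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: a single continuous predictor (e.g. advertising spend)
# and a binary outcome (1 = converted, 0 = not converted).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 20, size=200) > 50).astype(int)

model = LogisticRegression()
model.fit(X, y)

# The model predicts a probability between 0.0 and 1.0 ...
probs = model.predict_proba(X)[:, 1]

# ... which is turned into a class label with a cutoff (0.5 here).
labels = (probs >= 0.5).astype(int)
```

Note that `predict_proba` exposes the underlying probabilities, so a different cutoff than 0.5 can be chosen when, say, false negatives are costlier than false positives.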
Instead, it assumes that the errors, the actual Y minus the predicted Y, follow a binomial distribution; this can be taken to be robust as long as the sample we consider is random.

Drawbacks of fitting a linear equation to this kind of data

Before we decide how to define a logistic equation, we need to examine the usual linear equation to understand its drawbacks, so that we can come up with an equation that fits this particular setting. Assume the reader understands the usual notation of statistical learning, where Y is the dependent variable and X1, X2, etc. are the independent variables used to predict it. Consider a case where Y takes the values 0 and 1, with 0 representing "not converted" and 1 representing "converted", and apply the ordinary least-squares linear equation

Y = a + b*X + e

Here we observe a problem with this approach in terms of predicting the class: for large values of X the predictions rise above 1, and for small or zero values they go negative, which makes no sense since we have only the two levels 0 and 1 defined for our Y. Any straight line fitted to this kind of dichotomous dependent variable runs into the same problem. To avoid it, we need to come up with a function that always returns values between 0 and 1, whatever values the independent variables take.
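The boundary problem above is easy to demonstrate numerically. The sketch below (toy data chosen for illustration) fits a least-squares line to 0/1 labels and evaluates it at extreme X values, then evaluates the logistic (sigmoid) function on the same inputs to show it stays bounded:

```python
import numpy as np

# Binary outcome: y is 0 (not converted) or 1 (converted).
x = np.array([1., 2., 3., 4., 50., 60., 70., 80.])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])

# Least-squares line Y = a + b*X fitted directly to the 0/1 labels.
b, a = np.polyfit(x, y, 1)          # returns slope, intercept

# Predictions at extreme values of X escape the [0, 1] range:
extremes = np.array([-10., 200.])
line = a + b * extremes
print(line)  # one value below 0, one above 1

# The logistic (sigmoid) function of the same linear predictor
# is always strictly between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(a + b * extremes))
```

This is exactly the motivation for the logistic equation: keep the linear predictor a + b*X, but pass it through a function whose output can be read as a probability.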
