Tuesday, 30 July 2019

Difference between Linear Regression and Logistic Regression

The purpose of this post is to help you understand the difference between linear regression and logistic regression. These are two of the most popular statistical techniques and are used in practice across many domains. Because both are taught in universities, they are very widely used in the predictive modeling world. In this article, we have listed 13 differences between these two algorithms.

Difference between Linear and Logistic Regression

1. Variable Type : Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups). Binary logistic regression requires the dependent variable to be binary, with two categories only (0/1). Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.

2. Algorithm : Linear regression is based on least squares estimation, which says the regression coefficients should be chosen so that they minimize the sum of the squared distances of each observed response to its fitted value. Logistic regression is based on Maximum Likelihood Estimation, which says the coefficients should be chosen so that they maximize the probability of Y given X (the likelihood).
With maximum likelihood (ML), the computer runs several "iterations" in which it tries different solutions until it arrives at the maximum likelihood estimates.

Linear Regression Equation :

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

Y is the target or dependent variable and b0 is the intercept. x1, x2, x3, ..., xk are the predictors or independent variables. b1, b2, b3, ..., bk are the coefficients of the respective predictors.

Logistic Regression Equation :

$$P(y=1) = \frac{e^{b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k}}{1 + e^{b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k}}$$

Which further simplifies to :

$$P(y=1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k)}}$$

The above function is called the logistic or sigmoid function. Changing a coefficient changes both the direction and the steepness of the logistic function: a positive slope results in an S-shaped curve and a negative slope results in a Z-shaped curve.

5. Linear Relationship : Linear regression needs a linear relationship between the dependent and independent variables. Logistic regression does not need a linear relationship between the dependent and independent variables (it assumes instead that the independent variables are linearly related to the log-odds of the outcome).

6. Normality of Residual : Linear regression requires the error term to be normally distributed. Logistic regression does not require the error term to be normally distributed.

7. Homoscedasticity : Linear regression assumes that the variance of the residuals is approximately equal for all predicted dependent variable values.
Logistic regression, by contrast, does not need the variance of the residuals to be equal at each level of the predicted dependent variable values.

8. Sample Size : As a rule of thumb, linear regression requires at least 5 cases per independent variable in the analysis, while logistic regression needs at least 10 events per independent variable.

9. Purpose : Linear regression is used to estimate the dependent variable in case of a change in the independent variables. For example, the relationship between the number of hours studied and your grades. Logistic regression is used to calculate the probability of an event. For example, an event can be whether a customer will attrite or not in the next 6 months.

10. Interpretation : The betas or coefficients of linear regression are interpreted as follows: keeping all other independent variables constant, how much the dependent variable is expected to increase or decrease with a one-unit increase in the independent variable. In logistic regression, the exponentiated coefficient is interpreted as an odds ratio: the effect of a one-unit change in X on the predicted odds, with the other variables in the model held constant.

11. Distribution : Linear regression assumes a normal (Gaussian) distribution of the dependent variable, whereas logistic regression assumes a binomial distribution of the dependent variable. Note : Gaussian is the same as the normal distribution. See the implementation in R below.

R Code :

Create sample data by running the following script

set.seed(123)
y = ifelse(runif(100) …

Linear Regression

glm(y1 ~ x, family = gaussian(link = "identity"))
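To make points 2 and 10 above more concrete, here is a minimal sketch in plain Python (standard library only; it is not a reconstruction of the R script above). It fits a one-predictor linear regression with closed-form least squares and a one-predictor logistic regression by repeating Newton-Raphson steps until the maximum likelihood estimates stop changing. The simulated hours/grades/pass data, the coefficients used to generate them, and the names fit_linear, fit_logistic and sigmoid are all assumptions made for illustration.

```python
import math
import random


def fit_linear(xs, ys):
    """Least squares for one predictor: minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum likelihood for one predictor via Newton-Raphson iterations."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of X'WX (the negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break  # singular system, e.g. every x is identical
        # Newton step: solve (X'WX) * step = gradient for the 2x2 case
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: hours studied, a continuous grade, and a pass/fail flag.
random.seed(123)
hours = [random.uniform(0, 10) for _ in range(300)]
grades = [50 + 4 * h + random.gauss(0, 5) for h in hours]
passed = [1 if random.random() < sigmoid(-2 + 0.5 * h) else 0 for h in hours]

lin_b0, lin_b1 = fit_linear(hours, grades)
log_b0, log_b1 = fit_logistic(hours, passed)

print("linear:   grade changes by", lin_b1, "per extra hour")
print("logistic: odds of passing multiply by", math.exp(log_b1), "per extra hour")
```

In this sketch the linear slope is read directly as a change in the grade per extra hour, while the logistic slope is exponentiated to get an odds ratio. The closed-form versus iterative fitting mirrors the least squares versus maximum likelihood contrast in point 2.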
