Machine Learning Part 3 : Logistic Regression

Image Credit : toshistats.net

Recap : Writing this series on Machine Learning has been a great learning experience for me so far. In my last article, Machine Learning: What & Why Part 2, we covered the types of ML algorithms, viz. supervised and unsupervised, and went deeper into supervised learning. Continuing from there, today we will decode how binary classification algorithms (classification in supervised learning) work and how they can be put to practical use to make some amazing predictions.

Supervised ML (Revisited) : As covered in my previous article, in supervised learning we already know what we are trying to infer: the answer scenarios we are searching for are known (found in past or completed data). The intent is to deduce a model (mechanism) through which our machine can find answers that are not yet known, based on a known set of data inputs and known outcomes. These data sets, with inputs and outputs (the answers), go into the machine, and it learns from the relationships between them.

Supervised Learning Categorisation :

Binary Classification Problems in Supervised Learning : In ML Part 2 we discussed the basics of binary classification and its types; today we will continue from where we left off.

Types of binary classification :
Logistic Regression
Support Vector Machines
Decision Trees
Neural Networks

Logistic Regression : Logistic regression is a statistical method for analysing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible outcomes). It is used to predict a binary outcome (1/0, Yes/No, True/False) given a set of independent variables. To represent a binary/categorical outcome, we use dummy variables. You can also think of logistic regression as a special case of linear regression where the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.

Logistic regression was developed by statistician David Cox in 1958. The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). It allows one to say that the presence of a risk factor increases the probability of a given outcome by a specific percentage. Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Applications of Logistic Regression : It is used in healthcare, the social sciences and various ML systems for advanced research and analytics. Example: TRISS (Trauma & Injury Severity Score), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient's condition have been developed using logistic regression.

Types of questions that logistic regression can examine :
How does the probability of getting lung cancer (yes vs. no) change for every additional pound of overweight and for every pack of cigarettes smoked per day?
Do body weight, calorie intake, fat intake, and participant age have an influence on heart attacks (yes vs. no)?

The major assumption is that the outcome must be discrete; in other words, the dependent variable should be dichotomous in nature (e.g., present vs. absent).

Binary logistic regression is estimated using Maximum Likelihood Estimation (MLE), unlike linear regression, which uses the Ordinary Least Squares (OLS) approach. MLE is an iterative procedure: it starts with a guess as to the best weight for each predictor variable (that is, each coefficient in the model) and then adjusts these coefficients repeatedly until there is no additional improvement in the ability to predict the value of the outcome variable (either 0 or 1) for each case. The short sketch below illustrates that iterative idea.
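To make the MLE idea concrete, here is a minimal sketch of fitting a one-feature logistic model by gradient ascent on the log-likelihood. This is only an illustration of the "guess, then repeatedly adjust" loop described above; the toy data, learning rate and iteration count are assumptions made up for this example, not values from any real study.

import numpy as np

def sigmoid(z):
    # Map any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative toy data (assumed): x = packs of cigarettes per day,
# y = 1 if the outcome was observed, else 0
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1])

w, b = 0.0, 0.0           # initial guess for coefficient and intercept
learning_rate = 0.1

for step in range(5000):  # repeatedly adjust the coefficients
    p = sigmoid(w * x + b)          # predicted probability that y = 1
    # Gradient of the log-likelihood with respect to w and b
    grad_w = np.sum((y - p) * x)
    grad_b = np.sum(y - p)
    w += learning_rate * grad_w / len(x)
    b += learning_rate * grad_b / len(x)

print(f"fitted coefficient w = {w:.3f}, intercept b = {b:.3f}")

Note that the (y - p) term shrinks as the predictions improve, so the updates naturally taper off; that is exactly the "no additional improvement" stopping behaviour mentioned above.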
Logistic Regression Function : Most often, we want to predict our outcomes as YES/NO (1/0). Is your favorite football team going to win the match today? Yes/no (1/0). Does a student pass the exam? Yes/no (1/0).

The general logistic function is

f(x) = L / (1 + e^(-k(x - x0)))

where:
L – the curve's maximum value
k – the steepness of the curve
x0 – the x value of the sigmoid's midpoint

The standard logistic function, with k = 1, x0 = 0 and L = 1, is called the sigmoid function:

f(x) = 1 / (1 + e^(-x))

The sigmoid curve : The sigmoid function gives an 'S'-shaped curve. This curve has finite limits: '0' as x approaches −∞ and '1' as x approaches +∞. The output of the sigmoid function at x = 0 is 0.5. Thus, if the output is more than 0.5, we can classify the outcome as 1 (or YES), and if it is less than 0.5, we can classify it as 0 (or NO). For example, if the output is 0.65, we can say in terms of probability: "There is a 65 percent chance that your favorite football team is going to win today." Thus the output of the sigmoid function can be used not just to classify YES/NO, but also to determine the probability of YES/NO.

Some More Examples of Logistic Regression : Historically, the first application of logistic regression was "given a certain dose of poison, how likely is this pest to die?" Other common applications: life insurance actuaries use logistic regression to predict, from data on a policy holder (e.g. age, gender, results of a physical examination), the chance that the policy holder will die before the term of the policy expires; political campaigns try to predict the chance that a voter will vote for their candidate (or do something else desirable, such as donate to the campaign).

Logistic Regression Technical Implementation Using Python & R : In the next part of this Machine Learning series we will learn how logistic regression works programmatically, using Python and R. As a small preview, two short sketches follow: one for the sigmoid function and the 0.5 threshold discussed above, and one for fitting a minimal model.