Logistic Regression from scratch in Python – Martín Pellarolo – Medium

While Python's scikit-learn library provides the easy-to-use and efficient LogisticRegression class, the objective of this post is to create our own implementation using NumPy. Implementing basic models from scratch is a great way to improve your understanding of how they work.

We will use the well-known Iris data set. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. To simplify things, we take just the first two feature columns. Also, the two classes that are not linearly separable are labeled with the same category, leaving us with a binary classification problem: given a set of inputs X, we want to assign each of them to one of two possible categories (0 or 1). Logistic regression models the probability that each input belongs to a particular category.

Hypothesis. A function takes inputs and returns outputs. To generate probabilities, logistic regression uses a function that gives outputs between 0 and 1 for all values of X. Many functions meet this description, but the one used in this case is the logistic function:

    sigmoid(z) = 1 / (1 + e^(-z))

From here on we will refer to it as the sigmoid.

Loss function. Our function has parameters/weights (represented by theta in our notation), and we want to find the best values for them. To start, we pick random values, and we need a way to measure how well the algorithm performs using those weights. That measure is computed with the loss function, which for logistic regression is the binary cross-entropy:

    J(theta) = -(1/m) * sum over i of [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ],  with h_i = sigmoid(x_i . theta)

Gradient descent. Our goal is to minimize the loss function, and we achieve it by increasing or decreasing the weights, i.e. fitting them. The question is: how do we know which parameters should be bigger and which should be smaller? The answer is given by the derivative of the loss function with respect to each weight, which tells us how the loss would change if we modified the parameters:

    dJ/dtheta = (1/m) * X^T (h - y)

We then update the weights by subtracting from them the derivative times the learning rate, and repeat these steps until we reach the optimal solution.

Predictions. Calling the sigmoid function gives us the probability that some input x belongs to class 1. We take all probabilities >= 0.5 as class 1 and all probabilities < 0.5 as class 0.

With a learning rate of 0.1 and 300,000 iterations, the algorithm classified all instances successfully; training took 13.8 seconds. The resulting weights can be compared with those of sklearn's LogisticRegression: if we trained our implementation with a smaller learning rate and more iterations, we would find approximately equal weights. But the more remarkable difference is training time: sklearn is an order of magnitude faster. In any case, the intention is not to put this code into production; this is just a toy exercise with teaching objectives. Further steps could be the addition of L2 regularization and multiclass classification. Minimal code sketches of each of these steps follow below.
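A minimal sketch of the data setup described above, assuming scikit-learn's built-in copy of Iris is used for loading (the exact loading code is my assumption):

    from sklearn import datasets

    # Load Iris and keep only the first two feature columns.
    iris = datasets.load_iris()
    X = iris.data[:, :2]

    # Give the two non-linearly separable classes the same label,
    # so the task becomes binary: 0 vs. 1.
    y = (iris.target != 0).astype(int)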
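For the hypothesis, the sigmoid is a one-liner in NumPy:

    import numpy as np

    def sigmoid(z):
        # Logistic function: squashes any real-valued input into (0, 1),
        # so the output can be read as a probability.
        return 1 / (1 + np.exp(-z))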
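The loss formula translates directly; in this sketch, h holds the predicted probabilities sigmoid(X . theta) and y the true labels:

    import numpy as np

    def loss(h, y):
        # Average binary cross-entropy over the m training examples.
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()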
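Gradient descent then repeats the update rule above for a fixed number of iterations. This sketch prepends an intercept column and initializes the weights to zero; both details are assumptions on my part (the post itself mentions random starting values, which work just as well):

    import numpy as np

    def fit(X, y, lr=0.1, num_iter=300000):
        # Prepend a column of ones so theta[0] acts as the intercept.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        theta = np.zeros(Xb.shape[1])
        for _ in range(num_iter):
            h = 1 / (1 + np.exp(-Xb.dot(theta)))  # current probabilities
            gradient = Xb.T.dot(h - y) / y.size   # dJ/dtheta of the cross-entropy loss
            theta -= lr * gradient                # step against the gradient
        return theta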
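Predictions follow by thresholding the probabilities at 0.5; the helper below mirrors the intercept handling of the fit sketch:

    import numpy as np

    def predict(X, theta, threshold=0.5):
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        probs = 1 / (1 + np.exp(-Xb.dot(theta)))  # P(class = 1 | x)
        return (probs >= threshold).astype(int)   # >= 0.5 -> class 1, else class 0

With this in place, (predict(X, theta) == y).mean() gives the training accuracy.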
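For the sklearn comparison the post refers to, a minimal version might look like the following; disabling regularization via a very large C is my assumption, made so the fitted weights are comparable to the unregularized NumPy version:

    from sklearn.linear_model import LogisticRegression

    # C is the inverse of the regularization strength; a very large value
    # approximates no regularization, matching the plain NumPy version.
    model = LogisticRegression(C=1e20)
    model.fit(X, y)

    print(model.intercept_, model.coef_)  # weights to compare against theta
    print(model.score(X, y))              # mean training accuracy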
