Logistic regression cost function

Cost Function of Logistic Regression

Logistic regression finds an estimate which minimizes the logistic cost function (log loss):

[math]J(\beta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y_i \log h_\beta(x_i) + (1-y_i)\log\big(1-h_\beta(x_i)\big) \right][/math]

where [math]h_\beta(x)[/math] is defined as the sigmoid of the linear combination of the inputs:

[math]h_\beta(x) = \frac{1}{1+e^{-\beta^T x}}[/math]

In order to understand the above cost function better, please see the diagram below.

y = actual label (it takes 0 for the negative class and 1 for the positive class)
[math]h_\beta(x)[/math] = probability predicted by logistic regression

If the actual label of a particular data point ([math]y(x_i)[/math]) is zero, then the cost of the logistic function is given by the green curve. If the actual label of a particular data point ([math]y(x_i)[/math]) is one, then the cost of the logistic function is given by the red curve.

The graphs show that if the actual label of a data point ([math]y(x_i)[/math]) is zero but the predicted probability for [math]x_i[/math] is one, then the cost of the logistic function will be very high. Similarly, if the actual label of a data point ([math]y(x_i)[/math]) and the predicted probability for [math]x_i[/math] are the same, then the cost of the logistic function will be zero. So, we need to find an estimate ([math]\hat\beta[/math]) such that the cost function is minimized.

The above graph shows that the logistic cost function is convex, so we don't need to worry about local minima. But it is not possible to find the global minimum with a closed-form solution as in linear regression ([math]\hat\beta=(X^TX)^{-1}X^Ty[/math]), because the sigmoid function is non-linear.

Gradient Descent Algorithm

We can use many optimization algorithms, such as gradient descent, conjugate gradient, or BFGS, to find the global minimum point, which is nothing but the estimate of [math]\beta[/math]. Among these algorithms, the most popular is gradient descent. It does not find the estimate in a single step like linear regression; it moves the estimate towards the optimum iteratively. We need to choose a suitable step size for the gradient descent algorithm, which decides the number of iterations required for the estimate to converge to the optimal point. It is an important tuning parameter of the algorithm: too small a step size makes convergence slow, while too large a step size may overshoot the optimal point. Some advanced optimization algorithms, such as BFGS, find the optimal step size themselves.
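As a concrete illustration of both pieces, here is a minimal sketch in Python with NumPy of the sigmoid hypothesis, the log-loss cost, and a plain batch gradient descent loop. The function names (sigmoid, cost, gradient_descent), the tiny synthetic dataset, and the chosen step size and iteration count are illustrative assumptions, not part of the original post.

```python
import numpy as np

def sigmoid(z):
    # h_beta(x) = 1 / (1 + e^(-beta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def cost(beta, X, y):
    # Log loss: -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    h = sigmoid(X @ beta)
    eps = 1e-12  # guard against log(0) when predictions saturate
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient_descent(X, y, step_size=0.1, n_iter=5000):
    # Iteratively move the estimate of beta towards the minimum of the convex cost.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = sigmoid(X @ beta)
        grad = X.T @ (h - y) / len(y)  # gradient of the log-loss cost
        beta -= step_size * grad       # too large a step may overshoot the optimum
    return beta

# Tiny synthetic example: an intercept column plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0, 0, 1, 1])

beta_hat = gradient_descent(X, y)
print("estimate:", beta_hat)
print("cost at estimate:", cost(beta_hat, X, y))
```

With a much larger step size the loop can overshoot and the cost may fail to decrease, which is the behaviour described above; with a very small step size it still converges, but only after many more iterations.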