Logistics services in supply chain: Check Predictive Model Performance

Thursday, July 25, 2019

Check Predictive Model Performance

Discrimination

Discrimination refers to the ability of the model to distinguish between events and non-events.

Area under the ROC curve (AUC / C statistic)

The ROC curve plots the true positive rate (aka sensitivity) against the false positive rate (aka 1 - specificity). The area under it can be calculated from all pairs made of one event and one non-event:

AUC = (Concordant Percent + 0.5 * Tied Percent) / 100

Concordant : Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).
Discordant : Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).
Tied : Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event).

If C >= 0.9, the model is considered to have outstanding discrimination. Caution : a value this high may also be a sign of over-fitting.
If 0.8 <= C < 0.9, the model is considered to have excellent discrimination.
If 0.7 <= C < 0.8, the model is considered to have acceptable discrimination.
If C = 0.5, the model has no discrimination, i.e. it does no better than chance.

Somers' D

Somers' D = 2 * AUC - 1, or equivalently Somers' D = (Concordant Percent - Discordant Percent) / 100. It should be greater than 0.4.

Kolmogorov-Smirnov Statistic (KS)

It looks at the maximum difference between the cumulative distribution of events and the cumulative distribution of non-events. The maximum should occur within the top 3 deciles, and the KS value should be between 40 and 70.

Calibration

Calibration is a measure of how close the predicted probabilities are to the actual rate of events.

I. Hosmer and Lemeshow Test (HL)

It measures the agreement between actual events and predicted probabilities, compared section by section after the data are grouped by predicted probability.
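As an illustration of the discrimination measures described earlier (concordant, discordant and tied pairs, AUC, Somers' D and KS), here is a minimal Python sketch. It is not the article's SAS workflow; the function names and the toy data are assumptions made for the example, and the brute-force pair loop is only practical for small samples.

```python
from itertools import groupby, product


def pair_stats(y_true, y_prob):
    """Percent of concordant, discordant and tied event/non-event pairs."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = discordant = tied = 0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1
        elif p_event < p_non_event:
            discordant += 1
        else:
            tied += 1
    total = len(events) * len(non_events)
    return tuple(100.0 * count / total for count in (concordant, discordant, tied))


def auc_from_pairs(concordant_pct, tied_pct):
    """AUC = (Concordant Percent + 0.5 * Tied Percent) / 100."""
    return (concordant_pct + 0.5 * tied_pct) / 100.0


def somers_d(concordant_pct, discordant_pct):
    """Somers' D = (Concordant Percent - Discordant Percent) / 100."""
    return (concordant_pct - discordant_pct) / 100.0


def ks_statistic(y_true, y_prob):
    """Largest gap, in percentage points, between the cumulative share of
    events and of non-events when scanning scores from highest to lowest."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    n_events = sum(y_true)
    n_non_events = len(y_true) - n_events
    cum_events = cum_non_events = 0
    max_gap = 0.0
    # Observations sharing the same score are added together before comparing.
    for _, same_score in groupby(ranked, key=lambda pair: pair[0]):
        for _, y in same_score:
            if y == 1:
                cum_events += 1
            else:
                cum_non_events += 1
        gap = abs(cum_events / n_events - cum_non_events / n_non_events)
        max_gap = max(max_gap, gap)
    return 100.0 * max_gap


# Toy data (hypothetical): 1 = event, 0 = non-event.
y = [1, 1, 1, 0, 1, 0, 0, 0]
prob = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

concordant_pct, discordant_pct, tied_pct = pair_stats(y, prob)
print("AUC:", auc_from_pairs(concordant_pct, tied_pct))
print("Somers' D:", somers_d(concordant_pct, discordant_pct))
print("KS:", ks_statistic(y, prob))
```

Because concordant, discordant and tied percentages add up to 100, the two expressions for Somers' D give the same number, which is a quick sanity check on any implementation.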
The Hosmer and Lemeshow statistic is computed in the following steps:

1. Calculate the estimated probability of the event for each observation.
2. Split the data into 10 sections based on descending order of probability.
3. Calculate the number of actual events and non-events in each section.
4. Calculate Predicted Probability = 1 by averaging the probability in each section.
5. Calculate Predicted Probability = 0 by subtracting Predicted Probability = 1 from 1.
6. Calculate the expected frequencies by multiplying the number of cases in each section by Predicted Probability = 1 (expected events) and by Predicted Probability = 0 (expected non-events).
7. Calculate the chi-square statistic from the observed (actual) and expected frequencies of events and non-events.

Rule : If the p-value > 0.05, the model fits the data well.

II. Deviance and Residual Test

The null hypothesis states that the model fits the data well; in other words, that the fitted model is correct. If the p-value is greater than 0.05 for both tests, we can say the model fits the data well.
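The Hosmer and Lemeshow steps listed above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names, the simulated data and the use of (number of sections - 2) degrees of freedom are assumptions not stated in the text, and the p-value helper relies on a closed form that only holds for an even number of degrees of freedom. Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    # Step 2: sort by descending probability and cut into equal-sized sections.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    n = len(ranked)
    chi_sq = 0.0
    for g in range(groups):
        section = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(section)
        if size == 0:
            continue
        # Step 3: observed events and non-events in the section.
        observed_events = sum(y for _, y in section)
        observed_non_events = size - observed_events
        # Steps 4 and 5: average predicted probability of event and non-event.
        p_event = sum(p for p, _ in section) / size
        p_non_event = 1.0 - p_event
        # Step 6: expected frequencies.
        expected_events = size * p_event
        expected_non_events = size * p_non_event
        # Step 7: chi-square contributions from both outcome classes.
        chi_sq += (observed_events - expected_events) ** 2 / expected_events
        chi_sq += (observed_non_events - expected_non_events) ** 2 / expected_non_events
    df = groups - 2  # conventional choice, assumed here
    return chi_sq, df, chi2_sf_even_df(chi_sq, df)


# Simulated data (hypothetical): outcomes are drawn from the predicted
# probabilities, so the "model" is well calibrated by construction.
rng = random.Random(0)
probs = [0.05 + 0.9 * rng.random() for _ in range(200)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

chi_sq, df, p_value = hosmer_lemeshow(outcomes, probs)
print("chi-square:", chi_sq, "df:", df, "p-value:", p_value)
```

Under the rule stated above, a p-value above 0.05 would be read as no evidence of poor fit.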

In SAS, these tests (reported as the Deviance and Pearson goodness-of-fit statistics) can be computed by using the options scale=none and aggregate in the MODEL statement of PROC LOGISTIC.

III. Brier Score

The Brier score is an important measure of calibration: it is the mean squared difference between the predicted probability and the actual outcome. The lower the Brier score for a set of predictions, the better the predictions are calibrated.

If the predicted probability is 1 and the event happens, then the Brier score is 0, the best score achievable.
If the predicted probability is 1 and the event does not happen, then the Brier score is 1, the worst score achievable.
If the predicted probability is 0.8 and the event happens, then the Brier score is (0.8-1)^2 = 0.04.
If the predicted probability is 0.2 and the event happens, then the Brier score is (0.2-1)^2 = 0.64.
If the predicted probability is 0.5, then the Brier score is (0.5-1)^2 = (0.5-0)^2 = 0.25, regardless of whether the event happens.

By specifying the fitstat option in the SCORE statement of PROC LOGISTIC, SAS returns the Brier score and other fit statistics such as AUC, AIC and BIC.

    proc logistic data=train;
        model y(event="1") = entry;
        score data=valid out=valpred fitstat;
    run;

A complete assessment of model performance should take into consideration both discrimination and calibration. Of the two, discrimination is generally considered the more important.

Deepanshu founded ListenData with a simple objective: make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains such as banking, telecom, HR and health insurance.
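To make the Brier score arithmetic worked through above concrete, here is a minimal Python sketch that reproduces the same cases. The function name is an assumption for illustration; it is not the SAS fitstat output.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Single-prediction cases from the text: (actual outcome, predicted probability).
cases = [(1, 1.0), (0, 1.0), (1, 0.8), (1, 0.2), (1, 0.5), (0, 0.5)]
for outcome, prob in cases:
    print(outcome, prob, round(brier_score([outcome], [prob]), 4))

# For a set of predictions the score is the average over all observations.
print(round(brier_score([1, 0, 1, 1], [0.9, 0.2, 0.6, 0.4]), 4))
```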
