Python for Data Science. What’s in this section: Introduction to Logistic Regression. Logistic regression models are used to analyze the relationship between a dependent variable (DV) and independent variable(s) (IV) when the DV is dichotomous. The DV is the outcome variable, a.k.a. the predicted variable, and the IV(s) are the variables that are believed to have an influence on the outcome, a.k.a. predictor variables. If the model contains 1 IV, then it is a simple logistic regression model, and if the model contains 2+ IVs, then it is a multiple logistic regression model. Assumptions for logistic regression models: The DV is categorical (binary) If there are more than 2 categories in terms of types of outcome, a multinomial logistic regression should be used Independence of observations Cannot be a repeated measures design, i.e. collecting outcomes at two different time points. Independent variables are linearly related to the log odds Absence of multicollinearity Lack of outliers. Data Used in this example. Data used in this example is the data set that is used in UCLA’s Logistic Regression for Stata example. The question being asked is, how does GRE score, GPA, and prestige of the undergraduate institution effect admission into graduate school. The DV is admission status (binary), and the IVs are: GRE score, GPA, and undergraduate prestige. Let’s import pandas as pd, load the data set, and take a look at the variables! The data can be loaded either with the code below, or from our GitHub. Loading the data from both sources are shown below.
Комментариев нет:
Отправить комментарий