Regression questions and answers.

- What are the conditions for omitted variable bias (OVB), and how does it affect the coefficient estimates? Why? What are some fixes for OVB?
- How do you interpret the coefficients in a log-log model? Why?
- What does the Gauss-Markov theorem say, and why is it important?
- How does heteroscedasticity affect the coefficient estimates, and why? What are some fixes for heteroscedasticity?
- Given a regression setting with a binary response variable, what probability model should be used, and why? What happens to the errors in the logistic regression function?
- Design a regression model to test the Law of Demand.
- Explain the Central Limit Theorem to a five-year-old. (This one gave me the most trouble.)
- Why do the residuals from a linear regression add up to 0? Is this still true if you fit a regression without an intercept?
- What's so bad about collinearity?
- We first regress Y on X1 and X2, then regress Y on X1 and Z, where Z = X1 − X2. How are the coefficients in the two regressions related?
- We regress Y on categorical predictors X1, …, Xp. This is a large data set (that's right, big data!), but many of the rows of the design matrix are duplicated. We can summarize the data by averaging Y for each unique row of the design matrix, and perform a weighted regression with each row of the design matrix weighted by its number of duplicates. This gives us a much smaller data set to work with. How do the regression coefficients and their standard errors compare to regression on the raw data?
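The residuals question above can be checked numerically. With an intercept (a column of ones in the design matrix), the normal equations force the residuals to be orthogonal to every column, including the ones column, so they sum to zero; without an intercept this generally fails. A minimal sketch with synthetic data (seed and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 + 2.0 * x + rng.normal(size=50)

# With intercept: design matrix [1, x]; residuals are orthogonal
# to the column of ones, hence they sum to (numerically) zero.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
print(abs(resid.sum()))          # ~0, up to floating-point error

# Without intercept: the ones vector is not in the column space,
# so the residuals need not sum to zero.
b, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid_no_int = y - x * b[0]
print(abs(resid_no_int.sum()))   # generally far from 0
```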
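The X1/Z question also has a clean algebraic answer worth verifying: since Z = X1 − X2, substituting X2 = X1 − Z into Y = b0 + b1·X1 + b2·X2 gives Y = b0 + (b1 + b2)·X1 − b2·Z, and because both design matrices span the same column space, the least-squares fits are linked exactly. A quick check on simulated data (all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)
z = x1 - x2

A = np.column_stack([np.ones(n), x1, x2])   # regress Y on X1, X2
B = np.column_stack([np.ones(n), x1, z])    # regress Y on X1, Z
b_A, *_ = np.linalg.lstsq(A, y, rcond=None)
b_B, *_ = np.linalg.lstsq(B, y, rcond=None)

# Y = b0 + (b1 + b2)*X1 - b2*Z, so:
print(np.allclose(b_B[0], b_A[0]))            # True: same intercept
print(np.allclose(b_B[1], b_A[1] + b_A[2]))   # True: coef on X1 is b1 + b2
print(np.allclose(b_B[2], -b_A[2]))           # True: coef on Z is -b2
```

The fitted values and residuals are identical in the two regressions; only the parameterization changes.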
"Modern" linear regression:

- How are Ridge Regression and the LASSO different from ordinary least squares, and why would you want to use them?
- Be comfortable writing down the closed-form solution (the "normal equations"), but also know how to minimize the least-squares cost function with gradient descent (very useful in practice if you are working with very large datasets).
- How to evaluate the assumptions (QQ-plot, residual plot).
- How do you evaluate the performance of your model? Techniques for training (k-fold cross-validation, nested cross-validation) and metrics (MSE, coefficient of determination).
- How to deal with outliers (e.g., RANSAC).
- Polynomial regression and feature transformation (for non-linear problems); alternatively: decision trees and random forests (advantages and shortcomings), support vector regression.

More general questions:

- What is regression analysis? What do coefficient estimates mean? How do you measure the fit of the model? What do R and D mean?
- What is Ordinary Least Squares?
- What are some possible problems with regression models? How do you avoid or compensate for them?
- Name a few types of regression you are familiar with. What are the differences?
- What is overfitting a regression model? What are ways to avoid it?
- In linear regression, under what condition does R^2 always equal a perfect 1?
- How do you perform a regression? Why do you perform a regression? What are the cons of performing a regression?
- How many variables should you use? What are the downfalls of using too many or too few variables?

Here are some of the top questions I personally ask to test the understanding of candidates:

- What is R squared, and how is it interpreted?
- What is multicollinearity, and how do you treat it? (These are the points I expect; I list them because I rarely get a comprehensive answer for this one.)
  - Consult with a domain expert to see which variables are important.
  - Use VIF to eliminate variables.
  - Use a regularization method like L1 (LASSO) to drop unimportant variables.
- What is the "curse of dimensionality", and how do you tackle it?
- How do you handle categorical features in your dataset?
- What is the difference between R squared and adjusted R squared?
- Tell me what you know about precision, recall, and the F1 score.
- Why is the loss function different in linear regression and logistic regression?
- Regularization: L1, L2, Elastic Net.
- What are the missing-value imputation techniques?
- What are outliers? How do you detect and treat them?

For a more comprehensive understanding of regression, I recommend this Udemy course, which has in-depth explanations of most of the regression interview questions and concerns in industry-level practice. Hope that helps. Cheers!

Adding to the bunch: the "modern" linear regression points listed above. EDIT: Bonus point: prepare a quote from F. Galton, "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland, pages 246–263, 1886.
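The "normal equations vs. gradient descent" point above can be illustrated with a small NumPy sketch. This is a toy example with synthetic data; the learning rate and iteration count are arbitrary choices that happen to converge here, not tuned recommendations:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + 3 features
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed form: solve the normal equations X'X beta = X'y.
beta_cf = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean-squared-error cost.
beta_gd = np.zeros(p + 1)
lr = 0.1
for _ in range(5000):
    grad = 2.0 / n * X.T @ (X @ beta_gd - y)
    beta_gd -= lr * grad

# The cost is convex, so both routes reach the same minimizer.
print(np.allclose(beta_cf, beta_gd, atol=1e-6))  # True
```

In practice, gradient descent (or its stochastic variants) avoids forming and solving the p-by-p system, which is what makes it attractive for very large or streaming datasets.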
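The Ridge-vs-OLS contrast also fits in a few lines: ridge has a closed form just like OLS, with the penalty λI added to X'X, and increasing λ shrinks the coefficient vector. A minimal sketch (the `ridge` helper is illustrative, not a library function; no intercept, features assumed roughly centered):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

def ridge(X, y, lam):
    # Closed form: (X'X + lam*I)^{-1} X'y; lam = 0 recovers OLS.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)
b_ridge = ridge(X, y, 100.0)

# The L2 penalty shrinks the coefficients toward zero.
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))  # True
```

The LASSO replaces the squared penalty with an absolute-value one; it has no closed form, but unlike ridge it can set coefficients exactly to zero, which is why it doubles as a variable-selection tool.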
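The VIF bullet above deserves a concrete definition: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors, so a large VIF flags a predictor that the others nearly reproduce. A from-scratch sketch (the `vif` helper is hypothetical; statsmodels ships an equivalent):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only)."""
    n, p = X.shape
    out = []
    for j in range(p):
        xj = X[:, j]
        # Regress column j on an intercept plus the remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        tss = (xj - xj.mean()) @ (xj - xj.mean())
        r2 = 1.0 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.05 * rng.normal(size=300)   # nearly collinear with a
X = np.column_stack([a, b, c])
print(vif(X))  # VIFs for a and c are large; b stays near 1
```

A common rule of thumb treats VIF above 5 or 10 as a sign of problematic collinearity.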
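For the precision/recall/F1 question, the definitions are short enough to write out: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A self-contained sketch (the helper name and toy labels are mine):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Unlike accuracy, these metrics stay informative on imbalanced data, which is why interviewers pair them with the logistic-regression questions above.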