Logistic regression stata

All datasets are available as plain-text ASCII files, usually in two formats:

The copy with extension .dat has a header line with the variable names and codes categorical variables using character strings. This version is best for users of S-Plus or R and can be read using read.table(). Some files do not have column names; in these cases use header=FALSE.

The copy with extension .raw omits the header line and codes all variables using numeric codes. This version is better for users of Stata or other packages that prefer numerical codes. (However, Stata can read the character version if you specify the string width using str.)

To download any of these files using your browser, I recommend that you right-click and choose 'Save as...'. If you left-click, what happens next depends on how your browser is configured to handle these file types, and will often require an extra step.

The datasets are also available as Stata system files with extension .dta, and can be read directly from net-aware Stata versions 10 or higher via the use command. This is the easiest method for Stata users. You can also right-click on the links to save a local copy. R users can read the Stata files using Tom Lumley's read.dta() function in the foreign package.

The Program Effort Data

Here are the famous program effort data from Mauldin and Berelson. This extract consists of observations on an index of social setting, an index of family planning effort, and the percent decline in the crude birth rate (CBR) between 1965 and 1975, for 20 countries in Latin America. The data are available as the plain-text files effort.dat, which has a header line with the variable names, and effort.raw, which omits it; otherwise the two files have the same layout. The data are also available in Stata format as effort.dta.

Reference: P.W. Mauldin and B. Berelson (1978). Conditions of fertility decline in developing countries, 1965-75. Studies in Family Planning, 9:89-147.
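To make the difference between the .dat and .raw layouts concrete, here is a minimal Python sketch of a reader for whitespace-delimited text tables. It is not a substitute for read.table(); it assumes fields are separated by whitespace with no embedded spaces, and it borrows R's read.table() rule that a header line with one fewer field than the data lines means the first column holds row names. The names read_table and _convert are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple, Union

Value = Union[float, str]


def _convert(field: str) -> Value:
    """Return the field as a float if it looks numeric, otherwise as a string."""
    try:
        return float(field)
    except ValueError:
        return field


def read_table(
    text: str, header: bool = True, names: Optional[List[str]] = None
) -> Tuple[Optional[List[str]], List[str], List[List[Value]]]:
    """Parse whitespace-delimited text into (row_names, column_names, rows).

    header=True  (.dat style): the first line holds the variable names. If it
                 has one fewer field than the data lines, the first field of
                 each data line is taken as a row name (R's read.table rule).
    header=False (.raw style): no header line, so names must be supplied.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    row_names: Optional[List[str]] = None
    if header:
        columns, data = lines[0], lines[1:]
        if data and len(data[0]) == len(columns) + 1:
            row_names = [fields[0] for fields in data]
            data = [fields[1:] for fields in data]
    else:
        if names is None:
            raise ValueError("header=False requires explicit column names")
        columns, data = list(names), lines

    rows = []
    for fields in data:
        # Every data line must match the number of columns.
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        rows.append([_convert(f) for f in fields])
    return row_names, columns, rows
```

For a .raw file you would call read_table(text, header=False, names=[...]), supplying the variable names yourself, since the file has no header line. Character codes such as TRUE stay as strings, leaving their interpretation to the caller.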
JSTOR: http://www.jstor.org/stable/1965523.

Discrimination in Salaries

These are the salary data used in Weisberg's book, consisting of observations on six variables for 52 tenure-track professors in a small college. The variables are:

sx = Sex, coded 1 for female and 0 for male
rk = Rank, coded 1 for assistant professor, 2 for associate professor, and 3 for full professor
yr = Number of years in current rank
dg = Highest degree, coded 1 if doctorate, 0 if masters
yd = Number of years since highest degree was earned
sl = Academic year salary, in dollars

The file is available in the usual plain-text formats as salary.dat, using character codes, and salary.raw, using numeric codes, and in Stata format as salary.dta.

Reference: S. Weisberg (1985). Applied Linear Regression, Second Edition. New York: John Wiley and Sons. Page 194.

Births in Philadelphia

These data are based on a 5% sample of all births occurring in Philadelphia in 1990. The sample has 1115 observations (after deleting 32 cases with incomplete information) on five variables:

black = Mother is black (1=yes, 0=no)
educ = Mother's years of education (0 to 17)
smoke = Whether mother smoked during pregnancy (1=yes, 0=no)
gestate = Gestational age in weeks
grams = Birth weight in grams

The data are available in plain-text format in the files phbirths.raw and phbirths.dat, and in Stata format as phbirths.dta. The 'dat' file codes black and smoke using TRUE or FALSE, whereas the 'raw' file uses 1 and 0.

Reference: I. T. Elo, G. Rodríguez and H. Lee (2001). Racial and Neighborhood Disparities in Birthweight in Philadelphia. Paper presented at the Annual Meeting of the Population Association of America, Washington, DC, 2001.
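The coding differences described above can be handled with small lookup tables. The Python sketch below transcribes the salary codes listed for sx, rk and dg into label maps, and converts the TRUE/FALSE codes of the births 'dat' file into the 1/0 codes used by the 'raw' file. It assumes each record has already been parsed into a dict keyed by the variable names given above; SALARY_LABELS, decode_salary and logical_to_int are hypothetical names.

```python
from typing import Any, Dict

# Label maps transcribed from the salary variable descriptions above.
SALARY_LABELS: Dict[str, Dict[int, str]] = {
    "sx": {1: "female", 0: "male"},
    "rk": {1: "assistant professor", 2: "associate professor", 3: "full professor"},
    "dg": {1: "doctorate", 0: "masters"},
}


def decode_salary(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the numeric codes of sx, rk and dg with labels; copy other fields."""
    decoded = dict(record)
    for var, labels in SALARY_LABELS.items():
        if var in decoded:
            # int() also accepts codes that were parsed as floats, e.g. 1.0
            decoded[var] = labels[int(decoded[var])]
    return decoded


def logical_to_int(value: Any) -> int:
    """Map a births 'dat' code (TRUE/FALSE) or 'raw' code (1/0) to 1 or 0."""
    if isinstance(value, (int, float)) and value in (0, 1):
        return int(value)
    text = str(value).strip().upper()
    if text == "TRUE":
        return 1
    if text == "FALSE":
        return 0
    raise ValueError(f"unrecognized logical code: {value!r}")
```

Normalizing both births files to 1/0 means the same downstream code can treat black and smoke as indicator variables regardless of which file was read.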
The Contraceptive Use Data (W)

Here are the contraceptive use data from page 46 of the lecture notes (and from the Stata handout), showing the distribution of 1607 currently married and fecund women interviewed in the Fiji Fertility Survey, according to age, education, desire for more children and current use of contraception. The data are available in grouped form as cuse.dat, and also as a Stata system file cusew.dta using numeric codes and labels for all variables. These files represent binomial data with 16 groups. The dataset is also available in a long format simulating individual data and using weights to represent the frequencies.

Reference: R. J. A. Little (1978). Generalized Linear Models for Cross-Classified Data from the WFS. World Fertility Survey Technical Bulletins, Number 5.

The Contraceptive Use Data (L)

This is the alternative version of the contraceptive use data, showing the distribution of 1607 currently married and fecund women interviewed in the Fiji Fertility Survey, according to age, education, desire for more children and current use of contraception. This version has 32 rows corresponding to all possible covariate and response patterns, and includes a weight indicating the frequency of each combination. The file has 5 columns with numeric codes: age (four groups, 1=

© 2017 Germán Rodríguez, Princeton University.
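The grouped (W) and long (L) versions of the contraceptive use data describe the same 1607 women. As a sketch of how one maps onto the other, the Python code below expands each covariate pattern into two weighted rows, one for users and one for non-users, which is how 16 binomial groups become 32 rows, and then sums the weights back to recover the groups. The Group type and the layout of each long row (covariates, then a 0/1 response, then a weight) are assumptions for illustration, not the actual column order of the files.

```python
from typing import Dict, List, NamedTuple, Tuple


class Group(NamedTuple):
    """One row of the grouped (binomial) data: a covariate pattern and its counts."""
    covariates: Tuple[int, ...]
    users: int
    nonusers: int


def to_long(groups: List[Group]) -> List[Tuple[int, ...]]:
    """Expand each group into two weighted rows: (covariates..., use, weight).

    use=1 carries the number of users and use=0 the number of non-users,
    so 16 groups become 32 rows.
    """
    long_rows = []
    for g in groups:
        long_rows.append((*g.covariates, 1, g.users))
        long_rows.append((*g.covariates, 0, g.nonusers))
    return long_rows


def to_grouped(long_rows: List[Tuple[int, ...]]) -> List[Group]:
    """Inverse of to_long: add up weights by covariate pattern and response."""
    totals: Dict[Tuple[int, ...], Tuple[int, int]] = {}
    for *covariates, use, weight in long_rows:
        key = tuple(covariates)
        users, nonusers = totals.get(key, (0, 0))
        if use == 1:
            users += weight
        else:
            nonusers += weight
        totals[key] = (users, nonusers)
    return [Group(key, u, n) for key, (u, n) in totals.items()]
```

Applied to the real files, the weights in the long version should sum to 1607, the total number of women, and collapsing them by covariate pattern should reproduce the 16 binomial groups.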