Categorical Data Analysis: Modern Approaches
Categorical data analysis is usually taught as a follow-up to a course on general linear regression modeling (e.g., OLS, generalized least squares). Its modern origin lies in the Pearson-Yule debate about the nature of categorical/nominal variables at the turn of the twentieth century. It covers most types of data that we regularly analyze, including binary, ordinal, nominal, count, and event-time response variables. As such, categorical data analysis has become a workhorse of empirical research in fields ranging from banking (e.g., mortgage default) and educational transitions (e.g., sequential outcomes from primary to graduate school) to clinical care (e.g., number of clinical visits) and survival (e.g., time-to-event analysis).

This workshop introduces the conceptual background, computational procedures, and statistical techniques for categorical data analysis, with a focus on application and interpretation. Selected topics include the historical background of categorical data analysis, the basics of likelihood theory, maximum likelihood estimation, M-estimation, null hypothesis significance testing, binary regression, polytomous regression, count regression (optional), survival analysis (optional), and post-estimation analyses. Time permitting, machine learning techniques for binary response variables, such as regularization and tree methods, will also be discussed. The first part of each lecture covers key concepts and techniques, and the second part teaches the workflow of categorical data analysis in R.
Day 1: From Ordinary Least Squares to Binary Regression
Day 2: Polytomous and Advanced Regression
Day 3: Machine Learning for Categorical Response Data
Dr. Jun Xu is Professor of Sociology and Data Science at Ball State University. He received a Ph.D. in Sociology from Indiana University (Bloomington) and has been teaching methods courses for over 15 years. His methodological interests include Bayesian statistics, categorical data analysis, causal inference, and machine learning. He is the author of Modern Applied Regressions: Bayesian and Frequentist Analysis of Categorical and Limited Response Variables with R and Stan (Chapman & Hall/CRC) and a co-author of Ordered Regression Models: Parallel, Partial, and Non-Parallel Alternatives (with Andrew S. Fullerton, Chapman & Hall/CRC). He has published in Sociological Methods & Research, Social Science Research, and The Stata Journal, and has authored or co-authored several statistical software packages, including "gencrm", "grcompare", and the widely used "SPost 9.0" package in Stata, as well as "stdcoef" in R.
August 12-14, 2023 (8:00-10:30 am, US Eastern Time). Held on Zoom; recordings available.