Categorical Data Analysis: Modern Approaches


Categorical data analysis is usually taught as a follow-up to a course in general linear regression modeling (e.g., OLS, generalized least squares), and it has its modern origin in the Pearson-Yule debate about the nature of categorical/nominal variables at the turn of the twentieth century. It covers most types of data that we regularly analyze, including binary, ordinal, nominal, count, and event-time response variables. As such, categorical data analysis has become the workhorse of most empirical research, with applications ranging from banking (e.g., mortgage default) and educational transitions (e.g., sequential outcomes from primary to graduate school) to clinical visits (e.g., counts of visits) and survival time (e.g., time-to-event analysis). This workshop introduces the conceptual background, computational procedures, and statistical techniques for doing categorical data analysis, with a focus on application and interpretation. Selected topics include the historical background of categorical data analysis, the basics of likelihood theory, maximum likelihood estimation, M-estimation, null hypothesis significance testing, binary regression, polytomous regression, count regression (optional), survival analysis (optional), and post-estimation analyses. Time permitting, machine learning techniques for binary response variables, such as regularization and tree methods, will be discussed. The first part of each lecture covers important concepts and techniques, and the second part teaches the workflow of doing Bayesian data analysis using R and Stan.

Agenda (Request syllabus)

Day 1: From Ordinary Least Squares to Binary Regression

Day 2: Polytomous and Advanced Regression

Day 3: Machine Learning for Categorical Response Data


Dr. Jun Xu is Professor of Sociology and Data Science at Ball State University. He received a Ph.D. in Sociology from Indiana University (Bloomington) and has been teaching methods courses for over 15 years. His methodological interests include Bayesian statistics, categorical data analysis, causal inference, and machine learning. He is the author of Modern Applied Regressions: Bayesian and Frequentist Analysis of Categorical and Limited Response Variables with R and Stan (Chapman & Hall/CRC) and a co-author of Ordered Regression Models: Parallel, Partial, and Non-Parallel Alternatives (with Andrew S. Fullerton, Chapman & Hall/CRC). He has published in Sociological Methods and Research, Social Science Research, and The Stata Journal, and he has authored or co-authored several statistical application packages, including "gencrm", "grcompare", and the very popular "SPost 9.0" in Stata, as well as "stdcoef" in R.

Class Time

TBA, 2023 (8pm-10:30pm, US ET). Zoom class with recordings available.


Regular tuition: $575. Early-bird tuition (register by June 1, 2023): $525. Alumni and student tuition (register by June 1, 2023): $475. The first class can be audited for free with registration.


Dr. Xu is an amazing professor and very student-oriented. He helped me learn the content and develop new skills with timely feedback, responded to my emails, and guided me to find answers in the course resources. He is also very understanding and supportive of students' well-being. This course sparked my curiosity to learn more about R programming and data analysis.

This course was very helpful in providing the required level of understanding needed for a data scientist.