Categorical Data Analysis: Modern Approaches


Categorical data analysis is usually taught as a follow-up course to general linear regression modeling (e.g., OLS, generalized least squares) and has its modern origin in the Pearson-Yule debate about the nature of categorical/nominal variables at the turn of the twentieth century. It covers most types of data that we regularly analyze, including binary, ordinal, nominal, count, and event-time response variables. As such, categorical data analysis has become the workhorse of most empirical research, with applications ranging from banking (e.g., mortgage default) and educational transitions (e.g., sequential outcomes from primary to graduate school) to clinical visits (e.g., count of visits) and survival time (e.g., time-to-event analysis). This workshop introduces the conceptual background, computational procedures, and statistical techniques for doing categorical data analysis, with a focus on application and interpretation. Selected topics include the historical background of categorical data analysis, the basics of likelihood theory, maximum likelihood estimation, M-estimation, null hypothesis significance testing, binary regression, polytomous regression, count regression (optional), survival analysis (optional), and post-estimation analyses. Time permitting, machine learning techniques for binary response variables, such as regularization and tree methods, will be discussed. The first part of each lecture covers important concepts and techniques, and the second part teaches the workflow of categorical data analysis in R.

Agenda

Day 1: From Ordinary Least Squares to Binary Regression 

Day 2: Polytomous and Advanced Regression 

Day 3: Machine Learning for Categorical Response Data 


Dr. Jun Xu is Professor of Sociology and Data Science at Ball State University. He received a Ph.D. in Sociology from Indiana University (Bloomington) and has been teaching methods courses for over 15 years. His methodological interests include Bayesian statistics, categorical data analysis, causal inference, and machine learning. He is the author of Modern Applied Regressions: Bayesian and Frequentist Analysis of Categorical and Limited Response Variables with R and Stan (Chapman & Hall/CRC) and a co-author of Ordered Regression Models: Parallel, Partial, and Non-Parallel Alternatives (with Andrew S. Fullerton, Chapman & Hall/CRC). He has published in Sociological Methods & Research, Social Science Research, and The Stata Journal, and has authored or co-authored several statistical application packages, including "gencrm", "grcompare", and the very popular "SPost 9.0" package in Stata and "stdcoef" in R.

Class Time

On-demand. Lecture and lab materials and recordings are available to view at your convenience.


Dr. Xu is an amazing professor, very student-oriented. He helped me learn the content and develop new skills with timely feedback, responded to my emails, and guided me to find answers in the course resources. He is also very understanding and supportive of students' well-being. This course sparked my curiosity to learn more about R programming and data analysis.


This course was very helpful in providing the required level of understanding needed for a data scientist.

This was an incredible course. It is incredible how much knowledge I have gained from this course. I really enjoyed learning interpretations and how to use the various statistical packages.


It is very evident that Dr. Xu is passionate about the topic of this course. He remained incredibly enthusiastic throughout the entire course. Furthermore, he understood that individuals were at different levels of statistical knowledge. As such, he catered to everyone's individual level of knowledge.


Prof. Xu's knowledge base for statistics is outstanding. He deserves to be teaching a class full of research or professionally minded students.