Causal Machine Learning
Lecturer: Professor Martin Huber, PhD (University of Fribourg)
Date: April 10-11, 2025
Venue: Halle Institute for Economic Research (IWH) – Member of the Leibniz Association, Kleine Maerkerstrasse 8, 06108 Halle (Saale), Germany, conference room (ground floor).
Registration: Please register for the course by March 15, 2025 by sending an e-mail to cgde@iwh-halle.de.
The course is designed for at most 20 participants.
COURSE DESCRIPTION
This course provides an introduction to causal machine learning with applications using the software “R”. Causal machine learning aims at assessing the causal effect of some intervention or treatment, like a medical treatment or a training program, on an outcome of interest, like health or wage. Assessing a causal effect requires that the groups receiving and not receiving the treatment are comparable in background characteristics that also affect their outcome (e.g., pre-treatment health or education). Causal machine learning can be used to generate such comparable groups in a data-driven way by estimating two separate models: one for how the characteristics affect the treatment and one for how they affect the outcome. Such approaches also permit detecting subgroups for whom the treatment effect is particularly large as a function of their observed characteristics (effect heterogeneity analysis). This is useful for optimally targeting the treatment at specific subgroups (optimal policy learning). Finally, by repeatedly assigning alternative treatments over time in an appropriate way, one may learn and converge to the assignment of the most effective treatment (reinforcement learning). This course discusses the underlying assumptions, intuition, and usefulness of machine learning for causal analysis. It also introduces various causal machine learning algorithms, like double lasso regression, causal random forests, double machine learning, and optimal policy trees. Using the statistical software “R” and its interface “RStudio”, these methods are applied to various real-world data sets.
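To illustrate the idea of estimating two separate models (one for the treatment, one for the outcome), the following is a minimal sketch of the partialling-out form of double machine learning with cross-fitting. It is not taken from the course material, which uses R. It is written in plain Python and rests on several assumptions: the data are simulated, the treatment is a continuous intensity rather than a binary program, there is a single covariate, and simple linear fits stand in for the machine learners (lasso, forests) that would be used in practice. All function names (fit_line, simulate, dml_partial_out) are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n, effect, seed):
    """Simulated data (an assumption): covariate x drives both treatment d and outcome y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [effect * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    return x, d, y


def dml_partial_out(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    For each fold, the treatment model (d on x) and the outcome model
    (y on x) are fitted on the other folds and used to form residuals
    on the held-out fold. The effect is the slope of the outcome
    residuals on the treatment residuals.
    """
    n = len(x)
    res_d = [0.0] * n
    res_y = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        test = [i for i in range(n) if i % n_folds == k]
        x_train = [x[i] for i in train]
        a_d, b_d = fit_line(x_train, [d[i] for i in train])  # treatment model
        a_y, b_y = fit_line(x_train, [y[i] for i in train])  # outcome model
        for i in test:
            res_d[i] = d[i] - (a_d + b_d * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    numerator = sum(rd * ry for rd, ry in zip(res_d, res_y))
    denominator = sum(rd * rd for rd in res_d)
    return numerator / denominator


# Simulated example with a true effect of 2.0
x, d, y = simulate(n=5000, effect=2.0, seed=1)
naive_effect = fit_line(d, y)[1]        # ignores x, so it is confounded
dml_effect = dml_partial_out(x, d, y)   # adjusts for x via two models
print("naive slope:", round(naive_effect, 3))
print("partialling-out estimate:", round(dml_effect, 3))
```

In this simulated setting the naive slope absorbs part of the covariate's influence on the outcome, while the residual-based estimate should lie close to the true effect used in the simulation.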
CONTENT
– Brief review of key concepts of causal inference (potential outcome notation and different approaches to the identification of causal effects)
– Causal analysis based on penalized regression (lasso and ridge regression)
– Causal analysis using tree-based approaches (causal trees and causal forests)
– Causal analysis based on double machine learning
– Assessing effect heterogeneity across subgroups
– Optimal policy learning to maximize treatment effectiveness using tree-based approaches
– Reinforcement learning to learn the most effective treatment (among several alternatives) by repeated treatment assignment over time
– Application of methods to real-world data using the statistical software “R” and its interface “RStudio”
OBJECTIVES
– To understand the ideas and goals of machine learning for causal analysis
– To understand the intuition, advantages, and disadvantages of alternative methods
– To be able to apply causal machine learning to real-world data using the software “R” and its interface “RStudio”
PREREQUISITES
Introductory statistics (probability theory, conditional means, linear regression). A basic command of the statistical software “R” is desirable, but not strictly required.
DAILY SCHEDULE
08:30−10:00 First lecture
10:00−10:30 Coffee break
10:30−12:00 Second lecture
12:00−13:00 Lunch
13:30−15:00 Third lecture
15:00−15:30 Coffee break
15:30−17:00 Fourth lecture
MATERIAL
Lecture slides, R code, and data files will be made available to the course participants.
TEXTBOOK
M. Huber (2023): Causal analysis – Impact evaluation and causal machine learning with applications in R, MIT Press, Cambridge.
Free online version available at: https://mitpress.ublish.com/ebook/causal-analysis-impact-evaluation-and-causal-machine-learning-with-applications-in-r-preview/12759/162