Causal Machine Learning

Special Courses

Course description

This course provides an introduction to causal machine learning with applications in the statistical software “R”. Causal machine learning aims to assess the causal effect of an intervention or treatment, such as a medical treatment or a training program, on an outcome of interest, such as health or wages. Assessing a causal effect requires that the groups receiving and not receiving the treatment are comparable in background characteristics that also affect the outcome (e.g. pre-treatment health or education). Causal machine learning can generate such comparable groups in a data-driven way by estimating two separate models: one for how the characteristics affect the treatment and one for how they affect the outcome.

Such approaches also permit detecting subgroups for whom the treatment effect is particularly large, as a function of their observed characteristics (effect heterogeneity analysis). This is useful for optimally targeting the treatment at specific subgroups (optimal policy learning). Finally, by repeatedly assigning alternative treatments over time in an appropriate way, one may learn which treatment is most effective and converge to assigning it (reinforcement learning).

This course discusses the underlying assumptions, intuition, and usefulness of machine learning for causal analysis. It also introduces various causal machine learning algorithms, such as double lasso regression, causal random forests, double machine learning, and optimal policy trees. Using the statistical software “R” and its interface “RStudio”, these methods are applied to various real-world data sets.
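
The idea of estimating two separate models, one for the treatment and one for the outcome, can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the course's R, and it is not taken from the course material. It uses simulated data, a single background characteristic `x`, a continuous treatment `d`, and ordinary least squares as a stand-in for the machine learning models; the helper names `ols_fit` and `dml_partial_out` are hypothetical. Each model is fitted on one part of the data and used to predict in the other part (cross-fitting). The effect is then estimated by regressing the outcome residuals on the treatment residuals.

```python
import random


def ols_fit(x, y):
    """Fit y = a + b * x by least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def dml_partial_out(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    Assumption: simple linear models stand in for the machine
    learners (e.g. lasso or random forests) used in practice.
    """
    n = len(y)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    res_d = [0.0] * n
    res_y = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        x_train = [x[i] for i in train]
        # Model 1: how the characteristic affects the treatment.
        a_d, b_d = ols_fit(x_train, [d[i] for i in train])
        # Model 2: how the characteristic affects the outcome.
        a_y, b_y = ols_fit(x_train, [y[i] for i in train])
        # Residuals are computed only for observations not used in fitting.
        for i in held_out:
            res_d[i] = d[i] - (a_d + b_d * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    # Regress outcome residuals on treatment residuals (no intercept).
    numerator = sum(rd * ry for rd, ry in zip(res_d, res_y))
    denominator = sum(rd * rd for rd in res_d)
    return numerator / denominator


# Simulated data: x affects both the treatment and the outcome.
rng = random.Random(0)
n_obs = 4000
true_effect = 2.0
x = [rng.gauss(0, 1) for _ in range(n_obs)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [true_effect * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

naive_estimate = ols_fit(d, y)[1]       # ignores x, so it is confounded
dml_estimate = dml_partial_out(x, d, y)  # adjusts for x via the two models
print("naive:", round(naive_estimate, 3), "adjusted:", round(dml_estimate, 3))
```

In this simulation the naive regression of the outcome on the treatment overstates the effect because `x` drives both variables, while the residual-on-residual estimate should lie close to the simulated effect of 2.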

Content

  • Brief review of key concepts of causal inference (potential outcome notation and different approaches to the identification of causal effects)
  • Causal analysis based on penalized regression (lasso and ridge regression)
  • Causal analysis using tree-based approaches (causal trees and causal forests)
  • Causal analysis based on double machine learning
  • Assessing effect heterogeneity across subgroups
  • Optimal policy learning to maximize treatment effectiveness using tree-based approaches
  • Reinforcement learning to learn the most effective treatment (among several alternatives) by repeated treatment assignment over time
  • Application of methods to real-world data using the statistical software “R” and its interface “RStudio”
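
The reinforcement learning item in the list above, learning the most effective treatment by repeated assignment, can be illustrated with a minimal epsilon-greedy sketch. It is written in Python with the standard library only and is not taken from the course material. The function name `epsilon_greedy`, the three success probabilities, and the exploration rate of 0.1 are assumptions chosen for illustration. In each round the rule mostly assigns the treatment with the best observed average outcome so far and occasionally assigns a random one to keep learning about the alternatives.

```python
import random


def epsilon_greedy(success_probs, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate repeated treatment assignment with an epsilon-greedy rule.

    success_probs holds the (in practice unknown) probability of a good
    outcome under each treatment; the values are purely illustrative.
    Returns the assignment counts and the estimated mean outcomes.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(n_rounds):
        # Explore at random with probability epsilon, or while some
        # treatment has never been tried; otherwise exploit the best one.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: means[a])
        outcome = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean outcome for this treatment.
        means[arm] += (outcome - means[arm]) / counts[arm]
    return counts, means


counts, means = epsilon_greedy([0.3, 0.5, 0.7])
best = max(range(len(means)), key=lambda a: means[a])
print("assignments per treatment:", counts)
print("estimated success rates:", [round(m, 3) for m in means])
print("treatment judged most effective (0-based index):", best)
```

The exploration rate controls the trade-off: a higher value learns about all treatments faster but assigns inferior treatments more often.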

Objectives

  • To understand the ideas and goals of machine learning for causal analysis
  • To understand the intuition, advantages, and disadvantages of alternative methods
  • To be able to apply causal machine learning to real-world data using the software “R” and its interface “RStudio”

Prerequisites

Introductory statistics (probability theory, conditional means, linear regression). A basic command of the statistical software “R” is desirable but not strictly required.

Registration

Please register for the course by March 15, 2025 by sending an e-mail to cgde@iwh-halle.de.

The course is designed for at most 20 participants.

Schedule

08:30 – 10:00 First lecture

10:00 – 10:30 Coffee break

10:30 – 12:00 Second lecture

12:00 – 13:00 Lunch

13:30 – 15:00 Third lecture

15:00 – 15:30 Coffee break

15:30 – 17:00 Fourth lecture

Material

Lecture slides, R code, and data files will be made available to the course participants.

Textbook

M. Huber (2023): Causal analysis – Impact evaluation and causal machine learning with applications in R, MIT Press, Cambridge. 

Free online version available at: https://mitpress.ublish.com/ebook/causal-analysis-impact-evaluation-and-causal-machine-learning-with-applications-in-r-preview/12759/162

Course details