6.S951: Modern Mathematical Statistics, fall 2024

Overview

Mathematical statistics provides a formal language for reasoning about data and uncertainty. This course introduces the basic framework of statistical decision theory and learning theory. We focus on two themes. First, we will develop results for statistical inference: procedures whose outputs come with mathematical guarantees about their degree of certainty. Such results involve both mathematical derivations and conceptual considerations about what kinds of outputs and guarantees are practically informative. We discuss causality, multiple hypothesis testing, nonparametric and semiparametric statistics, and results for model misspecification.

The second theme is statistical optimality: what are the fundamental limits of certainty and precision given a limited amount of data, and which procedures achieve them? Here, we discuss concepts such as sufficiency and the Bayes and minimax optimality of statistical procedures, with applications to optimal estimation, hypothesis testing, and prediction. In particular, we discuss the asymptotic optimality of maximum likelihood and information-theoretic lower bounds for estimation.

This is a graduate course targeted at students interested in statistical research, with an emphasis on proofs and fundamental understanding. We assume previous exposure to undergraduate statistics, along with a strong undergraduate background in linear algebra, probability, and real analysis. This course, together with IDS.160[J] (taught in the spring), forms a two-part graduate curriculum in statistical theory. Neither course strictly depends on the other.

Time and location

  • T/Th 9:30am - 11:00am

  • 36-156

Outline

  • Intro to statistical inference, the central limit theorem and concentration inequalities

  • Conformal prediction and bootstrap (valid inference with ML algorithms)

  • Statistical decision theory and learning theory framework

  • Minimax and Bayes optimality

  • Optimal hypothesis testing and Neyman–Pearson

  • Information-theoretic minimax lower bounds

  • Theory of regression and generalized linear models

  • M-estimation and maximum likelihood

  • Behavior of M-estimators and maximum likelihood with misspecified models

  • Optimality of maximum likelihood

  • Empirical Bayes and the James-Stein estimator

  • Multiple testing and the false discovery rate

  • Semiparametric inference and causality

This course is not about any one set of mathematical techniques, but is instead about understanding foundational statistical behavior. We will study this with exact calculations, asymptotic approximations, and concentration inequalities (finite-sample bounds).

Schedule

The course is split into four units.

  • Optimality I. This unit introduces statistical decision theory and notions of statistical optimality (in particular, Bayes optimality and minimax optimality). We show how the framework can be used to describe estimation, prediction, and hypothesis testing.

  • Inference I. This unit discusses statistical inference: procedures that output a rigorous notion of confidence or uncertainty. We focus on confidence intervals and related outputs for now, giving algorithms such as the bootstrap and conformal prediction that yield confidence intervals. We also discuss the relationship between confidence intervals and statistical testing.

  • Advanced Inference. This unit covers advanced topics in statistical inference: predictive inference and conformal prediction, multiple hypothesis testing, and empirical Bayes.

  • Asymptotic Inference and Optimality. This unit uses asymptotic calculations to analyze the behavior of estimators. This enables us to (i) establish (asymptotically) optimal estimators, (ii) give approximate confidence intervals, and (iii) understand behavior under model misspecification.
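To make the Inference I description a little more concrete, here is a minimal sketch of one bootstrap variant, the nonparametric percentile interval, written in Python with only the standard library. This is an illustration, not course code: the function name `bootstrap_percentile_ci`, its defaults (`alpha=0.1`, `n_boot=2000`, `seed=0`), and the choice of the percentile method are all assumptions, and the lectures may use other bootstrap intervals.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic=statistics.mean, alpha=0.1,
                            n_boot=2000, seed=0):
    """Nonparametric bootstrap percentile interval for statistic(data).

    Illustrative sketch (hypothetical helper): resample the data with
    replacement n_boot times, recompute the statistic on each resample,
    and return symmetric order statistics of the sorted replicates as an
    approximate (1 - alpha) confidence interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate is the statistic evaluated on a resample of size n.
    replicates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # Index of the empirical alpha/2 quantile; the upper endpoint mirrors it.
    k = round(alpha / 2 * n_boot)
    return replicates[k], replicates[n_boot - 1 - k]


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9, 4.6, 2.5]
lo, hi = bootstrap_percentile_ci(data)
print(f"approximate 90% interval for the mean: ({lo:.2f}, {hi:.2f})")
```

The percentile interval is the simplest bootstrap interval to state; its coverage is only approximate and can fall short for small samples or skewed statistics.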

Date | Unit | Topic | Resources
Thursday, September 5 | Optimality 1 | Statistical decision theory and learning theory framework, sufficiency, Rao-Blackwell theorem | Wass Ch 12, Ch 9.13, scribe notes
Tuesday, September 10 | Optimality 1 | Bayes optimality, minimax optimality | Wass Ch 12, scribe notes
Thursday, September 12 | Optimality 1 | Admissibility, Bayes and minimax in the Gaussian linear model | Wass Ch 12, Wass Ch 13, scribe notes
Tuesday, September 17 | Optimality 1 | Linear model cont., behavior under model misspecification | Wass Ch 13, scribe notes
Thursday, September 19 | Optimality 1 | Learning theory examples (statistical decision theory for prediction) | scribe notes, slides; see Wass Ch 4 & 5 for supplementary background
Tuesday, September 24 | Optimality 1 | Optimal hypothesis testing, Neyman–Pearson, TV distance | TSH 3.1, 3.2, scribe notes; see Wass Ch 10 for supplementary background
Thursday, September 26 | Optimality 1 | TV distance, composite testing with MLR | TSH 3.4, scribe notes
Tuesday, October 1 | Optimality 1 | Minimax lower bounds via reduction to testing | scribe notes, Le Cam's Method reference
Thursday, October 3 | Inference 1 | CLT and concentration, confidence intervals for means | scribe notes, VdV 2.1, Wass Ch 5; see Wass Ch 4 for background and TSH 11.4.3 for the Bahadur-Savage result
Tuesday, October 8 | - | No lecture | -
Thursday, October 10 | Inference 1 | Inference for CDFs, the DKW inequality | scribe notes, slides, Wass Ch 7
Tuesday, October 15 | - | No lecture (student holiday) | -
Thursday, October 17 | Inference 1 | Bootstrap, inference via simulation | scribe notes, slides, Wass Ch 8
Tuesday, October 22 | Inference 1 | Hypothesis testing, p-values, testing-confidence interval duality | scribe notes, Wass 10.1, 10.2
Thursday, October 24 | - | No lecture | -
Tuesday, October 29 | - | In-class midterm exam | -
Thursday, October 31 | Inference 1 | p-values, permutation tests | scribe notes, Wass 10.5
Tuesday, November 5 | Inference 2 | Predictive inference, conformal prediction | scribe notes, Wass 13.4
Thursday, November 7 | Inference 2 | Conformal prediction | scribe notes, conformal book Ch 1 and Ch 3
Tuesday, November 12 | Inference 2 | Risk control for predictive inference | scribe notes, paper
Thursday, November 14 | Inference 2 | Multiple testing: Bonferroni and FWER control | scribe notes, Wass 10.7
Tuesday, November 19 | Inference 2 | Multiple testing: Benjamini-Hochberg and FDR control | Wass 10.7, scribe notes
Thursday, November 21 | Asymptotics | Intro, uniform tightness, delta method | VdV 2.1, 2.2, 3.1, scribe notes
Tuesday, November 26 | Asymptotics | Moment estimators, exponential families | VdV 4, scribe notes
Tuesday, December 3 | Asymptotics | Moment estimators, M-estimators | VdV 5.1-5.2, scribe notes
Thursday, December 5 | Asymptotics | M-estimators | VdV 5.3, 5.6, scribe notes
Tuesday, December 10 | Asymptotics | M-estimators, optimality of maximum likelihood | VdV 5.5, 8.5, 8.6, scribe notes
Wednesday, December 18, 9am-noon | - | Final exam | -
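The two multiple-testing lectures contrast FWER control (Bonferroni) with FDR control (Benjamini-Hochberg). As an illustrative sketch only, not course code, the Python below implements both rules: Bonferroni rejects every p-value at most alpha/m, while Benjamini-Hochberg sorts the m p-values, finds the largest rank k with p_(k) <= kq/m, and rejects the k smallest. The function names and the default levels are assumptions chosen for the example.

```python
def bonferroni(p_values, alpha=0.1):
    """Indices rejected by the Bonferroni correction (FWER control at level alpha)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (FDR level q).

    Sketch: find the largest 1-based rank k such that the k-th smallest
    p-value is at most k * q / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # step-up: remember the largest rank that passes
    return sorted(order[:k_max])


p = [0.03, 0.04]
print(bonferroni(p, 0.05))          # → []
print(benjamini_hochberg(p, 0.05))  # → [0, 1]
```

In this two-hypothesis example neither p-value clears the Bonferroni threshold 0.025, but Benjamini-Hochberg rejects both because the larger p-value, 0.04, is below its step-up threshold 2(0.05)/2 = 0.05.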

Homework

  • Homework 1 is released on Canvas, due Thursday, September 19.

  • Homework 2 is released on Canvas, due Friday, September 27.

  • Homework 3 is released on Canvas, due Friday, October 4.

  • Homework 4 is released on Canvas, due Friday, October 11.

  • Homework 5 is released on Canvas, due Sunday, October 20.

  • Homework 6 is released on Canvas, due Sunday, October 27.

  • Homework 7 is released on Canvas, due Friday, Nov 8.

  • Homework 8 is released on Canvas, due Monday, Nov 18.

  • Homework 9 is released on Canvas, due Wednesday, Nov 25.

  • Homework 10 is released on Canvas, due Wednesday, Dec 11.

Course materials

This course will use course notes and selections from Asymptotic Statistics (“vdV”) as the primary course material. All of Statistics (“Wass”), Theory of Point Estimation (“TPE”), and Testing Statistical Hypotheses (“TSH”) will be used as supplementary texts. All are available online through the MIT library.

Logistics and grading

Assignments are submitted via Gradescope, and we have a Piazza discussion board.

The course will have weekly problem sets, a midterm exam, and a final exam. Enrolled students must submit scribe notes, typeset in LaTeX, for one lecture.

Final course grades are computed as 35% homework score + 35% max(midterm, final) + 25% min(midterm, final) + 5% scribe notes score.
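The weighting above can be checked with a short Python sketch. It is not an official grade calculator: the function name and the convention that every component is a fraction in [0, 1] are assumptions for illustration.

```python
def final_course_grade(homework, midterm, final, scribe):
    """Course grade from component scores, each a fraction in [0, 1].

    The better of the two exams gets 35% weight and the worse gets 25%;
    homework gets 35% and scribe notes 5%, so the weights sum to 100%.
    """
    better_exam = max(midterm, final)
    worse_exam = min(midterm, final)
    return 0.35 * homework + 0.35 * better_exam + 0.25 * worse_exam + 0.05 * scribe


# Example: 90% homework, 70% midterm, 80% final, full scribe credit.
grade = final_course_grade(homework=0.90, midterm=0.70, final=0.80, scribe=1.0)
print(f"{grade:.3f}")  # → 0.820
```

Because the exam weights depend only on the max and min of the two exam scores, swapping the midterm and final scores leaves the grade unchanged: the stronger exam always receives the larger weight.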

Your lowest two problem set scores will be dropped. Your third-lowest problem set score can also be dropped if you earned a score of at least 50%. This way, your grade is not sensitive to missing an assignment or having a poor score.
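A minimal Python sketch of the drop policy follows. It rests on assumptions that the policy text does not settle: each problem set is weighted equally, the 50% condition is read as applying to the third-lowest score itself, and that score is dropped only when doing so raises the average. The function name `homework_average` is hypothetical.

```python
def homework_average(scores):
    """Average problem-set score after drops (scores are fractions in [0, 1]).

    Hypothetical helper; assumes at least three scores and equal weight
    per problem set.
    """
    kept = sorted(scores)[2:]  # the two lowest scores are always dropped
    third_lowest = kept[0]
    mean_kept = sum(kept) / len(kept)
    # ASSUMPTION: the third-lowest score may also be dropped when it is itself
    # at least 0.5; we drop it only if it is below the mean, which raises the average.
    if len(kept) > 1 and third_lowest >= 0.5 and third_lowest < mean_kept:
        kept = kept[1:]
    return sum(kept) / len(kept)


# Two missed assignments (scored 0) plus eight completed ones.
scores = [0.0, 0.0, 0.6, 0.9, 1.0, 0.8, 0.9, 1.0, 0.95, 0.85]
print(f"{homework_average(scores):.3f}")  # → 0.914
```

Here both zeros are dropped, and the 0.6 is dropped as well because it is at least 0.5 and below the average of the remaining scores.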

Office hours: see Piazza

Scribe notes

Scribe notes are due 24 hours after the lecture for full credit (50% credit if submitted within 48 hours), and they are graded. Please submit them on Gradescope.