6.S951: Modern Mathematical Statistics, fall 2024

Overview

Mathematical statistics provides a formal language for reasoning about data and uncertainty. This course introduces the basic framework of statistical decision theory and learning theory. We focus on two themes. First, we develop results for statistical inference — procedures that carry mathematical guarantees about the degree of certainty in their output. Such results involve both mathematical derivations and conceptual considerations about which kinds of outputs and guarantees are practically informative. We discuss causality, multiple hypothesis testing, nonparametric and semiparametric statistics, and results under model misspecification.

The second theme is statistical optimality — what are the fundamental limits of certainty and precision given a limited amount of data, and which procedures achieve those limits? Here we discuss concepts such as sufficiency and the Bayes and minimax optimality of statistical procedures, with applications to optimal estimation, hypothesis testing, and prediction. In particular, we discuss the asymptotic optimality of maximum likelihood and information-theoretic lower bounds for estimation.

This is a graduate course targeted at students interested in statistical research, with an emphasis on proofs and fundamental understanding. We assume previous exposure to undergraduate statistics, along with a strong undergraduate background in linear algebra, probability, and real analysis. This course and IDS.160[J] (taught in the spring) together form a two-part graduate curriculum in statistical theory; neither course strictly depends on the other.

Time and location

  • T/Th 9:30am - 11:00am

  • 36-156

Outline

  • Intro to statistical inference, the central limit theorem and concentration inequalities

  • Conformal prediction and the bootstrap (valid inference with ML algorithms)

  • Statistical decision theory and learning theory framework

  • Minimax and Bayes optimality

  • Optimal hypothesis testing and Neyman–Pearson

  • Information-theoretic minimax lower bounds

  • Theory of regression and generalized linear models

  • M-estimation and maximum likelihood

  • Behavior of M-estimators and maximum likelihood with misspecified models

  • Optimality of maximum likelihood

  • Empirical Bayes and the James-Stein estimator

  • Multiple testing and the false discovery rate

  • Semiparametric inference and causality

This course is not about any one set of mathematical techniques, but is instead about understanding foundational statistical behavior. We will study this with exact calculations, asymptotic approximations, and concentration inequalities (finite-sample bounds).
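
For instance, Hoeffding's inequality is a classical concentration inequality of exactly this finite-sample kind — one illustrative example, not a claim about which specific inequalities the course covers:

```latex
% Hoeffding's inequality: a finite-sample (non-asymptotic) bound.
% For i.i.d. X_1, \dots, X_n with a \le X_i \le b and any t > 0:
\[
  \mathbb{P}\!\left( \bigl| \bar{X}_n - \mathbb{E}[X_1] \bigr| \ge t \right)
  \;\le\; 2 \exp\!\left( - \frac{2 n t^2}{(b - a)^2} \right),
  \qquad
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .
\]
```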

Schedule

The course is split into four units.

  • Optimality I. This unit introduces statistical decision theory and notions of statistical optimality (in particular, Bayes optimality and minimax optimality). We show how the framework can be used to describe estimation, prediction, and hypothesis testing; the basic risk quantities are written out after this list.

  • Inference I. This unit discusses statistical inference — procedures that output a rigorous notion of confidence/uncertainty. We focus here on confidence intervals and related outputs, giving algorithms such as the bootstrap and conformal prediction that yield confidence intervals; a minimal bootstrap sketch follows this list. We also discuss the relationship between confidence intervals and statistical testing.

  • Advanced Inference. This unit covers advanced topics in statistical inference: predictive inference and conformal prediction, multiple hypothesis testing, and empirical Bayes.

  • Asymptotic Inference and Optimality. This unit uses asymptotic calculations to analyze the behavior of estimators. This enables us to (i) establish (asymptotically) optimal estimators, (ii) give approximate confidence intervals, and (iii) understand behavior under model misspecification.
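
For orientation, the basic quantities of the Optimality I unit can be written out as follows (standard textbook definitions, as in Wass Ch 12; the notation here is illustrative):

```latex
% Risk of decision rule \delta at parameter \theta under loss L:
\[
  R(\theta, \delta) \;=\; \mathbb{E}_{\theta}\!\left[ L\bigl(\theta, \delta(X)\bigr) \right].
\]
% A Bayes rule minimizes average risk under a prior \pi, while a
% minimax rule minimizes the worst-case risk over \theta:
\[
  \delta_{\pi} \in \arg\min_{\delta} \int R(\theta, \delta)\, d\pi(\theta),
  \qquad
  \delta^{\ast} \in \arg\min_{\delta}\, \sup_{\theta} R(\theta, \delta).
\]
```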
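
And as a concrete taste of the Inference I unit, here is a minimal sketch of a percentile-bootstrap confidence interval, assuming an i.i.d. sample and NumPy; it illustrates the idea only, not the course's precise treatment or guarantees:

```python
import numpy as np

def percentile_bootstrap_ci(x, stat=np.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat of the population.

    Draws n_boot resamples of x with replacement, recomputes the
    statistic on each, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of those replicates.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    replicates = np.array(
        [stat(rng.choice(x, size=x.size, replace=True)) for _ in range(n_boot)]
    )
    return np.quantile(replicates, [alpha / 2, 1 - alpha / 2])

# Example: an approximate 95% interval for the mean of a skewed sample.
rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=200)
lo, hi = percentile_bootstrap_ci(sample)
print(f"~95% percentile-bootstrap CI for the mean: [{lo:.3f}, {hi:.3f}]")
```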

Date | Unit | Topic | Resources
Thursday, September 5 | Optimality 1 | Statistical decision theory and learning theory framework, sufficiency, Rao-Blackwell theorem | Wass Ch 12, Ch 9.13, scribe notes
Tuesday, September 10 | Optimality 1 | Bayes optimality, minimax optimality | Wass Ch 12, scribe notes
Thursday, September 12 | Optimality 1 | Admissibility, Bayes and minimax in the Gaussian linear model | Wass Ch 12, Wass Ch 13, scribe notes
Tuesday, September 17 | Optimality 1 | Linear model cont., behavior under model misspecification | Wass Ch 13, scribe notes
Thursday, September 19 | Optimality 1 | Learning theory examples (statistical decision theory for prediction) | scribe notes, slides; see Wass Ch 4 & 5 for supplementary background
Tuesday, September 24 | Optimality 1 | Optimal hypothesis testing, Neyman–Pearson, TV distance | TSH 3.1, 3.2, scribe notes; see Wass Ch 10 for supplementary background
Thursday, September 26 | Optimality 1 | TV distance, composite testing with MLR | TSH 3.4, scribe notes
Tuesday, October 1 | Optimality 1 | Minimax lower bounds via reduction to testing | scribe notes, Le Cam's method reference
Thursday, October 3 | Inference 1 | CLT and concentration, confidence intervals for means | scribe notes, vdV 2.1, Wass Ch 5; see Wass Ch 4 for background and TSH 11.4.3 for the Bahadur-Savage result
Tuesday, October 8 | - | No lecture | -
Thursday, October 10 | Inference 1 | Inference for CDFs, the DKW inequality | scribe notes, slides, Wass Ch 7
Tuesday, October 15 | - | No lecture (student holiday) | -
Thursday, October 17 | Inference 1 | Bootstrap, inference via simulation | scribe notes, slides, Wass Ch 8
Tuesday, October 22 | Inference 1 | Hypothesis testing, p-values, testing-confidence interval duality | scribe notes, Wass 10.1, 10.2
Thursday, October 24 | Inference 1 | No lecture | -
Tuesday, October 29 | - | In-class midterm exam | -
Thursday, October 31 | Inference 1 | p-values, permutation tests | scribe notes, Wass 10.5
Tuesday, November 5 | Inference 2 | Predictive inference, conformal prediction | scribe notes, Wass 13.4
Thursday, November 7 | Inference 2 | Conformal prediction | scribe notes, conformal book Ch 1 and Ch 3
Tuesday, November 12 | Inference 2 | Risk control for predictive inference | scribe notes, paper
Thursday, November 14 | Inference 2 | Multiple testing: Bonferroni and FWER control | scribe notes, Wass 10.7
Tuesday, November 19 | Inference 2 | Multiple testing: Benjamini-Hochberg and FDR control | Wass 10.7
Thursday, November 21 | | |

Homework

  • Homework 1 is released on Canvas, due Thursday, September 19.

  • Homework 2 is released on Canvas, due Friday, September 27.

  • Homework 3 is released on Canvas, due Friday, October 4.

  • Homework 4 is released on Canvas, due Friday, October 11.

  • Homework 5 is released on Canvas, due Sunday, October 20.

  • Homework 6 is released on Canvas, due Sunday, October 27.

  • Homework 7 is released on Canvas, due Friday, November 8.

  • Homework 8 is released on Canvas, due Monday, November 18.

  • Homework 9 is released on Canvas, due Monday, November 25.

Course materials

The primary course materials are the course notes and selections from Asymptotic Statistics (“vdV”). All of Statistics (“Wass”), Theory of Point Estimation (“TPE”), and Testing Statistical Hypotheses (“TSH”) will be used as supplementary texts. All are available online through the MIT library.

Logistics and grading

Assignments are submitted via Gradescope, and we have a Piazza discussion board.

The course will have weekly problem sets, a midterm exam, and a final exam. Enrolled students must submit LaTeX scribe notes for one lecture.

Final course grades are computed as 35% homework score + 35% max(midterm, final) + 25% min(midterm, final) + 5% scribe notes score.

Your lowest two problem set scores will be dropped. Your third-lowest problem set score can also be dropped if you earned a score of at least 50% on it. The point is that your grade is not sensitive to missing an assignment or having one poor score.
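
For concreteness, these rules combine as in the following sketch; the function name and the exact reading of the third-drop rule are illustrative, not the course's official implementation:

```python
def course_grade(hw_scores, midterm, final, scribe):
    """Sketch of the stated weighting; all inputs are percentages in [0, 100]."""
    kept = sorted(hw_scores)[2:]  # the lowest two scores are always dropped
    # One reading of the policy: the third-lowest score is also dropped
    # when it was at least 50% and dropping it raises the homework average.
    if len(kept) > 1 and kept[0] >= 50:
        if sum(kept[1:]) / len(kept[1:]) > sum(kept) / len(kept):
            kept = kept[1:]
    hw = sum(kept) / len(kept)
    return (0.35 * hw
            + 0.35 * max(midterm, final)
            + 0.25 * min(midterm, final)
            + 0.05 * scribe)

# Example: one skipped assignment and one weak score do not sink the grade.
print(course_grade([90, 85, 0, 70, 95, 88, 92, 60],
                   midterm=80, final=90, scribe=100))  # -> 88.0
```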

Office hours: see Piazza

Scribe notes

Scribe notes are due 24 hours after the lecture for full credit (50% credit if within 48 hours), and they are graded. Please submit them on Gradescope.