6.S951: Modern Mathematical Statistics, fall 2024
Overview
Mathematical statistics provides a formal language for reasoning about data and
uncertainty. This course introduces the basic framework of statistical decision theory and learning theory. We focus on two themes. First, we will develop results for statistical inference: procedures that have mathematical guarantees about the degree of certainty in the output. Such results involve both mathematical derivations and conceptual considerations about what kinds of outputs and guarantees are practically informative. We discuss causality, multiple hypothesis testing, nonparametric
and semiparametric statistics, and results for model misspecification.
The second theme is statistical optimality: what are the fundamental limits of certainty and precision given a limited amount of data, and which procedures achieve them? Here, we discuss concepts such as sufficiency and the Bayes and minimax optimality of
statistical procedures, with applications to optimal estimation, hypothesis testing,
and prediction. In particular, we discuss the asymptotic optimality of maximum likelihood and information-theoretic lower bounds for estimation.
This is a graduate course targeted at students interested in statistical research, with an
emphasis on proofs and fundamental understanding.
We assume previous exposure to
undergraduate statistics, along with a strong undergraduate background in linear
algebra, probability, and real analysis. This course and IDS.160[J] (which is taught in the spring) together form a two-part graduate curriculum in statistical theory. Neither course strictly depends on the other.
Time and location
T/Th 9:30am - 11:00am
36-156
Outline
Intro to statistical inference, the central limit theorem and concentration inequalities
Conformal prediction and the bootstrap (valid inference with ML algorithms)
Statistical decision theory and learning theory framework
Minimax and Bayes optimality
Optimal hypothesis testing and Neyman–Pearson
Information-theoretic minimax lower bounds
Theory of regression and generalized linear models
M-estimation and maximum likelihood
Behavior of M-estimators and maximum likelihood with misspecified models
Optimality of maximum likelihood
Empirical Bayes and the James-Stein estimator
Multiple testing and the false discovery rate
Semiparametric inference and causality
This course is not about any one set of mathematical techniques, but is instead about understanding foundational statistical behavior. We will study this with exact calculations, asymptotic approximations, and concentration inequalities (finite-sample bounds).
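To make the contrast between asymptotic approximations and finite-sample bounds concrete, here is a minimal sketch that builds two 95% intervals for the mean of data in [0, 1]: one from the central limit theorem and one from Hoeffding's inequality. The function names, the simulated Bernoulli(0.3) data, and the sample size are illustrative assumptions, not course material.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def clt_interval(xs, alpha=0.05):
    """Asymptotic (CLT-based) interval: mean +/- z * s / sqrt(n)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * stdev(xs) / math.sqrt(n)
    m = mean(xs)
    return m - half, m + half


def hoeffding_interval(xs, alpha=0.05):
    """Finite-sample interval for data in [0, 1] via Hoeffding's inequality.

    Solves 2 * exp(-2 * n * t**2) = alpha for the half-width t.
    """
    n = len(xs)
    half = math.sqrt(math.log(2 / alpha) / (2 * n))
    m = mean(xs)
    return m - half, m + half


# Hypothetical data: 200 simulated Bernoulli(0.3) draws.
random.seed(0)
data = [1.0 if random.random() < 0.3 else 0.0 for _ in range(200)]

clt_lo, clt_hi = clt_interval(data)
hoef_lo, hoef_hi = hoeffding_interval(data)
print(f"CLT:       ({clt_lo:.3f}, {clt_hi:.3f})")
print(f"Hoeffding: ({hoef_lo:.3f}, {hoef_hi:.3f})")
```

The Hoeffding interval is valid at every sample size but is wider here; the CLT interval is narrower but only approximately valid.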
Schedule
The course is split into four units.
Optimality I. This unit introduces statistical decision theory and notions of statistical optimality (in particular, Bayes optimality and minimax optimality). We show how the framework can be used to describe estimation, prediction, and hypothesis testing.
Inference I. This unit discusses statistical inference: procedures that output a rigorous notion of confidence or uncertainty. We focus on confidence intervals and related outputs for now, giving algorithms such as the bootstrap and conformal prediction that yield confidence intervals. We also discuss the relationship between confidence intervals and statistical testing.
Advanced Inference. This unit covers advanced topics in statistical inference: predictive inference and conformal prediction, multiple hypothesis testing, and empirical Bayes.
Asymptotic Inference and Optimality. This unit uses asymptotic calculations to analyze the behavior of estimators. This enables us to (i) establish (asymptotically) optimal estimators, (ii) give approximate confidence intervals, and (iii) understand behavior under model misspecification.
Date | Unit | Topic | Resources |
Thursday, September 5 | Optimality 1 | Statistical decision theory and learning theory framework, sufficiency, Rao-Blackwell theorem | Wass Ch 12, Ch 9.13, scribe notes |
Tuesday, September 10 | Optimality 1 | Bayes optimality, Minimax optimality | Wass Ch 12, scribe notes |
Thursday, September 12 | Optimality 1 | Admissibility, Bayes and minimax in Gaussian linear model | Wass Ch 12, Wass Ch 13, scribe notes |
Tuesday, September 17 | Optimality 1 | Linear model cont., behavior under model misspecification | Wass Ch 13, scribe notes |
Thursday, September 19 | Optimality 1 | Learning theory examples (statistical decision theory for prediction) | scribe notes, slides, See Wass Ch 4 & 5 for supplementary background |
Tuesday, September 24 | Optimality 1 | Optimal hypothesis testing, Neyman–Pearson, TV distance | TSH 3.1, 3.2, scribe notes, See Wass Ch 10 for supplementary background |
Thursday, September 26 | Optimality 1 | TV distance, composite testing with MLR | TSH 3.4, scribe notes |
Tuesday, October 1 | Optimality 1 | Minimax lower bounds via reduction to testing | scribe notes, Le Cam's Method reference |
Thursday, October 3 | Inference 1 | CLT and concentration, confidence intervals for means | scribe notes, VdV 2.1, Wass Ch 5. See Wass Ch 4 for background and TSH 11.4.3 for Bahadur-Savage result |
Tuesday, October 8 | No lecture | - | - |
Thursday, October 10 | Inference 1 | Inference for CDFs, the DKW inequality | scribe notes, slides, Wass Ch 7 |
Tuesday, October 15 | No lecture (student holiday) | - | - |
Thursday, October 17 | Inference 1 | Bootstrap, inference via simulation | scribe notes, slides, Wass Ch 8 |
Tuesday, October 22 | Inference 1 | Hypothesis testing, p-values, testing-confidence interval duality | scribe notes, Wass 10.1, 10.2 |
Thursday, October 24 | No lecture | - | - |
Tuesday, October 29 | - | in-class midterm exam | - |
Thursday, October 31 | Inference 1 | p-values, permutation tests | scribe notes, Wass 10.5 |
Tuesday, November 5 | Inference 2 | Predictive inference, conformal prediction | scribe notes, Wass 13.4 |
Thursday, November 7 | Inference 2 | Conformal prediction | scribe notes, conformal book Ch 1 and Ch 3 |
Tuesday, November 12 | Inference 2 | Risk-control for predictive inference | scribe notes, paper |
Thursday, November 14 | Inference 2 | Multiple testing: Bonferroni and FWER control | scribe notes, Wass 10.7 |
Tuesday, November 19 | Inference 2 | Multiple testing: Benjamini-Hochberg and FDR control | Wass 10.7, scribe notes |
Thursday, November 21 | Asymptotics | Intro, Uniform Tightness, Delta Method | VdV 2.1, 2.2, 3.1, scribe notes |
Tuesday, November 26 | Asymptotics | Moment estimators, Exponential Families | VdV 4, scribe notes |
Tuesday, Dec 3 | Asymptotics | Moment estimators, M-estimators | VdV 5.1-5.2, scribe notes |
Thursday, Dec 5 | Asymptotics | M-estimators | VdV 5.3, 5.6, scribe notes |
Tuesday, Dec 10 | Asymptotics | M-estimators, optimality of maximum likelihood | VdV 5.5, 8.5, 8.6, scribe notes |
Wed, Dec 18, 9-noon | - | Final Exam | |
Homework
Homework 1 is released on Canvas, due Thursday, September 19.
Homework 2 is released on Canvas, due Friday, September 27.
Homework 3 is released on Canvas, due Friday, October 4.
Homework 4 is released on Canvas, due Friday, October 11.
Homework 5 is released on Canvas, due Sunday, October 20.
Homework 6 is released on Canvas, due Sunday, October 27.
Homework 7 is released on Canvas, due Friday, Nov 8.
Homework 8 is released on Canvas, due Monday, Nov 18.
Homework 9 is released on Canvas, due Wednesday, Nov 25.
Homework 10 is released on Canvas, due Wednesday, Dec 11.
Course materials
This course will use course notes and selections from Asymptotic Statistics (“vdV”) as the primary course material. All of Statistics (“Wass”), Theory of Point Estimation (“TPE”), and Testing Statistical Hypotheses (“TSH”) will be used as supplementary texts. All are available online through the MIT library.
Logistics and grading
Assignments are submitted via Gradescope, and we have a Piazza discussion board.
The course will have weekly problem sets, a midterm exam, and a final exam. Enrolled students must submit scribe notes typeset in LaTeX for one lecture.
Final course grades are computed as 35% homework score + 35% max(midterm, final) + 25% min(midterm, final) + 5% scribe notes score.
Your lowest two problem set scores will be dropped. Your third-lowest problem set score can also be dropped if you earned a score of at least 50%. The point is that your grade is not sensitive to missing an assignment or having a poor score.
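The grading rule above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function names are hypothetical, all scores are assumed to be percentages in [0, 100], and the third-lowest homework score is assumed to be dropped only when that score itself is at least 50%, which is one reading of the policy.

```python
def homework_average(scores, always_drop=2, threshold=50.0):
    """Average homework percentage after applying the drop policy.

    Assumption: the third-lowest score is dropped only if that score
    itself is at least `threshold`.
    """
    if len(scores) <= always_drop:
        raise ValueError("need more scores than the number always dropped")
    kept = sorted(scores)[always_drop:]   # drop the two lowest scores
    if len(kept) > 1 and kept[0] >= threshold:
        kept = kept[1:]                   # conditionally drop the third lowest
    return sum(kept) / len(kept)


def final_grade(homework, midterm, final, scribe):
    """35% homework + 35% better exam + 25% worse exam + 5% scribe notes."""
    return (0.35 * homework
            + 0.35 * max(midterm, final)
            + 0.25 * min(midterm, final)
            + 0.05 * scribe)


# Hypothetical student with ten homework scores.
hw = homework_average([0, 40, 60, 90, 100, 80, 70, 85, 95, 75])
grade = final_grade(hw, midterm=70, final=90, scribe=100)
print(f"homework average: {hw:.2f}, course grade: {grade:.2f}")
```

In this example the scores 0 and 40 are always dropped, and the 60 is dropped as well because it clears the 50% threshold.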
Office hours: see Piazza
Scribe notes
Scribe notes are due 24 hours after the lecture for full credit (50% credit if submitted within 48 hours), and they are graded. Please submit them on Gradescope.