Fall 2025: Random Matrix Theory in Data Science and Statistics (EN.553.796)

Course Description

This is a first course in random matrix theory, the study of the eigenvalues and eigenvectors of matrices with random entries, a subject foundational to high-dimensional statistics and data science. Aside from the main ideas and modern applications of random matrices, a key goal will be to introduce you to the main concepts of probability in high dimensions: concentration of measure, the geometry of high-dimensional spaces and convex sets, Gaussian measure, and sharp transitions and threshold phenomena. The following is a (very) tentative ordered list of specific topics to be covered:

1. Gaussian matrices and dimensionality reduction

  • Geometric method of analysis with concentration inequalities
  • Invariance and relation to random projection
  • Application: Johnson-Lindenstrauss transform and dimensionality reduction (see the numerical sketch after this list)
  • Application: Compressed sensing
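
To give this first unit a concrete flavor, here is a minimal numerical sketch of the Johnson-Lindenstrauss transform referenced above, written in Python with numpy. The constant 8 in the embedding dimension k is one common choice, not a canonical one.

```python
import numpy as np

# Minimal sketch of Johnson-Lindenstrauss: project n points from R^d down to
# k = O(log(n) / eps^2) dimensions with an i.i.d. Gaussian matrix and check
# that all pairwise distances are preserved up to a factor of 1 +/- eps.
rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))   # the constant 8 is one common choice

X = rng.normal(size=(n, d))                # n arbitrary points in R^d
G = rng.normal(size=(k, d)) / np.sqrt(k)   # scaled Gaussian projection matrix
Y = X @ G.T                                # projected points in R^k

i, j = np.triu_indices(n, 1)               # all pairs of points
ratios = np.linalg.norm(Y[i] - Y[j], axis=1) / np.linalg.norm(X[i] - X[j], axis=1)
print(f"distance ratios lie in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Note that k depends on the number of points n but not on the ambient dimension d, which is the remarkable feature of the lemma.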

2. Classical theory of i.i.d. random matrices

  • Moment method for eigenvalue limit theorems
  • Semicircle law for Wigner matrices (see the numerical sketch after this list)
  • Marchenko-Pastur law for Wishart matrices
  • Elements of universality
  • Elements of free probability theory
  • Application: Neural networks and random optimization landscapes
  • Application: Covariance estimation
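
As a quick taste of this unit, the semicircle law is easy to observe numerically. The following minimal sketch assumes the normalization in which the matrix entries have variance 1/n, and compares the fraction of eigenvalues in [0, 1] against the mass that the semicircle density assigns to that interval.

```python
import numpy as np

# Minimal sketch of the semicircle law: the eigenvalues of an n x n symmetric
# matrix with (up to symmetry) i.i.d. entries of variance 1/n fill out the
# density sqrt(4 - x^2) / (2 pi) on [-2, 2] as n grows.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2 * n)             # off-diagonal entries have variance 1/n
evals = np.linalg.eigvalsh(W)

# Fraction of eigenvalues in [0, 1] vs. the semicircle mass of that interval,
# which equals (sqrt(3)/2 + pi/3) / (2 pi) by direct integration.
empirical = np.mean((evals >= 0) & (evals <= 1))
prediction = (np.sqrt(3) / 2 + np.pi / 3) / (2 * np.pi)
print(f"empirical {empirical:.4f} vs. semicircle prediction {prediction:.4f}")
```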

3. Spiked matrix models and principal component analysis (PCA)

  • Resolvent method for eigenvalue limit theorems and further applications to eigenvectors
  • Baik-Ben Arous-Péché (BBP) phase transition in the performance of PCA (see the simulation sketch after this list)
  • Computational challenges and statistical-computational gaps in PCA
  • Application: Non-linear improvements to PCA
  • Application: Community detection
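
The BBP transition is also easy to observe in simulation. The following minimal sketch uses a spiked Wigner model Y = W + lam * v v^T, one standard setting (spiked Wishart models behave analogously), and compares the top eigenvalue and the squared overlap of the top eigenvector with the spike against the predicted limits.

```python
import numpy as np

# Minimal sketch of the BBP transition in a spiked Wigner model
# Y = W + lam * v v^T. Predictions: for lam <= 1 the top eigenvalue sticks
# to the bulk edge 2 and the top eigenvector is uninformative; for lam > 1
# the top eigenvalue -> lam + 1/lam and the squared overlap -> 1 - 1/lam^2.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2 * n)
v = rng.normal(size=n)
v /= np.linalg.norm(v)                     # unit-norm planted spike

for lam in [0.5, 1.5, 3.0]:
    evals, evecs = np.linalg.eigh(W + lam * np.outer(v, v))
    overlap = np.dot(evecs[:, -1], v) ** 2
    print(f"lam={lam}: top eigenvalue {evals[-1]:.3f}, squared overlap {overlap:.3f}")
```

Increasing n sharpens the agreement with the predicted limits on both sides of the transition.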

4. Matrix concentration inequalities

  • General concentration inequalities and applications to random matrix norms (bounded differences, Lipschitz concentration, etc.)
  • General-purpose bounds on expected random matrix norms (matrix Chernoff, Bernstein, etc.)
  • Non-commutative Khintchine inequality and its exciting recent extensions (see the numerical sketch after this list)
  • Application: Randomized numerical linear algebra
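
As a numerical teaser for this unit, the following minimal sketch checks one standard expectation bound for matrix Rademacher series of the kind that non-commutative Khintchine-type inequalities deliver: E||sum_i eps_i A_i|| <= sqrt(2 log(2d)) * ||sum_i A_i^2||^(1/2) for fixed symmetric d x d matrices A_i and i.i.d. random signs eps_i.

```python
import numpy as np

# Minimal sketch of a matrix Rademacher series bound: for fixed symmetric
# A_1, ..., A_m and i.i.d. signs eps_i,
#     E || sum_i eps_i A_i || <= sqrt(2 * log(2d)) * || sum_i A_i^2 ||^(1/2).
rng = np.random.default_rng(0)
d, m = 50, 200
As = []
for _ in range(m):
    B = rng.normal(size=(d, d))
    As.append((B + B.T) / (2 * np.sqrt(d)))   # fixed symmetric coefficient matrices

sigma = np.sqrt(np.linalg.norm(sum(A @ A for A in As), 2))  # ||sum A_i^2||^(1/2)
bound = np.sqrt(2 * np.log(2 * d)) * sigma

norms = []
for _ in range(20):
    eps = rng.choice([-1.0, 1.0], size=m)     # fresh random signs each trial
    norms.append(np.linalg.norm(sum(e * A for e, A in zip(eps, As)), 2))
print(f"empirical mean norm {np.mean(norms):.2f} <= bound {bound:.2f}")
```

The gap between the empirical norm and the bound reflects the logarithmic factor, whose removal in special cases is exactly the subject of the recent extensions mentioned above.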

Contact & Office Hours

The instructor of this course is me, Tim Kunisky, and the teaching assistant is Yue Wu, a PhD student in AMS.

The best way to contact us is by email, at kunisky [at] jhu.edu and ywu166 [at] jhu.edu, respectively. Our office hours are as follows:

  • TA: Mondays 3:30-4:30pm, Wyman S425 (tutorial room)
  • Me: Fridays 12:00-1:00pm, Wyman N438 (my office)

Schedule

Class will meet Tuesdays and Thursdays, 12:00pm to 1:15pm in Bloomberg 176.

Below is a tentative schedule, to be updated as the semester progresses.

Date Details
Week 1
Aug 26 1. Course logistics. Random vector theory. Properties of Gaussian random vectors. Concentration inequalities.
Aug 28 2. Proof of concentration of Gaussian random vector norms (see the numerical sketch below the schedule). Consequences for how to think about high-dimensional geometry. Multiplying by a Gaussian matrix: what does it do? The Gaussian process viewpoint.
Week 2
Sep 2 3. Random matrices for dimensionality reduction: the Johnson-Lindenstrauss transform and lemma. "First moment method" proof technique using union bounds. Sketch of applications and related topics: faster variants, lower bounds. Extending the first moment method to uncountable problems.
Sep 4 4. Concentration of singular values of short fat matrices. Interpretation as a "matrix concentration" inequality. "Geometric method" for random matrix analysis. Discretizing matrix norms with epsilon nets. Non-constructive proof of existence of good epsilon nets.
Week 3
Sep 9 5. Application: compressed sensing with random sensing matrices and the null space and restricted isometry properties.
Sep 11 6. Singular vectors of Gaussian matrices. Algebraic method for almost sure distinctness of singular values and eigenvalues. Invariance of matrix distributions.
Week 4
Sep 16 7. Finish up singular vectors of Gaussian matrices. Haar measure on orthogonal groups and Stiefel manifolds. First steps towards eigenvalue and singular value limit theorems. Empirical description of limiting singular value distributions of rectangular matrices. Statistical implications for covariance estimation. Definition of convergence of empirical spectral distribution.
Sep 18 8. Weak convergence of random measures. The moment method for proving eigenvalue limit theorems. Carleman's criterion for distributions determined by moments. Review of moment method proof of central limit theorem.
Week 5
Sep 23 9. Finish moment method proof of central limit theorem. Start convergence of expected moments for Wigner semicircle theorem.
Sep 25 10. Finish convergence of expected moments for Wigner semicircle theorem. Definition and interpretations of Catalan numbers. Sketch of extension to stronger modes of convergence.
Week 6
Sep 30 11. Extensions and further discussion of semicircle limit theorem. Universality and failure of universality. Controlling extreme eigenvalues with the combinatorial method. Sample covariance matrices and the Marchenko-Pastur limit theorem.
Oct 2 12. Introduction to free probability. Empirical evidence of additive free convolution. Wigner's semicircle limit theorem as the "random matrix central limit theorem." Expanding traces of sums and the problem of tangled traces.
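
The sketch promised in the Aug 28 row above: the norm of a standard Gaussian vector in R^n concentrates around sqrt(n) with fluctuations of constant order, one way to see that high-dimensional Gaussian mass lies near a thin spherical shell.

```python
import numpy as np

# Minimal sketch for the Aug 28 lecture: ||g|| for g ~ N(0, I_n) concentrates
# around sqrt(n), with typical deviations of constant order as n grows.
rng = np.random.default_rng(0)
for n in [10, 1000, 100_000]:
    norms = np.linalg.norm(rng.normal(size=(100, n)), axis=1)
    dev = np.abs(norms - np.sqrt(n)).mean()
    print(f"n={n}: sqrt(n) = {np.sqrt(n):.1f}, mean deviation of norm from sqrt(n) = {dev:.3f}")
```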

Lecture Notes and Materials

You do not need to buy any books for this course. You can find my lecture notes here. If you want to look ahead to future topics, you can look at last year's notes here, but be advised that the topics covered this year might differ slightly. If you notice typos in either set of notes, please let me know.

The following are books or lecture notes that cover some similar material and might be useful to you in addition to my notes:

Grading

Grades will be based on a small number of written homework assignments, class participation, and a final project concerning a recent research paper, open problem, or topic of interest related to the material we cover.

Policies on Assignments and Collaboration

The following are the policies for submitted work in this course:

  • Collaboration: You are welcome to discuss homework with your classmates and instructors, but you must write up your own solutions, alone, in your own words. Students found submitting verbatim identical solutions will be penalized. If you have discussed the homework with anybody other than instructors, please list their names at the top of your submission.
  • Sharing solutions: You may not share full solutions to homework problems with your classmates, show others your written solutions and ask if they are correct, or take notes on or pictures of other students' work before preparing your own solutions.
  • AI assistants: You are welcome to use AI assistants (ChatGPT, Gemini, Claude, etc.) to explore the topics discussed in lecture, to clarify general points of confusion, and to ask broad clarifying questions while doing your homework. In short, you may interact with AI assistants on your homework in the same way you may with other students: you may discuss problems, but you may not ask for or copy complete solutions. If you discuss your homework with an AI assistant, describe your interaction at the top of your submission. If in doubt, make sure that you would be able to explain your solution to a homework problem on the board with no other references available.
  • Late submissions: You may use a total of five late days for homework submissions over the course of the semester without penalty. If you need an extension beyond these, you must ask me at least 48 hours before the homework's due date and have an excellent reason. After you have used up these late days, further late assignments will be penalized by 20% per day they are late (that is, your maximum score after one late day will be 80%, after two late days 60%, and so forth). The final project must be submitted on time; no extensions for it are allowed.

Assignments

Homework will be posted here, and is to be submitted through Gradescope (see Canvas announcements for details). Please try to talk to me in advance if you need more time for an assignment.

Assigned Due Link
Sep 8 Sep 22 Assignment 1
Sep 29 Oct 16 Assignment 2

Final Project

Your final project is to do one of the following on a topic related to the content of this course: (1) read and digest a paper and present its content in your own words and style, with some elaboration that is not present in the original paper; (2) perform an interesting computational experiment motivated by something we have seen in class or something you read in a paper and report in detail on the results and their interpretation; or (3) for the intrepid, find an open problem related to something we have seen in class, try to work on it, and report your findings.

The project submission will have two parts:

  • A short in-class presentation during the final exam period.
  • A short written report, also due during the final exam period.

Further details on both components will be posted later in the semester.

The following are reasonable categories from which to choose a topic, along with a few references you might look into. You are also welcome to choose a topic of your own, provided that you describe it and its relevance to the course convincingly on Assignment 2.