Fall 2025: Random Matrix Theory in Data Science and Statistics (EN.553.796)

Course Description

This is a first course in random matrix theory, the study of the eigenvalues and eigenvectors of matrices with random entries, a subject foundational to high-dimensional statistics and data science. Aside from the main ideas and modern applications of random matrices, a key goal will be to introduce you to the main concepts of probability in high dimensions: concentration of measure, the geometry of high-dimensional spaces and convex sets, Gaussian measure, and sharp transitions and threshold phenomena. The following is a (very) tentative ordered list of specific topics to be covered:

1. Gaussian matrices and dimensionality reduction

  • Geometric method of analysis with concentration inequalities
  • Invariance and relation to random projection
  • Application: Johnson-Lindenstrauss transform and dimensionality reduction (see the numerical sketch after this list)
  • Application: Compressed sensing
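
To give this first unit a concrete flavor, here is a minimal numerical sketch of the Johnson-Lindenstrauss transform referenced above, written in Python with numpy. The constant 8 in the embedding dimension k is one common choice, not a canonical one.

```python
import numpy as np

# Minimal sketch of Johnson-Lindenstrauss: project n points from R^d down to
# k = O(log(n) / eps^2) dimensions with an i.i.d. Gaussian matrix and check
# that all pairwise distances are preserved up to a factor of 1 +/- eps.
rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))   # the constant 8 is one common choice

X = rng.normal(size=(n, d))                # n arbitrary points in R^d
G = rng.normal(size=(k, d)) / np.sqrt(k)   # scaled Gaussian projection matrix
Y = X @ G.T                                # projected points in R^k

i, j = np.triu_indices(n, 1)               # all pairs of points
ratios = np.linalg.norm(Y[i] - Y[j], axis=1) / np.linalg.norm(X[i] - X[j], axis=1)
print(f"distance ratios lie in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Note that k depends on the number of points n but not on the ambient dimension d, which is the remarkable feature of the lemma.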

2. Classical theory of i.i.d. random matrices

  • Moment method for eigenvalue limit theorems
  • Semicircle law for Wigner matrices (see the numerical sketch after this list)
  • Marchenko-Pastur law for Wishart matrices
  • Elements of universality
  • Elements of free probability theory
  • Application: Neural networks and random optimization landscapes
  • Application: Covariance estimation
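
As a quick taste of this unit, the semicircle law is easy to observe numerically. The following minimal sketch assumes the normalization in which the matrix entries have variance 1/n, and compares the fraction of eigenvalues in [0, 1] against the mass that the semicircle density assigns to that interval.

```python
import numpy as np

# Minimal sketch of the semicircle law: the eigenvalues of an n x n symmetric
# matrix with (up to symmetry) i.i.d. entries of variance 1/n fill out the
# density sqrt(4 - x^2) / (2 pi) on [-2, 2] as n grows.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2 * n)             # off-diagonal entries have variance 1/n
evals = np.linalg.eigvalsh(W)

# Fraction of eigenvalues in [0, 1] vs. the semicircle mass of that interval,
# which equals (sqrt(3)/2 + pi/3) / (2 pi) by direct integration.
empirical = np.mean((evals >= 0) & (evals <= 1))
prediction = (np.sqrt(3) / 2 + np.pi / 3) / (2 * np.pi)
print(f"empirical {empirical:.4f} vs. semicircle prediction {prediction:.4f}")
```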

3. Spiked matrix models and principal component analysis (PCA)

  • Resolvent method for eigenvalue limit theorems and further applications to eigenvectors
  • Baik-Ben Arous-Péché (BBP) phase transition in the performance of PCA (see the simulation sketch after this list)
  • Computational challenges and statistical-computational gaps in PCA
  • Application: Non-linear improvements to PCA
  • Application: Community detection
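
The BBP transition is also easy to observe in simulation. The following minimal sketch uses a spiked Wigner model Y = W + lam * v v^T, one standard setting (spiked Wishart models behave analogously), and compares the top eigenvalue and the squared overlap of the top eigenvector with the spike against the predicted limits.

```python
import numpy as np

# Minimal sketch of the BBP transition in a spiked Wigner model
# Y = W + lam * v v^T. Predictions: for lam <= 1 the top eigenvalue sticks
# to the bulk edge 2 and the top eigenvector is uninformative; for lam > 1
# the top eigenvalue -> lam + 1/lam and the squared overlap -> 1 - 1/lam^2.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2 * n)
v = rng.normal(size=n)
v /= np.linalg.norm(v)                     # unit-norm planted spike

for lam in [0.5, 1.5, 3.0]:
    evals, evecs = np.linalg.eigh(W + lam * np.outer(v, v))
    overlap = np.dot(evecs[:, -1], v) ** 2
    print(f"lam={lam}: top eigenvalue {evals[-1]:.3f}, squared overlap {overlap:.3f}")
```

Increasing n sharpens the agreement with the predicted limits on both sides of the transition.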

4. Matrix concentration inequalities

  • General concentration inequalities and applications to random matrix norms (bounded differences, Lipschitz concentration, etc.)
  • General-purpose bounds on expected random matrix norms (matrix Chernoff, Bernstein, etc.)
  • Non-commutative Khintchine inequality and its exciting recent extensions (see the numerical sketch after this list)
  • Application: Randomized numerical linear algebra
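
As a numerical teaser for this unit, the following minimal sketch checks one standard expectation bound for matrix Rademacher series of the kind that non-commutative Khintchine-type inequalities deliver: E||sum_i eps_i A_i|| <= sqrt(2 log(2d)) * ||sum_i A_i^2||^(1/2) for fixed symmetric d x d matrices A_i and i.i.d. random signs eps_i.

```python
import numpy as np

# Minimal sketch of a matrix Rademacher series bound: for fixed symmetric
# A_1, ..., A_m and i.i.d. signs eps_i,
#     E || sum_i eps_i A_i || <= sqrt(2 * log(2d)) * || sum_i A_i^2 ||^(1/2).
rng = np.random.default_rng(0)
d, m = 50, 200
As = []
for _ in range(m):
    B = rng.normal(size=(d, d))
    As.append((B + B.T) / (2 * np.sqrt(d)))   # fixed symmetric coefficient matrices

sigma = np.sqrt(np.linalg.norm(sum(A @ A for A in As), 2))  # ||sum A_i^2||^(1/2)
bound = np.sqrt(2 * np.log(2 * d)) * sigma

norms = []
for _ in range(20):
    eps = rng.choice([-1.0, 1.0], size=m)     # fresh random signs each trial
    norms.append(np.linalg.norm(sum(e * A for e, A in zip(eps, As)), 2))
print(f"empirical mean norm {np.mean(norms):.2f} <= bound {bound:.2f}")
```

The gap between the empirical norm and the bound reflects the logarithmic factor, whose removal in special cases is exactly the subject of the recent extensions mentioned above.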

Contact & Office Hours

The instructor of this course is me, Tim Kunisky, and the teaching assistant is Yue Wu, a PhD student in AMS.

The best way to contact us is by email, at kunisky [at] jhu.edu and ywu166 [at] jhu.edu, respectively. Our office hours are as follows:

  • TA: Mondays 3:30-4:30pm, Wyman S425 (tutorial room)
  • Me: Fridays 12:00-1:00pm, Wyman N438 (my office)

Schedule

Class will meet Tuesdays and Thursdays, 12:00pm to 1:15pm in Bloomberg 176.

Below is a tentative schedule, to be updated as the semester progresses.

Date Details
Week 1
Aug 26 1. Course logistics. Random vector theory. Properties of Gaussian random vectors. Concentration inequalities.
Aug 28 2. Proof of concentration of Gaussian random vector norms (see the numerical sketch below the schedule). Consequences for how to think about high-dimensional geometry. Multiplying by a Gaussian matrix: what does it do? The Gaussian process viewpoint.
Week 2
Sep 2 3. Random matrices for dimensionality reduction: the Johnson-Lindenstrauss transform and lemma. "First moment method" proof technique using union bounds. Sketch of applications and related topics: faster variants, lower bounds. Extending the first moment method to uncountable problems.
Sep 4 4. Concentration of singular values of short fat matrices. Interpretation as a "matrix concentration" inequality. "Geometric method" for random matrix analysis. Discretizing matrix norms with epsilon nets. Non-constructive proof of existence of good epsilon nets.
Week 3
Sep 9 5. Application: compressed sensing with random sensing matrices and the null space and restricted isometry properties.
Sep 11 6. Singular vectors of Gaussian matrices. Algebraic method for almost sure distinctness of singular values and eigenvalues. Invariance of matrix distributions.
Week 4
Sep 16 7. Finish up singular vectors of Gaussian matrices. Haar measure on orthogonal groups and Stiefel manifolds. First steps towards eigenvalue and singular value limit theorems. Empirical description of limiting singular value distributions of rectangular matrices. Statistical implications for covariance estimation. Definition of convergence of empirical spectral distribution.
Sep 18 8. Weak convergence of random measures. The moment method for proving eigenvalue limit theorems. Carleman's criterion for distributions determined by moments. Review of moment method proof of central limit theorem.
Week 5
Sep 23 9. Finish moment method proof of central limit theorem. Start convergence of expected moments for Wigner semicircle theorem.
Sep 25 10. Finish convergence of expected moments for Wigner semicircle theorem. Definition and interpretations of Catalan numbers. Sketch of extension to stronger modes of convergence.
Week 6
Sep 30 11. Extensions and further discussion of semicircle limit theorem. Universality and failure of universality. Controlling extreme eigenvalues with the combinatorial method. Sample covariance matrices and the Marchenko-Pastur limit theorem.
Oct 2 12. Introduction to free probability. Empirical evidence of additive free convolution. Wigner's semicircle limit theorem as the "random matrix central limit theorem." Expanding traces of sums and the problem of tangled traces.
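
The sketch promised in the Aug 28 row above: the norm of a standard Gaussian vector in R^n concentrates around sqrt(n) with fluctuations of constant order, one way to see that high-dimensional Gaussian mass lies near a thin spherical shell.

```python
import numpy as np

# Minimal sketch for the Aug 28 lecture: ||g|| for g ~ N(0, I_n) concentrates
# around sqrt(n), with typical deviations of constant order as n grows.
rng = np.random.default_rng(0)
for n in [10, 1000, 100_000]:
    norms = np.linalg.norm(rng.normal(size=(100, n)), axis=1)
    dev = np.abs(norms - np.sqrt(n)).mean()
    print(f"n={n}: sqrt(n) = {np.sqrt(n):.1f}, mean deviation of norm from sqrt(n) = {dev:.3f}")
```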

Lecture Notes and Materials

You do not need to buy any books for this course. You can find my lecture notes here. If you want to look ahead to future topics, you can look at last year's notes here, but be advised that the topics covered this year might differ slightly. If you notice typos in either set of notes, please let me know.

The following are books or lecture notes that cover some similar material and might be useful to you in addition to my notes:

Grading

Grades will be based on a small number of written homework assignments, class participation, and a final project concerning a recent research paper, open problem, or topic of interest related to the material we cover.

Policies on Assignments and Collaboration

The following are the policies for submitted work in this course:

  • Collaboration: You are welcome to discuss homework with your classmates and instructors, but you must write up your own solutions, alone, in your own words. Students found submitting verbatim identical solutions will be penalized. If you have discussed the homework with anybody other than instructors, please list their names at the top of your submission.
  • Sharing solutions: You may not share full solutions to homework problems with your classmates, show others your written solutions and ask if they are correct, or take notes on or pictures of other students' work before preparing your own solutions.
  • AI assistants: You are welcome to use AI assistants (ChatGPT, Gemini, Claude, etc.) to explore the topics discussed in lecture, to clarify general points of confusion, and to ask broad clarifying questions while doing your homework. In short, you may interact with AI assistants on your homework in the same way you may with other students: you may discuss problems, but you may not ask for or copy complete solutions. If you discuss your homework with an AI assistant, describe your interaction at the top of your submission. If in doubt, make sure that you would be able to explain your solution to a homework problem on the board with no other references available.
  • Late submissions: You may use a total of five late days for homework submissions over the course of the semester without penalty. If you need an extension beyond these, you must ask me at least 48 hours before the homework's due date and have an excellent reason. After you have used up these late days, further late assignments will be penalized by 20% per day they are late (that is, your maximum score after one late day will be 80%, after two late days 60%, and so forth). The final project must be submitted on time; no extensions for it are allowed.

Assignments

Homework will be posted here, and is to be submitted through Gradescope (see Canvas announcements for details). Please try to talk to me in advance if you need more time for an assignment.

Assigned Due Link
Sep 8 Sep 22 Assignment 1
Sep 29 Oct 16 Assignment 2

Final Project

Your final project is to do one of the following on a topic related to the content of this course: (1) read and digest a paper and present its content in your own words and style, with some elaboration that is not present in the original paper; (2) perform an interesting computational experiment motivated by something we have seen in class or something you read in a paper and report in detail on the results and their interpretation; or (3) for the intrepid, find an open problem related to something we have seen in class, try to work on it, and report your findings.

The project submission will have two parts:

  • A short in-class presentation during the final exam period.
  • A short written report, also due during the final exam period.

Further details on both components will be posted later in the semester.

The following are reasonable categories from which to choose a topic, along with a few references you might look into. You are also welcome to choose a topic of your own, provided that you describe it and its relevance to the course convincingly on Assignment 2.