Fall 2024: Random Matrix Theory in Data Science and Statistics (EN.553.796)

Course Description

This is a first course in random matrix theory: the study of the eigenvalues and eigenvectors of matrices with random entries, which is foundational to high-dimensional statistics and data science. Aside from the main ideas and modern applications of random matrices, a key goal will be to introduce you to the main concepts of probability in high dimensions: concentration of measure, the geometry of high-dimensional spaces and convex sets, Gaussian measure, and sharp transitions and threshold phenomena. The following is a (very) tentative ordered list of specific topics to be covered:

1. Gaussian matrices and dimensionality reduction

  • Direct analysis with concentration inequalities
  • Invariance and relation to random projection
  • Application: Johnson-Lindenstrauss transform and dimensionality reduction
  • Application: Compressed sensing
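As a small taste of the first topic, here is a numerical sketch (not part of the course materials; the dimensions and tolerances are my own illustrative choices) of the phenomenon behind the Johnson-Lindenstrauss transform: multiplying by a suitably scaled Gaussian matrix approximately preserves all pairwise distances among a set of points.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, k = 50, 10_000, 400            # n points in R^d, projected down to R^k

X = rng.standard_normal((n, d))      # data points (rows)
G = rng.standard_normal((k, d)) / np.sqrt(k)   # scaled Gaussian matrix
Y = X @ G.T                          # projected points in R^k

# Ratio of projected to original distance for every pair of points;
# all ratios should be close to 1 once k is of order log(n) / eps^2.
ratios = [
    np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    for i, j in combinations(range(n), 2)
]
print(min(ratios), max(ratios))
```

Note that the target dimension k depends only on the number of points n and the distortion tolerance, not on the ambient dimension d.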

2. Classical theory of i.i.d. random matrices

  • Moment method for eigenvalue limit theorems
  • Semicircle law for Wigner matrices
  • Marchenko-Pastur law for Wishart matrices
  • Elements of universality
  • Elements of free probability theory
  • Application: Neural networks and random optimization landscapes
  • Application: Subsampling in frame and coding theory
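The semicircle law above is easy to see numerically. The following simulation (illustrative only; the matrix size and moment checks are my own choices) builds a Wigner matrix with entry variance 1/n and compares low moments of its empirical spectral distribution to those of the semicircle density on [-2, 2], whose even moments are the Catalan numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)       # symmetric, off-diagonal variance 1/n
eigs = np.linalg.eigvalsh(W)

# Semicircle moments: E[x^2] = 1 and E[x^4] = 2 (Catalan numbers 1 and 2).
print(np.mean(eigs**2), np.mean(eigs**4))   # ≈ 1 and ≈ 2
print(eigs.min(), eigs.max())               # ≈ -2 and ≈ 2
```

Matching moments in this way is exactly the strategy of the moment method: the k-th moment of the empirical spectral distribution is tr(W^k)/n, which can be analyzed combinatorially.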

3. Spiked matrix models and principal component analysis (PCA)

  • BBP phase transition in the performance of PCA
  • Application: Community detection
  • General covariance estimation, eigenvalue shrinkage, and free deconvolution
  • PCA variants, computational challenges, and statistical-computational gaps
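The BBP phase transition can also be observed directly in simulation. Below is a toy sketch (my own parameters, not course material) of a spiked Wigner model W + λvvᵀ with a planted unit vector v: below the transition (λ < 1) the top eigenvalue sticks to the bulk edge 2 and the top eigenvector is nearly uncorrelated with the spike, while above it an outlier eigenvalue appears near λ + 1/λ and the correlation approaches √(1 − 1/λ²).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500
v = rng.standard_normal(n)
v /= np.linalg.norm(v)               # planted unit-norm spike

def top_eig_corr(lam):
    """Top eigenvalue of W + lam * v v^T and its eigenvector's overlap with v."""
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)
    vals, vecs = np.linalg.eigh(W + lam * np.outer(v, v))
    return vals[-1], abs(vecs[:, -1] @ v)

val_b, corr_b = top_eig_corr(0.5)    # below the transition
val_a, corr_a = top_eig_corr(2.0)    # above the transition
print(val_b, corr_b)                 # near 2, overlap small
print(val_a, corr_a)                 # near 2 + 1/2 = 2.5, overlap near sqrt(3)/2
```

This is the statistical content of the transition for PCA: below λ = 1 the leading eigenvector carries essentially no information about the planted signal.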

4. Matrix concentration inequalities

  • General concentration inequalities and applications to random matrix norms (bounded differences, Lipschitz concentration, etc.)
  • General-purpose bounds on expected random matrix norms (matrix Chernoff, Bernstein, etc.)
  • Application: Spectral sparsification of graphs
  • Non-commutative Khintchine inequality and its exciting recent extensions
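As a quick numerical illustration of the kind of phenomenon that matrix concentration inequalities quantify (parameters here are my own, for illustration): the empirical covariance of m i.i.d. standard Gaussian samples in dimension d concentrates around the identity in operator norm, with error on the order of √(d/m).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
errs = []
for m in [1_000, 10_000, 100_000]:
    X = rng.standard_normal((m, d))
    Sigma_hat = X.T @ X / m                          # empirical covariance
    err = np.linalg.norm(Sigma_hat - np.eye(d), ord=2)  # operator norm error
    errs.append(err)
    print(m, err, np.sqrt(d / m))                    # error tracks sqrt(d/m)
```
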

Contact & Office Hours

The instructor for this course is Tim Kunisky, and the teaching assistant is AMS PhD student Yue Wu.

The best way to contact us is by email, at kunisky [at] jhu.edu and ywu166 [at] jhu.edu, respectively. We will settle on a time for office hours in the first few weeks of class; in the meantime, please email me to schedule an appointment if you have anything pressing to discuss.

Schedule

Class will meet Tuesdays and Thursdays, 9:00am to 10:15am in Gilman 55.

Below is a tentative schedule, to be updated as the semester progresses.

Week 1

  • Aug 27: Course logistics. Review of PCA and SVD as exploratory data analysis tools. Eckart-Young-Mirsky theorem. Multiplying by a Gaussian matrix: what does it do? What is it good for? How to think about Gaussian processes.
  • Aug 29: Random matrices for dimensionality reduction: the Johnson-Lindenstrauss transform and lemma. Concentration inequalities and union bound arguments.

Week 2

  • Sep 3: Application of Johnson-Lindenstrauss to nearest neighbors. Ailon and Chazelle's fast Johnson-Lindenstrauss transform. Connection between Gaussian matrices and random projection. Uniformity of singular vectors.
  • Sep 5: Concentration of singular values of short wide matrices. Interpretation as a "matrix concentration" inequality. Epsilon net arguments for discretizing matrix norms.

Week 3

  • Sep 10: Non-constructive existence of epsilon nets over the unit sphere. Finish singular values of short wide matrices.
  • Sep 12: Application: compressed sensing with random sensing matrices and the restricted isometry property. First steps of limit theorems: what does convergence of empirical spectral distributions mean? Statistical meaning of Wishart matrix limit theorems.

Week 4

  • Sep 17: Statement of semicircle and Marchenko-Pastur limit theorems. Moment method for proving limit theorems. Review of central limit theorem. First steps of Wigner's semicircle limit theorem.
  • Sep 19: Finishing Wigner's semicircle limit theorem. Sketch of Marchenko-Pastur limit theorem. Controlling matrix norms and extreme eigenvalues by moments.
Lecture Notes and Materials

You do not need to buy any books for this course. You can find my lecture notes here, which will be updated as the semester proceeds.

The following are freely available books or lecture notes that cover some similar material and might be useful to you in addition to my notes:

Grading

Grades will be based on a small number of written homework assignments, class participation, and a final project concerning a recent research paper, open problem, or topic of interest related to the material we cover.

Assignments

Homework will be posted here. Submission instructions to follow soon. Please try to talk to me in advance if you need more time for an assignment.

  • Assignment 1: assigned Sep 12, due Sep 30.