% Template based on:
% https://www.cs.columbia.edu/~djhsu/coms4774-s21/scribe.html
\documentclass[12pt]{article}
\usepackage{amscd}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amstext}
\usepackage{amsthm}
\usepackage{bbold}
\usepackage{bm}
\usepackage{colonequals}
\usepackage[dvips,letterpaper,margin=1in]{geometry}
\usepackage{mathtools}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{csquotes}
\MakeOuterQuote{"}
\definecolor{cblack}{rgb}{0,0,0}
\definecolor{cblue}{rgb}{0.121569,0.466667,0.705882}
\definecolor{corange}{rgb}{1.000000,0.498039,0.054902}
\definecolor{cgreen}{rgb}{0.172549,0.627451,0.172549}
\definecolor{cred}{rgb}{0.839216,0.152941,0.156863}
\definecolor{cpurple}{rgb}{0.580392,0.403922,0.741176}
\definecolor{cbrown}{rgb}{0.549020,0.337255,0.294118}
\definecolor{cpink}{rgb}{0.890196,0.466667,0.760784}
\definecolor{cgray}{rgb}{0.498039,0.498039,0.498039}
\usepackage{hyperref}
\hypersetup{
linkcolor = cblue,
citecolor = cgreen,
urlcolor = corange,
colorlinks = true,
}
% Lecture information
\newcommand\coursename{CPSC 664, Spring 2023}
\newcommand\scribe{Peixin You}
\newcommand\lecturer{Tim Kunisky}
\newcommand\lecturedate{Feb. 7, 2023}
\newcommand\lecturetitle{Lecture 7: Moment Problems}
% Define theorem environments here.
\newtheorem{claim}{Claim}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{remark}[theorem]{Remark}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{question}[theorem]{Question}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{conjecture}[theorem]{Conjecture}
% Some of my macros
\renewcommand{\AA}{\mathbb{A}}
\newcommand{\CC}{\mathbb{C}}
\newcommand{\DD}{\mathbb{D}}
\newcommand{\EE}{\mathbb{E}}
\newcommand{\FF}{\mathbb{F}}
\newcommand{\HH}{\mathbb{H}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\QQ}{\mathbb{Q}}
\newcommand{\RR}{\mathbb{R}}
\renewcommand{\SS}{\mathbb{S}}
\newcommand{\ZZ}{\mathbb{Z}}
\DeclareSymbolFont{bbold}{U}{bbold}{m}{n}
\DeclareSymbolFontAlphabet{\mathbbold}{bbold}
\newcommand{\One}{\mathbbold{1}}
\newcommand{\ba}{\bm a}
\newcommand{\bb}{\bm b}
\newcommand{\bc}{\bm c}
\newcommand{\bd}{\bm d}
\newcommand{\be}{\bm e}
\newcommand{\bg}{\bm g}
\newcommand{\bh}{\bm h}
\newcommand{\bi}{\bm i}
\newcommand{\bj}{\bm j}
\newcommand{\bk}{\bm k}
\newcommand{\bbm}{\bm m}
\newcommand{\bp}{\bm p}
\newcommand{\bq}{\bm q}
\newcommand{\br}{\bm r}
\newcommand{\bs}{\bm s}
\newcommand{\bt}{\bm t}
\newcommand{\bu}{\bm u}
\newcommand{\bv}{\bm v}
\newcommand{\bw}{\bm w}
\newcommand{\bx}{\bm x}
\newcommand{\by}{\bm y}
\newcommand{\bz}{\bm z}
\newcommand{\bA}{\bm A}
\newcommand{\bB}{\bm B}
\newcommand{\bC}{\bm C}
\newcommand{\bD}{\bm D}
\newcommand{\bE}{\bm E}
\newcommand{\bF}{\bm F}
\newcommand{\bG}{\bm G}
\newcommand{\bH}{\bm H}
\newcommand{\bI}{\bm I}
\newcommand{\bL}{\bm L}
\newcommand{\bM}{\bm M}
\newcommand{\bN}{\bm N}
\newcommand{\bP}{\bm P}
\newcommand{\bQ}{\bm Q}
\newcommand{\bR}{\bm R}
\newcommand{\bS}{\bm S}
\newcommand{\bT}{\bm T}
\newcommand{\bU}{\bm U}
\newcommand{\bV}{\bm V}
\newcommand{\bW}{\bm W}
\newcommand{\bX}{\bm X}
\newcommand{\bY}{\bm Y}
\newcommand{\bZ}{\bm Z}
\newcommand{\zero}{\bm{0}}
\newcommand{\one}{\bm{1}}
\newcommand{\sA}{\mathcal{A}}
\newcommand{\sB}{\mathcal{B}}
\newcommand{\sC}{\mathcal{C}}
\newcommand{\sD}{\mathcal{D}}
\newcommand{\sE}{\mathcal{E}}
\newcommand{\sF}{\mathcal{F}}
\newcommand{\sG}{\mathcal{G}}
\newcommand{\sH}{\mathcal{H}}
\newcommand{\sI}{\mathcal{I}}
\newcommand{\sJ}{\mathcal{J}}
\newcommand{\sK}{\mathcal{K}}
\newcommand{\sL}{\mathcal{L}}
\newcommand{\sM}{\mathcal{M}}
\newcommand{\sN}{\mathcal{N}}
\newcommand{\sO}{\mathcal{O}}
\newcommand{\sP}{\mathcal{P}}
\newcommand{\sQ}{\mathcal{Q}}
\newcommand{\sR}{\mathcal{R}}
\newcommand{\sS}{\mathcal{S}}
\newcommand{\sT}{\mathcal{T}}
\newcommand{\sU}{\mathcal{U}}
\newcommand{\sV}{\mathcal{V}}
\newcommand{\sX}{\mathcal{X}}
\newcommand{\sY}{\mathcal{Y}}
\newcommand{\fB}{\mathscr{B}}
\newcommand{\fC}{\mathscr{C}}
\newcommand{\fE}{\mathscr{E}}
\newcommand{\fH}{\mathscr{H}}
\newcommand{\fI}{\mathscr{I}}
\newcommand{\fP}{\mathscr{P}}
\newcommand{\fV}{\mathscr{V}}
\DeclareSymbolFont{sfoperators}{OT1}{cmss}{m}{n}
% don't waste a math group
\DeclareSymbolFontAlphabet{\mathsf}{sfoperators}
% tell LaTeX to use sfoperators for names of operators
\makeatletter
\renewcommand{\operator@font}{\mathgroup\symsfoperators}
\makeatother
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\GOE}{GOE}
\DeclareMathOperator{\sym}{sym}
\DeclareMathOperator{\Part}{Part}
\DeclareMathOperator{\poly}{poly}
\DeclareMathOperator{\Unif}{Unif}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\Cov}{Cov}
\DeclareMathOperator{\Tr}{Tr}
\DeclareMathOperator{\rank}{rank}
\DeclareMathOperator{\obj}{obj}
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\sgn}{sgn}
\renewcommand\Re{\operatorname{Re}}
\renewcommand\Im{\operatorname{Im}}
% These put symbols below.
\newcommand{\Px}{\mathop{\mathbb{P}}}
\newcommand{\Ex}{\mathop{\mathbb{E}}}
\newcommand{\Varx}{\mathop{\mathsf{Var}}}
\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}}
%
% Put your custom macros here if you want.
%
\begin{document}
% This creates the header for scribe notes.
\noindent \coursename \hfill Scribe: \scribe \\
\lecturedate \hfill Lecturer: \lecturer
\vspace{1em}
\hrule
\vspace{1.5em}
\begin{center}
{\Large\lecturetitle}
\end{center}
\section{Problem Statement}
This is a very classical topic in probability theory. Broadly, it asks when knowing the moments of a probability distribution, or the limits of the moments of a sequence of distributions, lets us identify the underlying distribution or its limit.
Formally,
\begin{enumerate}
\item The simplest version of this kind of question concerns \textbf{existence}: given a sequence $\{m_k\}_{k\geq 1}$, does there exist a probability measure $\mu$ over $\RR$ such that $$\Ex_{X\sim \mu} X^k = m_k \quad \text{for all } k?$$
We quickly recall that, in more analytic notation, we could also write $\Ex_{X\sim \mu} X^k$ as $\int x^k \, d\mu(x)$. When $\mu$ has a density, which we denote by $p$, we can rewrite the integral as $\int x^k p(x)\, dx$.
After asking whether such a $\mu$ exists, another natural question is \textbf{uniqueness}: when such a $\mu$ exists, is it unique?
\item We saw a more advanced question in the number partitioning lecture: given random variables $X_n$ such that, for every fixed $k$, $$\EE X_n^k \to \int x^k \, d\mu(x) \quad \text{as } n\to\infty,$$ when can we conclude that $X_n \to \mu$ in some other sense of convergence?
\item At the end of this lecture, we will discuss how convergence in moments appears in random matrix theory.
\end{enumerate}
\section{Existence and uniqueness}
So let's start with the simplest version.
Suppose we are given the sequence of moments of some $\mu$, $m_k = \int x^k \, d\mu(x)$. We want to know how and when we can recover $\mu$ from this information. Part of this is an algorithmic question: given these numbers, how do we compute $\mu$? More abstractly, we want to determine whether there exists a $\mu'\neq \mu$ such that $m_k = \int x^k \, d\mu'(x)$ for all $k$.
To make our lives easier, we assume $\mu$ has a density $p(x)$; this allows us to compute $\mu$ by integrating $p$.
We first introduce the \emph{characteristic function}.
\begin{definition}
The \emph{characteristic function} $\Phi:\RR\to\CC$ of $\mu$ is defined by
$$\Phi(t)=\Ex_{X\sim \mu} e^{itX}=\int_{-\infty}^\infty e^{itx}p(x)\, dx.$$
\end{definition}
\begin{example}
The characteristic function of the Gaussian law $\sN(\mu, \sigma^2)$ is $\exp{(i\mu t - \frac{1}{2}\sigma^2t^2)}$.
The characteristic function of the exponential law $\operatorname{Exp}(\lambda)$, which we saw appear in the number partitioning lecture, is $\frac{1}{1-\frac{i}{\lambda}t}$.
\end{example}
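As a quick sanity check, the Gaussian formula above can be verified numerically by discretizing the integral $\int e^{itx}p(x)\,dx$. The following is an illustrative sketch (Python with NumPy, not part of the lecture); the grid and truncation are ad hoc choices.

```python
# Numerically check E[e^{itX}] = exp(i*mu*t - sigma^2 t^2 / 2) for X ~ N(mu, sigma^2)
# by a Riemann sum over a truncated grid (the Gaussian tails are negligible).
import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

for t in [0.0, 0.3, 1.0]:
    numeric = np.sum(np.exp(1j * t * x) * p) * dx      # ~ integral of e^{itx} p(x) dx
    closed = np.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)
    assert abs(numeric - closed) < 1e-6
```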
\begin{remark}
One way to think of the characteristic function is as a moment-generating function evaluated at imaginary arguments.
\end{remark}
\begin{theorem}
\label{thm:1}
Given a $\Phi$ with some nice properties, for instance $\Phi \in \sL^1$, we have $$p(x)=\frac{1}{2\pi}\int_{-\infty}^\infty e^{-itx}\Phi(t)\, dt.$$
\end{theorem}
\begin{proof}
Notice that $$\text{RHS} = \frac{1}{2\pi}\int_{-\infty}^\infty \int_{-\infty}^\infty e^{it(y-x)}p(y)\,dy\, dt.$$ The idea for evaluating this is that $\int e^{it(y-x)}\,dt$ behaves like a delta function concentrated at $y = x$.
To prove this more formally, one can verify the result when $p$ is a Gaussian density, and then approximate a general $p$ by linear combinations of Gaussian densities. Since both the map $p\mapsto \Phi$ and the inversion map are linear, we can pass the linear combination through the integrals and conclude by an approximation argument.
\end{proof}
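As an illustration of Theorem~\ref{thm:1}, the following sketch (Python/NumPy, not from the lecture) recovers the standard Gaussian density from its characteristic function $\Phi(t)=e^{-t^2/2}$ by discretizing the inversion integral.

```python
# Recover p(x) = (1/2pi) * integral of e^{-itx} Phi(t) dt for the standard Gaussian,
# where Phi(t) = exp(-t^2/2), and compare against the known density.
import numpy as np

t = np.linspace(-40.0, 40.0, 400001)
dt = t[1] - t[0]
phi = np.exp(-t ** 2 / 2)           # characteristic function of N(0, 1)

for x0 in [0.0, 0.5, 1.7]:
    recovered = (np.sum(np.exp(-1j * t * x0) * phi) * dt).real / (2 * np.pi)
    target = np.exp(-x0 ** 2 / 2) / np.sqrt(2 * np.pi)
    assert abs(recovered - target) < 1e-8
```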
So if we know $\Phi(t)$ for all $t\in\RR$, that is usually enough to recover $p$.
We now move to the next question: how do the moments determine $\Phi$? The main tool is the Taylor expansion.
Now let's calculate the derivatives of $\Phi$:
$$\Phi^{(k)}(t):=\frac{d^k \Phi}{dt^k} = \frac{d^k}{dt^k}\int_{-\infty}^\infty e^{itx}p(x) dx = \int_{-\infty}^\infty e^{itx}(ix)^k p(x) dx.$$
At $t = 0$, we have $\Phi^{(k)}(0)=i^km_k$ so the Taylor series will be
$$\sum_{k=0}^\infty \frac{i^km_k}{k!}t^k.$$
In the best case, the Taylor series has infinite radius of convergence, so it determines the characteristic function, which in turn determines $\mu$.
If $\frac{|m_k|^{1/k}}{k}\to 0$, then for every $\varepsilon>0$ we have $$\left|\frac{m_k}{k!}\right|\leq \frac{|m_k|}{(k/e)^k}= \left(\frac{|m_k|^{1/k}e}{k}\right)^k\leq \varepsilon^k$$ for all sufficiently large $k$ (using $k! \geq (k/e)^k$). It follows that the Taylor series $$\Phi(t) = \sum_{k=0}^\infty \frac{i^km_k}{k!}t^k$$ converges for all $t$, so we can recover $p(x)$.
It is also possible to weaken this condition.
If $\frac{|m_k|^{1/k}}{k}$ is bounded, or equivalently $|m_k|\leq CM^kk!$ for constants $C, M$, then we have
$$\left|\frac{m_k}{k!}\right|\leq \frac{|m_k|}{(k/e)^k}= \left(\frac{|m_k|^{1/k}e}{k}\right)^k\leq \Theta(M)^k$$ for all sufficiently large $k$, so the terms of the series decay geometrically whenever $|t| < \Theta(1/M)$.
This gives convergence on $\left(-\Theta\left(\frac{1}{M}\right), \Theta\left(\frac{1}{M}\right)\right)$.
Now let's look at the Taylor series around another center $t_0$: $$\sum_{k=0}^\infty \frac{\Phi^{(k)}(t_0)}{k!}(t-t_0)^k.$$
Notice that $$\left| \Phi^{(k)}(t)\right| \leq \int_{-\infty}^\infty |x|^k p(x)\,dx,$$ which is like the moment $m_k$ but with an absolute value; we denote it by $\tilde{m}_k$. We can bound these absolute moments by the even moments $m_{2\ell}$.
\begin{remark}
We have:
\begin{align*}
\Tilde{m}_{2k} &= m_{2k} \\
\Tilde{m}_{2k+1} &\leq m_{2k+2}^{\frac{2k+1}{2k+2}}
\end{align*}
With the inequalities above, we have
\begin{align*}
\Tilde{m}^{1/(2k)}_{2k} &= m^{1/(2k)}_{2k} \\
\Tilde{m}^{1/(2k+1)}_{2k+1} &\leq m_{2k+2}^{1/(2k+2)}
\end{align*}
This means that if the $m_k$ are bounded as in our assumption $|m_k|\leq CM^kk!$, then the absolute moments $\Tilde{m}_k$ obey the same kind of bound, and the argument above goes through with $\Tilde{m}_k$ in place of $m_k$.
\end{remark}
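The comparison in the remark can be checked numerically for the standard Gaussian, whose absolute moments have the closed form $\EE|X|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$. A small sketch (Python, assumed here purely for illustration):

```python
# Check the bound  m~_{2k+1} <= m_{2k+2}^{(2k+1)/(2k+2)}  (a Hoelder/Jensen
# inequality) on the standard Gaussian, using E|X|^p = 2^{p/2} Gamma((p+1)/2)/sqrt(pi).
import math

def abs_moment(p):
    """E|X|^p for X ~ N(0, 1)."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

for k in range(8):
    lhs = abs_moment(2 * k + 1)                                  # m~_{2k+1}
    rhs = abs_moment(2 * k + 2) ** ((2 * k + 1) / (2 * k + 2))   # m_{2k+2}^{(2k+1)/(2k+2)}
    assert lhs <= rhs
```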
\noindent
We then find that the Taylor series converges on a ball of radius $\Theta(1/M)$ around any $t_0 \in\RR$.
A complex analysis argument using analytic continuation then shows that the moments determine the characteristic function on $\RR$.
\begin{theorem}
\label{thm:3}
If $|m_k|\leq CM^kk!$ for some constants $C, M$, then $\mu$ is determined uniquely by its moments $m_k$.
\end{theorem}
Let's see some examples to which this theorem applies.
\begin{example}
Theorem~\ref{thm:3} applies to any compactly supported distribution, any Gaussian distribution, any sub-Gaussian distribution, any exponential distribution, and any Poisson distribution.
\end{example}
Also, let's see one example that does not satisfy the condition of Theorem~\ref{thm:3}.
\begin{example}[Log-normal distribution] Let $\mu = e^{\sN(0, 1)}$, with density $p(x)$ (supported on $x > 0$). In this case $m_k = e^{k^2/2}$, which grows too fast: what we allowed is $k!$, which is $k^{O(k)} = e^{O(k\log{k})}$, while $e^{k^2/2}$ is much larger. Indeed, one can check that the distribution with density $p(x)$ and the distribution with density proportional to $p(x)(1 + \delta \sin{(2\pi\log{x})})$, for small $\delta > 0$, have exactly the same moments.
\end{example}
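The claimed moment-matching can be verified numerically: after substituting $y = \log x$, the correction term $\int x^k p(x)\sin(2\pi\log x)\,dx$ becomes a Gaussian integral that vanishes for every integer $k$. A sketch (Python/NumPy, not part of the lecture; grid sizes are ad hoc):

```python
# For the log-normal density p, check that the perturbation p(x)(1 + delta*sin(2*pi*log x))
# leaves every integer moment unchanged: the correction integral vanishes for all k.
# Substituting y = log x turns it into int e^{ky} phi(y) sin(2*pi*y) dy with phi the
# standard Gaussian density.
import numpy as np

y = np.linspace(-15.0, 25.0, 400001)
dy = y[1] - y[0]
gauss = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)

for k in range(6):
    correction = np.sum(np.exp(k * y) * gauss * np.sin(2 * np.pi * y)) * dy
    moment = np.sum(np.exp(k * y) * gauss) * dy        # ~ e^{k^2/2}
    assert abs(correction) < 1e-6 * max(1.0, moment)   # correction term vanishes
    assert abs(moment - np.exp(k ** 2 / 2)) < 1e-4 * np.exp(k ** 2 / 2)
```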
\subsection{Moment sequences}
Let's look at an interesting side question.
Before, we were given numbers $m_k$, promised that they are the moments of some distribution, and tried to find that distribution. Now we want to go one step further: suppose the $m_k \in \RR$ are arbitrary numbers. We ask whether there exists any distribution $\mu$ such that $m_k=\int x^k \,d\mu(x)$.
The answer is: not always. A simple necessary condition is $m_{2k}\geq 0$ for all $k$. More generally, for every polynomial $q\in\RR[x]$ we need $\int q(x)^2 \,d\mu(x)\geq 0$, which expands into a linear inequality in the $m_k$.
In fact, these are the only necessary conditions.
\begin{theorem}
Suppose that for all $k$ we have $$M := \begin{bmatrix}
m_0=1 & m_1 & \cdots & m_k \\
m_1 & m_2 & \cdots & m_{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
m_k & m_{k+1} & \cdots & m_{2k}
\end{bmatrix}\succeq 0,$$ i.e., $v^\top Mv\geq 0$ for all $v$. (One can check that this condition is equivalent to requiring $$\int (v_0+v_1 x +\cdots + v_kx^k)^2 \,d\mu(x)\geq 0$$ for any measure $\mu$ with these moments.) Then there exists a $\mu$ such that $m_k=\int x^k \,d\mu(x)$.
\end{theorem}
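For a concrete instance of this positivity condition, we can assemble the Hankel moment matrix from the standard Gaussian moments and confirm it is positive semidefinite. A sketch (Python/NumPy, assumed here for illustration):

```python
# Build the Hankel moment matrix M_{ij} = m_{i+j} from the standard Gaussian
# moments (m_k = (k-1)!! for even k, 0 for odd k) and check that M >= 0.
import numpy as np

def gaussian_moment(k):
    """k-th moment of N(0, 1): (k-1)!! for even k, 0 for odd k."""
    if k % 2 == 1:
        return 0.0
    m = 1.0
    for j in range(1, k, 2):   # (k-1)!! = 1 * 3 * 5 * ... * (k-1)
        m *= j
    return m

k = 5
M = np.array([[gaussian_moment(i + j) for j in range(k + 1)]
              for i in range(k + 1)])
eigs = np.linalg.eigvalsh(M)
assert eigs.min() > -1e-9      # positive semidefinite (in fact positive definite)
```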
\section{Convergence of moments}
Let's first recall our second question: suppose the moments of a sequence of random variables converge, $$\EE X_n^k \to m_k = \int x^k \,d \mu(x) \quad \text{as } n \to \infty$$ for all $k$. We want to know whether this implies that the random variables $X_n$ converge in distribution to $\mu$, i.e., $$\PP[X_n\leq t] \to \mu((-\infty, t])$$ (at least at every $t$ with $\mu(\{t\}) = 0$).
We will answer this question in two steps. Let $\mu_n$ be the law of $X_n$. First, let's consider the simplest case in which the implication holds:
\begin{theorem}
Suppose there exists an $M > 0$ such that
\begin{itemize}
\item $X_n\in [-M, M]$ with probability $1$,
\item $\operatorname{supp}(\mu)\subseteq [-M, M]$,
\end{itemize}
and suppose $\EE X_n^k \to \int x^k \,d\mu(x)$ for all $k$.
Then, for every $t$ with $\mu(\{t\}) = 0$, we have $$\PP[X_n\leq t] = \int_{-M}^M \mathbb{1}_{[-M, t]} (x)\,d\mu_n(x) \to \int_{-M}^M \mathbb{1}_{[-M, t]}(x)\,d\mu(x)$$
\end{theorem}
\begin{proof}
Recall that $\mu_n$ denotes the law of $X_n$, so the moment assumption reads $$\int x^k \,d\mu_n(x)\to \int x^k \,d\mu(x).$$ We want to show $$\PP[X_n\leq t] = \int_{-M}^M \mathbb{1}_{[-M, t]} (x)\,d\mu_n(x) \to \int_{-M}^M \mathbb{1}_{[-M, t]}(x)\,d\mu(x)$$
The plan is to find a polynomial $p$ that approximates $\mathbb{1}_{[-M, t]}(x)$ on $[-M, M]$ to additive error $\varepsilon$ outside the window $[t-\delta, t+\delta]$, and to additive error at most $2$ inside that window.
\begin{center}
\includegraphics[width = 0.7\textwidth]{2023-02-07-fig1.jpg}
\end{center}
Then we could check that
\begin{align*}
\left|\int_{-M}^M \mathbb{1}_{[-M, t]} (x)\,d\mu_n(x) - \int_{-M}^M \mathbb{1}_{[-M, t]}(x)\,d\mu(x) \right| \leq &\left| \int p(x)\,d\mu_n(x)
- \int p(x)\,d\mu(x)\right| + 4\varepsilon M \\&+ 2(\mu_n([t-\delta, t+\delta]) + \mu([t-\delta, t+\delta]))
\end{align*}
By choosing $\varepsilon,\delta$ small and then choosing $p$ accordingly, every term on the right can be made arbitrarily small; the first term tends to $0$ as $n\to\infty$ because $p$ is a polynomial and the moments converge.
\end{proof}
More generally, we can use the following theorem.
\begin{theorem}[L\'evy]
Random variables $X_n$ converge in distribution to $\mu$ if and only if the characteristic functions $\Phi_{X_n}(t)$ converge to the characteristic function $\Phi_{\mu}(t)$ for every $t\in\RR$.
\end{theorem}
\begin{corollary}
Under the same growth condition as before, i.e., $|m_k|\leq CM^kk!$, if we have $$\EE X_n^k \to \int x^k \,d\mu(x) = m_k,$$ then $X_n$ converges in distribution to $\mu$.
\end{corollary}
\begin{example}[Central limit theorem] Let $\mu = \sN(0,1)$; then $m_k = 0$ when $k$ is odd and $m_k = (k-1)!!$ when $k$ is even. In particular $m_k \leq (k-1)!!\leq k!$, so the growth condition is satisfied.
Let's define $$A_1,\dots,A_n\overset{\text{iid}}{\sim} \operatorname{Unif}(\{\pm 1\}), \quad X_n = \frac{A_1+\cdots + A_n}{\sqrt{n}}$$
Let's assume $k$ is even and look at
\begin{align*}
\EE X_n^{k} &= n^{-k/2}\sum_{i_1,\dots,i_k} \EE[A_{i_1} \cdots A_{i_{k}}] \\
&= n^{-k/2}\,\#\{(i_1,\dots,i_k)\in[n]^k : \text{each index appears an even number of times}\} \\
&\approx n^{-k/2}\, n(n-1)\cdots(n-k/2+1) \cdot \#\{\text{pairings of } 1,\dots,k\}
\end{align*}
Since $n$ goes to infinity while $k$ is held fixed, we have $n^{-k/2}\, n(n-1)\cdots(n-k/2+1) \to 1$.
So we have $$\EE X_n^k \to \#\{\text{pairings of } 1,\dots,k\} = \mathbb{1}\{k \text{ even}\}(k-1)!!.$$
\end{example}
With a somewhat more careful argument, one can prove the central limit theorem for more general distributions of the $A_i$ in this way.
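The moment computation in the example can also be watched in simulation: sampling $X_n$ for moderately large $n$, the even empirical moments approach $(k-1)!!$. A Monte Carlo sketch (Python/NumPy, not from the lecture; the sample sizes are arbitrary choices):

```python
# Monte Carlo check: for X_n = (A_1 + ... + A_n)/sqrt(n) with Rademacher A_i,
# the even empirical moments approach (k-1)!!, the Gaussian moments.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 20000
A = rng.choice([-1.0, 1.0], size=(trials, n))   # i.i.d. uniform signs
X = A.sum(axis=1) / np.sqrt(n)

targets = {2: 1.0, 4: 3.0, 6: 15.0}             # (k-1)!! for k = 2, 4, 6
for k, target in targets.items():
    est = float(np.mean(X ** k))                # empirical k-th moment
    assert abs(est - target) < 0.5 + 0.2 * target
```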
\begin{example}
Similar, though more complicated, versions of the same ideas are used to derive the $\operatorname{Exp}(\sqrt{(2\pi)/3})$ limit theorem for spacings in the random number partitioning problem.
\end{example}
\section{Appearance in random matrix theory}
Now we want to talk about the relevance to random matrices. Let's fix some notation here; in the following, $A_n \in \RR^{n\times n}_{\mathrm{sym}}$ are symmetric random matrices. Since they are symmetric, they have real eigenvalues $\lambda_1,\dots,\lambda_n\in\RR$.
We define $\mu_n$ the so-called \emph{empirical spectral distribution} of $A_n$ as follows:
$$\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i}.$$
An important thing to keep in mind is that these are \emph{random} probability measures.
In what sense can $\mu_n$ converge to a deterministic probability measure $\mu$?
These random probability measures are complicated objects, but we have many tools for discussing the convergence of scalar random variables. So let's build scalar random variables from $\mu_n$:
$$\mu_n([a,b]) = \frac{\#\{i \in [n] : \lambda_i \in [a,b]\}}{n}.$$ We can then ask if this converges in some standard probability sense to $\mu([a,b])$, for all choices of $a, b$.
Notice that $\int x^k d\mu_n(x)$ is a random variable rather than a number.
We have convenient tools to calculate the expectation of this random variable:
$$\Ex_{A_n}\int x^kd\mu_n(x) = \Ex_{A_n}\frac{1}{n}\sum_{i=1}^n \lambda_i^k = \frac{1}{n}\Ex_{A_n} \operatorname{Tr}(A_n^k).$$
Now we want to ask whether this converges to $\int x^k \,d\mu(x)$.
This is a necessary condition for the kind of convergence above, but in general it is not sufficient.
This technique only tells us that the expectation $\EE \mu_n([a,b])$ converges to $\mu([a,b])$, while what we want is convergence in probability: the probability that this fraction differs from $\mu([a,b])$ by more than any fixed $\varepsilon$ should go to zero. We cannot obtain this from the expectations of moments alone. One additional piece of information that suffices is sufficient control of $\operatorname{Var}(\frac{1}{n}\operatorname{Tr}(A_n^k))$; with such variance bounds, we can prove convergence in probability using Chebyshev's inequality.
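As a concrete illustration of the trace formula, for a GOE matrix $A_n$ the quantities $\frac{1}{n}\Tr((A_n/\sqrt{n})^k)$ concentrate around the moments of Wigner's semicircle law, whose even moments are the Catalan numbers. A sketch (Python/NumPy, not part of the lecture; the GOE normalization used is one standard choice):

```python
# Empirical spectral moments of a GOE matrix: (1/n) Tr((A/sqrt(n))^k) should be
# close to the semicircle moments (the Catalan numbers C_{k/2} for even k).
import numpy as np

rng = np.random.default_rng(1)
n = 1500
G = rng.standard_normal((n, n))
A = (G + G.T) / np.sqrt(2)          # symmetric; off-diagonal entries have variance 1
lam = np.linalg.eigvalsh(A / np.sqrt(n))

catalan = {2: 1.0, 4: 2.0, 6: 5.0}  # semicircle moments C_1, C_2, C_3
for k, c in catalan.items():
    est = float(np.mean(lam ** k))  # = (1/n) Tr((A/sqrt(n))^k)
    assert abs(est - c) < 0.3
```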
\end{document}