A few words about me:
Zhigang Yao is an Associate Professor in the Department of Statistics and
Data Science at the National University of
Singapore (NUS). He also holds a courtesy joint
appointment with the Department of Mathematics at
NUS. He is a Faculty Affiliate of the Institute of Data Science (IDS)
at NUS. He received his
Ph.D. in Statistics from the University of Pittsburgh in 2011. His thesis
advisors were Bill Eddy at Carnegie Mellon and Leon Gleser at the
University of Pittsburgh. He was an Assistant Professor at
NUS from 2014 to 2020. Before joining NUS, he worked with Victor Panaretos
as a postdoctoral researcher at the Swiss Federal Institute of Technology (EPFL)
from 2011 to 2014.
Since 2022, he has been a member of the Center
of Mathematical Sciences and Applications (CMSA)
at Harvard University. At Harvard, he collaborates with Shing-Tung Yau on manifold fitting and
researches the interface between statistics and geometry. He is also a Visiting Faculty member
in the Department of Statistics
at Harvard. He is the organizer of the Harvard Conference on
Geometry and Statistics, hosted by CMSA in 2023.
A few words about my work:
My main research area is statistical inference for complex data. My work
focuses primarily on the interaction between statistics
and geometry.
Linearity has been viewed as a fundamental cornerstone in the development of
statistical methodology. For decades, prominent progress in statistics
has been made by linearizing the data and the way we analyze them. In modern times, we often
encounter various kinds of high-throughput data that share a
high-dimensional characteristic. Although each data point is usually
represented as a long vector or a large matrix, in principle they
can all be viewed as points on or near an intrinsic manifold.
How to uncover the underlying structure, a lower-dimensional manifold,
beneath the high-dimensional data is of great interest to me. This could
be termed manifold learning, although the term as commonly used means
something different. My work can be summarized as pursuing both the
methodological and theoretical development and the applied side of
determining the lower-dimensional structure from complex data.
This includes two categories of work that I have been conducting: 1)
learning structure from data, via Euclidean and non-Euclidean methods, and 2)
applying the developed methods in real data science.
At a very high level, my Euclidean work in the first category is about
classification in high-dimensional data where the useful signal is rare
and weak, a challenging regime in classical statistical research. When
the data space is no longer linear (i.e., it is curved), my non-Euclidean work
includes making statistical inference by estimating a lower-dimensional
manifold/submanifold from the data, with bounds relative to the actual underlying
manifold; if the above problem can be partially solved, we can utilize
the estimated structure (submanifolds) to perform inference, such as
classification or clustering. My earlier work, such as finding the principal
variation (principal flows or principal boundary) of data lying on
manifolds, is closely connected to this. In both scenarios, most
conventional approaches fail.
My work in the second category consists of long-running research into the inverse
problems arising in brain imaging (specifically, magnetoencephalography (MEG)) and
tomographic reconstruction (specifically, electron microscopy). In MEG, the
problem is to localize the electrical source in the brain using the
extremely weak magnetic signal outside of the head; in tomography, the
problem is to obtain the complete 3D folding of the particle from the
partial information (its 2D projections) recorded on the film. In the
high-dimensional setting, the useful signal is rare and weak and, as a
result, there is currently no unique solution for these problems.
