A few words about me:
Zhigang Yao is an Associate Professor in the Department of Statistics and
Data Science at the National University of
Singapore (NUS). He also holds a courtesy joint
appointment with the Department of Mathematics at
NUS. He is a Faculty Affiliate of the Institute of Data Science (IDS)
at NUS. He received his
Ph.D. in Statistics from the University of Pittsburgh in 2011. His thesis
advisors were Bill Eddy at Carnegie Mellon and Leon Gleser at
the University of Pittsburgh. He was an Assistant Professor at
NUS from 2014 to 2020. Before joining NUS, he worked with Victor Panaretos
as a postdoctoral researcher at the Swiss Federal Institute of Technology (EPFL)
from 2011 to 2014.
Since 2022, he has been a member of the Center
of Mathematical Sciences and Applications (CMSA)
at Harvard University. At Harvard, he collaborates with Shing-Tung Yau on manifold fitting and
studies the interface between statistics and geometry. He actively
promotes emerging research directions at the intersection of statistics
and geometry on an international scale. Notably, he initiated the first Harvard Conference on
Geometry and Statistics in 2023. He also
co-organized two Interaction of Statistics and
Geometry (ISAG) conferences in Singapore, hosted by the Institute of
Mathematical Sciences (IMS). Additionally, he initiated two symposiums in
China at the Beijing Institute of
Mathematical Sciences and Applications (BIMSA)
and the Shanghai Institute for
Mathematics and Interdisciplinary Sciences (SIMIS),
scheduled for 2023-2025.
A few words about my work:
My
primary research focuses on statistical inference for complex data, with
an emphasis on the interaction between statistics and geometry.
Linearity
has long been regarded as a fundamental cornerstone in the development of
statistical methodology. For decades, significant progress in statistics
has focused on linearizing data and refining the methods we use to
analyze it. In recent times, however, we have increasingly encountered
various types of high-throughput data with high-dimensional
characteristics. While each data point is often represented as a long
vector or a large matrix, in principle all such points can be viewed as
lying on or near an intrinsic manifold.
Uncovering
the underlying structure — a lower-dimensional manifold — beneath
high-dimensional data is a key area of interest for me. I coined the term
manifold fitting to describe this process, as distinct from the term manifold
learning, which is commonly used in a different context. My work can be
summarized as both advancing the theoretical and methodological
development and applying these techniques to determine the
lower-dimensional structure within complex data. This research spans two
main areas: 1) Learning structure from data using both Euclidean and
non-Euclidean methods, and 2) Applying the developed methods to
real-world data science problems.
At
a high level, my Euclidean work in the first category focuses on
classification in high-dimensional data, where the useful signal is rare
and weak — a challenging area in classical statistical research. When the
data space becomes non-linear (i.e., curved), my non-Euclidean work
involves making statistical inferences by estimating a lower-dimensional
manifold or sub-manifold from the data, with bounds on the actual
underlying manifold. If this problem can be partially solved, we can use
the estimated structure (sub-manifolds) for further inference tasks, such
as classification or clustering. My earlier work, such as identifying the
principal variation (principal flows or principal boundaries) of data
lying on manifolds, is closely connected to this approach. In both
scenarios, conventional methods typically fail.
My
work in the second category involves extensive research into inverse
problems in brain imaging (e.g., Magnetoencephalography, or MEG) and
tomographic reconstruction (e.g., electron microscopy). In MEG, the
challenge is to localize the electrical sources in the brain using the
extremely weak magnetic signals detected outside the head. In tomography,
the problem is to reconstruct the complete 3D shape of a particle from
its partial 2D projections recorded on film. In both cases, the useful
signal is rare and weak in the high-dimensional setting, and as a result,
there is currently no unique solution to these problems.