Contact:

 

Department of Statistics and Data Science
National University of Singapore
21 Lower Kent Ridge Road
Singapore 117546
Email: zhigang.yao@nus.edu.sg

Center of Mathematical Sciences and Applications
Harvard University
20 Garden Street
Cambridge MA 02138
Email: zhigang.yao@cmsa.fas.harvard.edu



 

A few words about me:


Zhigang Yao is an Associate Professor in the Department of Statistics and Data Science at the National University of Singapore (NUS). He also holds a courtesy joint appointment with the Department of Mathematics at NUS and is a Faculty Affiliate of the Institute of Data Science (IDS) at NUS. He received his Ph.D. in Statistics from the University of Pittsburgh in 2011. His thesis advisors were Bill Eddy at Carnegie Mellon and Leon Gleser at the University of Pittsburgh. He was an Assistant Professor at NUS from 2014 to 2020. Before joining NUS, he worked with Victor Panaretos as a postdoctoral researcher at the Swiss Federal Institute of Technology (EPFL) from 2011 to 2014.

 

Since 2022, he has been a member of the Center of Mathematical Sciences and Applications (CMSA) at Harvard University. At Harvard, he collaborates with Shing-Tung Yau on manifold fitting and researches the interface between statistics and geometry. He actively promotes emerging research directions at the intersection of statistics and geometry on an international scale. Notably, he initiated the first Harvard Conference on Geometry and Statistics in 2023. He also co-organized two Interaction of Statistics and Geometry (ISAG) conferences in Singapore, hosted by the Institute of Mathematical Sciences (IMS). Additionally, he initiated two symposiums in China, at the Beijing Institute of Mathematical Sciences and Applications (BIMSA) and the Shanghai Institute for Mathematics and Interdisciplinary Sciences (SIMIS), scheduled for 2023-2025.

 

 

A few words about my work:

My primary research focuses on statistical inference for complex data, with an emphasis on the interaction between statistics and geometry.

Linearity has long been regarded as a cornerstone in the development of statistical methodology. For decades, significant progress in statistics has focused on linearizing data and refining the methods we use to analyze it. More recently, however, we have increasingly encountered various types of high-throughput, high-dimensional data. While each data point is often represented as a long vector or a large matrix, in principle all of these points can be viewed as lying on or near an intrinsic manifold.

Uncovering the underlying structure, a lower-dimensional manifold, beneath high-dimensional data is a key area of interest for me. I coined the term manifold fitting to describe this process, as distinct from manifold learning, a term commonly used in a different context. My work both advances the theory and methodology of manifold fitting and applies these techniques to determine the lower-dimensional structure within complex data. This research spans two main areas: 1) learning structure from data using both Euclidean and non-Euclidean methods, and 2) applying the developed methods to real-world data science problems.

At a high level, my Euclidean work in the first category focuses on classification in high-dimensional data where the useful signal is rare and weak, a challenging area in classical statistical research. When the data space becomes non-linear (i.e., curved), my non-Euclidean work involves making statistical inferences by estimating a lower-dimensional manifold or sub-manifold from the data, with bounds on the estimate relative to the actual underlying manifold. Even if this problem is only partially solved, we can use the estimated structure (sub-manifolds) for further inference tasks, such as classification or clustering. My earlier work, such as identifying the principal variation (principal flows or principal boundaries) of data lying on manifolds, is closely connected to this approach. In both the Euclidean and non-Euclidean scenarios, conventional methods typically fail.

 

My work in the second category involves extensive research into inverse problems in brain imaging (e.g., magnetoencephalography, or MEG) and tomographic reconstruction (e.g., electron microscopy). In MEG, the challenge is to localize the electrical sources in the brain using the extremely weak magnetic signals detected outside the head. In tomography, the problem is to reconstruct the complete 3D shape of a particle from its partial 2D projections recorded on film. In both cases, the useful signal is rare and weak in the high-dimensional setting, and as a result, these problems currently have no unique solution.