Contact:

 

Department of Statistics and Data Science
National University of Singapore
21 Lower Kent Ridge Road             
Singapore 117546
Email: zhigang.yao@nus.edu.sg

Center of Mathematical Sciences and Applications
Harvard University
20 Garden Street              
Cambridge MA 02138
Email: zhigang.yao@cmsa.fas.harvard.edu



 

A few words about me:


Zhigang Yao is an Associate Professor in the Department of Statistics and Data Science at the National University of Singapore (NUS). He also holds a courtesy joint appointment with the Department of Mathematics at NUS. He is a Faculty Affiliate of the Institute of Data Science (IDS) at NUS. He received his Ph.D. in Statistics from the University of Pittsburgh in 2011. His thesis advisors were Bill Eddy at Carnegie Mellon and Leon Gleser at the University of Pittsburgh. He was an Assistant Professor at NUS from 2014 to 2020. Before joining NUS, he worked with Victor Panaretos as a postdoctoral researcher at the Swiss Federal Institute of Technology (EPFL) from 2011 to 2014.

 

Since 2022, he has been a member of the Center of Mathematical Sciences and Applications (CMSA) at Harvard University. At Harvard, he collaborates with Shing-Tung Yau on manifold fitting and researches the interface between statistics and geometry. He is also a Visiting Faculty member in the Department of Statistics at Harvard. He is the organizer of the Harvard Conference on Geometry and Statistics, hosted by CMSA in 2023.

 

 

A few words about my work:

My main research area is statistical inference for complex data. The work that I have primarily focused on involves an interaction between statistics and geometry.

Linearity has long been viewed as a cornerstone in the development of statistical methodology. For decades, much of the progress in statistics has come from linearizing the data and the way we analyze them. Today we often encounter various kinds of high-throughput data that share a high-dimensional character. Although each data point is usually represented as a long vector or a large matrix, in principle they can all be viewed as points on or near an intrinsic manifold.

How to uncover the underlying structure, a lower-dimensional manifold, beneath high-dimensional data is of great interest to me. This could be termed manifold learning, although the term as commonly used means something different. My work can be summarized as pursuing both the methodological and theoretical development and the applied side of determining lower-dimensional structure from complex data. It falls into two categories: 1) learning structure from data, via Euclidean and non-Euclidean methods, and 2) applying the developed methods to real problems in data science.

At a very high level, my Euclidean work in the first category is about classification in high-dimensional data where the useful signal is rare and weak, a challenging regime in classical statistical research. When the data space is no longer linear (i.e., it is curved), my non-Euclidean work involves making statistical inference by estimating a lower-dimensional manifold or sub-manifold from the data, with bounds to the actual underlying manifold; if this problem can be at least partially solved, we can use the estimated structure (sub-manifolds) for inference, such as classification or clustering. My earlier work, such as finding the principal variation (principal flows or principal boundary) of data lying on manifolds, is closely connected to this. In both scenarios, most conventional approaches fail.

 

My work in the second category consists of long-running research into inverse problems in brain imaging (i.e., magnetoencephalography (MEG)) and tomographic reconstruction (i.e., electron microscopy). In MEG, the problem is to localize the electrical source in the brain using the extremely weak magnetic signal measured outside the head; in tomography, the problem is to recover the complete 3D folding of a particle from partial information (its 2D projections) recorded on film. In the high-dimensional setting, the useful signal is rare and weak and, as a result, there is currently no unique solution to these problems.