Statistical Inference for Geometric and Topological Data and Application to Machine Learning
Topic | Topological Data Analysis
---|---
Format | Hybrid
Location | DSDS NUS S16 07-107
Speaker | Jisu Kim (Seoul National U)
Time (GMT+8) |
Geometric and topological structures play important roles in statistics and machine learning. Geometric structures help reduce the dimensionality of high-dimensional data, mitigating the curse of dimensionality. Topological structures carry scientific meaning about the data and can serve as features in learning tasks, which makes them particularly useful in machine learning. The methods that extract such topological features from data are collectively referred to as Topological Data Analysis (TDA). A major technique in TDA is persistent homology, which observes data at multiple resolutions and extracts the salient topological features that persist across scales.
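To make the idea of persistence concrete, here is a minimal sketch of 0-dimensional persistent homology (connected components) for a point cloud under the Vietoris-Rips filtration. It is an illustration only, not the methodology of the talk: all function names are my own, and the union-find trick (Kruskal's MST construction, where merge heights equal H0 death times) is a standard simplification that works only in dimension 0.

```python
import math

def h0_persistence(points):
    """0-dimensional persistent homology of a point cloud: every point
    (component) is born at scale 0, and a component dies at the scale
    where it merges into another one (single-linkage merge heights,
    i.e. the edge weights of a minimum spanning tree).
    Returns the finite death times in increasing order; one component
    persists forever and is omitted."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # union-find root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:  # Kruskal: each accepted edge kills a component
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths

# Four points on a line: components merge at scales 1, 2, and 4.
print(h0_persistence([(0, 0), (1, 0), (3, 0), (7, 0)]))  # -> [1.0, 2.0, 4.0]
```

Each death time is one point of the H0 persistence diagram (with birth 0); long-lived points correspond to well-separated clusters, short-lived ones to noise.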
The first part of this talk focuses on statistical inference for geometric data. First, we examine the minimax risk in estimating the dimension of a manifold. Next, we discuss reach, a parameter that describes how smooth a manifold is and how far it is from self-intersection. We explore the minimax risk in estimating the reach under the manifold assumption.
The second part of the talk addresses statistical inference for TDA. We begin by introducing persistent homology and discuss how randomness in the data affects the resulting topological features. We present methods that quantify this uncertainty using confidence sets, and show how to select significant topological features.
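As a toy illustration of quantifying this uncertainty, the sketch below builds a percentile-bootstrap confidence interval for a single scalar topological summary (total H0 persistence, computed via single-linkage merge heights). This is a deliberate simplification under my own assumptions: the confidence sets discussed in the talk are for persistence diagrams themselves (e.g. via bootstrapped bottleneck distances), which is more involved than this scalar version.

```python
import math, random

def total_h0_persistence(points):
    """Sum of H0 death times (single-linkage merge heights), a simple
    scalar summary of the 0-dimensional persistence diagram."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += d
    return total

def bootstrap_ci(points, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the point cloud with replacement,
    recompute the summary statistic, take empirical quantiles."""
    rng = random.Random(seed)
    vals = sorted(
        stat([rng.choice(points) for _ in points]) for _ in range(n_boot)
    )
    return vals[int(alpha / 2 * n_boot)], vals[int((1 - alpha / 2) * n_boot) - 1]

rng = random.Random(1)
cloud = [(rng.random(), rng.random()) for _ in range(30)]
lo, hi = bootstrap_ci(cloud, total_h0_persistence)
print(f"95% bootstrap CI for total H0 persistence: [{lo:.3f}, {hi:.3f}]")
```

Features whose persistence exceeds a bootstrap-calibrated threshold can then be declared significant, separating topological signal from sampling noise.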
The third part of the talk explores how TDA is applied to machine learning. We highlight two primary approaches: (1) featurization, where persistent homology is transformed into Euclidean vectors or functional representations suitable for machine learning pipelines; and (2) evaluation, where TDA is used to assess properties of the data or the model. Through case studies, we demonstrate how these approaches enhance machine learning methods.