Codes and Expansions (CodEx) Seminar


James Murphy (Tufts University):
Intrinsically Low-Dimensional Models for Wasserstein Space: Geometry, Statistics, and Learning

We consider the problems of efficient modeling and representation learning for probability distributions in Wasserstein space. We study a general barycentric coding model in which data are represented as Wasserstein-2 (W2) barycenters of a fixed set of reference measures. Leveraging the Riemannian structure of W2-space, we develop a tractable optimization program to learn the barycentric coordinates when given access to the densities of the underlying measures. When the measures are accessed only through i.i.d. samples, we provide a consistent statistical procedure for learning these coordinates. Our consistency results and algorithms exploit entropic regularization of the optimal transport problem, which allows our barycentric modeling approach to scale efficiently. We also consider the problem of learning the reference measures themselves from observed data. Our regularized approach to dictionary learning in Wasserstein space addresses core issues of ill-posedness and, in practice, learns interpretable dictionary elements and coefficients that are useful for downstream tasks. Applications to image and natural language processing will be shown throughout the talk.
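The core primitive behind the barycentric coding model described above is the entropic-regularized Wasserstein-2 barycenter. The following is a minimal sketch, not the speaker's implementation: it computes a fixed-support entropic W2 barycenter of discrete histograms on a shared 1-D grid via iterative Bregman projections (Sinkhorn-style updates). The grid size, regularization strength `eps`, iteration count, and the two Gaussian-bump inputs are illustrative assumptions.

```python
import math

def entropic_barycenter(hists, grid, weights, eps=0.05, iters=200):
    """Entropic-regularized W2 barycenter of histograms supported on `grid`.

    Sketch of iterative Bregman projections: assumes all measures share the
    same fixed support and a squared-Euclidean ground cost.
    """
    n = len(grid)
    # Gibbs kernel K_ij = exp(-|x_i - x_j|^2 / eps) for the squared-distance cost.
    K = [[math.exp(-((grid[i] - grid[j]) ** 2) / eps) for j in range(n)]
         for i in range(n)]

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def matvec_t(M, v):
        return [sum(M[j][i] * v[j] for j in range(n)) for i in range(n)]

    S = len(hists)
    v = [[1.0] * n for _ in range(S)]
    b = [1.0 / n] * n
    for _ in range(iters):
        # Scaling step: match each input marginal a_s.
        u = []
        for s in range(S):
            Kv = matvec(K, v[s])
            u.append([hists[s][i] / max(Kv[i], 1e-300) for i in range(n)])
        ktu = [matvec_t(K, u[s]) for s in range(S)]
        # Weighted geometric mean of the second marginals gives the barycenter.
        b = [math.exp(sum(weights[s] * math.log(max(ktu[s][i], 1e-300))
                          for s in range(S))) for i in range(n)]
        for s in range(S):
            v[s] = [b[i] / max(ktu[s][i], 1e-300) for i in range(n)]
    total = sum(b)  # normalize: b sums to 1 only in the limit of convergence
    return [bi / total for bi in b]

# Illustrative example: barycenter of two Gaussian bumps on [0, 1].
grid = [i / 19 for i in range(20)]

def gauss(mu, sig):
    w = [math.exp(-((x - mu) ** 2) / (2 * sig ** 2)) for x in grid]
    z = sum(w)
    return [wi / z for wi in w]

bary = entropic_barycenter([gauss(0.25, 0.08), gauss(0.75, 0.08)], grid,
                           weights=[0.5, 0.5])
```

With equal weights, the barycenter of the two bumps concentrates between the modes (near 0.5) rather than averaging the densities pointwise, which is the interpolation behavior that makes W2 barycenters attractive as a coding model. Learning the weights (the barycentric coordinates) from data, as in the talk, wraps an optimization loop around this primitive.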