Codes and Expansions (CodEx) Seminar

Alex Cloninger (University of California at San Diego)
Fast Statistical and Geometric Distances Between Families of Distributions

Detecting differences and building classifiers between a family of distributions, given only finite samples, has attracted renewed interest due to data science applications in high dimensions. Applications include survey response effects, topic modeling, and various measurements of cell or gene populations per person. Recent advances have focused on kernel Maximum Mean Discrepancy and Optimal Transport. However, when the family of distributions is concentrated near a low-dimensional structure, or when the family of distributions being considered is generated from a family of simple group actions, these algorithms fail to exploit the reduced complexity. In this talk, we'll discuss the theoretical and computational advancements that can be made under these assumptions, and their connections to harmonic analysis, approximation theory, and group actions. Similarly, we'll use both techniques to develop methods of provably identifying not just how much the distributions deviate, but where these differences are concentrated. We'll also focus on applications in medicine, generative modeling, and supervised learning.