Codes and Expansions (CodEx) Seminar
Kathlén Kohn (KTH Royal Institute of Technology)
Algebraic neural network theory: On neuromanifolds and how their geometry affects learning
This talk will be very interactive. The audience will explore neuromanifolds and why their geometry governs the learning behavior of neural networks. Since studying this geometry for real-life networks seems out of reach for current mathematics, we approximate arbitrary neural networks by polynomial ones. We will motivate this choice by turning the universal approximation theorem upside down. This allows us to argue that training polynomial networks can be viewed as solving a closest-point problem on a neuromanifold in a finite-dimensional ambient space. We then focus on geometry-induced implicit biases and tradeoffs between efficient optimization and good generalization. Finally, we consider the highly overparametrized setting common in modern large language models, where the distance in the closest-point problem is degenerate, and present preliminary results on formally comparing the minimization of degenerate distances with non-degenerate distance minimization.

This talk is based on several joint works with Paul Breiding, Erin Connelly, Nathan Henry, Giovanni Marchetti, Vahid Shahverdi, and Matthew Trager.
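For concreteness, here is a minimal sketch of the closest-point formulation in our own notation (the parametrization map $\mu$, the dimensions $k$ and $N$, and the choice of square loss are illustrative assumptions, not taken from the talk):

\[
\min_{\theta \in \mathbb{R}^k} \big\| \mu(\theta) - f^\ast \big\|^2 \;=\; \min_{p \in \mathcal{M}} \| p - f^\ast \|^2, \qquad \mathcal{M} := \mu(\mathbb{R}^k) \subseteq \mathbb{R}^N,
\]

where $\mu$ sends a weight vector $\theta$ to the polynomial computed by the network, $\mathbb{R}^N$ is identified with the finite-dimensional space of polynomials of the architecture's degree, and $f^\ast \in \mathbb{R}^N$ encodes the target function. In this picture, training computes the point of the neuromanifold $\mathcal{M}$ closest to $f^\ast$.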