Want to understand the convergence and stability of optimization?
[TO BE WRITTEN]
Some facts about deep learning are clear in practice but really hard to show mathematically! One of these: why does optimization work at all?
- empirically, we can train models: the loss goes down, reliably, despite a highly non-convex objective
- loss surfaces: the classical convergence theory covers convex loss surfaces (see the sketch after this list)
- the neural tangent kernel (NTK), and overparameterization more generally
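To ground the convex case, here is a minimal sketch (the quadratic, its dimension, and the step sizes are illustrative assumptions) of the classical result: on an L-smooth convex objective, gradient descent converges for step sizes below 2/L and diverges above it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
H = A.T @ A                      # PSD Hessian of f(x) = 0.5 * x^T H x
L = np.linalg.eigvalsh(H).max()  # smoothness constant = top eigenvalue of H

def run_gd(eta, steps=1000):
    """Run gradient descent on the quadratic and return the final loss."""
    x = rng.standard_normal(10)
    for _ in range(steps):
        x = x - eta * (H @ x)    # gradient of the quadratic is H x
    return 0.5 * x @ H @ x

print(f"eta = 0.9 * 2/L -> final loss {run_gd(0.9 * 2 / L):.2e}")  # converges
print(f"eta = 1.1 * 2/L -> final loss {run_gd(1.1 * 2 / L):.2e}")  # blows up
```

The same 2/L threshold is the reference point for the edge-of-stability story below: in deep learning, the curvature is not fixed, and it moves toward whatever threshold the step size sets.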
Okay, what do we know about optimization in the real world?
- the edge of stability (EoS): sharpness rises during training until it hovers near 2/eta (see the sketch after this list)
- central flows?
- maybe I can get Jeremy to write this one.
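Here is a minimal sketch of the EoS measurement itself, after Cohen et al. (2021): train a tiny network with full-batch GD and track the sharpness (the top Hessian eigenvalue) via power iteration on Hessian-vector products. The data, architecture, step size, and iteration counts are illustrative assumptions; whether sharpness actually reaches 2/eta within a short run depends on the problem.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 4)
y = torch.sin(X.sum(dim=1, keepdim=True))  # toy regression target

model = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
params = list(model.parameters())
eta = 0.02  # EoS predicts sharpness stabilizes near 2/eta = 100

def loss_fn():
    return torch.nn.functional.mse_loss(model(X), y)

def sharpness(n_iters=20):
    """Top Hessian eigenvalue via power iteration on Hessian-vector products."""
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v = v / v.norm()
    for _ in range(n_iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        lam = v @ hv             # Rayleigh quotient (v is unit norm)
        v = hv / hv.norm()
    return lam.item()

for step in range(1001):         # full-batch gradient descent
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}  sharpness {sharpness():.1f}")
```

The striking empirical finding is that sharpness does not sit still: it rises ("progressive sharpening") until it reaches roughly 2/eta, then hovers there while the loss keeps decreasing, non-monotonically.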
Can we do better than gradient descent (GD)?
- newer optimizers: Muon, etc. (sketch below)
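For flavor, a minimal sketch of the core idea behind Muon: keep a momentum buffer per weight matrix, and orthogonalize that buffer (approximately, via a Newton-Schulz iteration) before taking the step. The cubic iteration and all hyperparameters below are simplified stand-ins; the actual Muon implementation uses a tuned quintic iteration, among other details.

```python
import numpy as np

def newton_schulz_orthogonalize(M, n_steps=10):
    """Approximate the nearest (semi-)orthogonal matrix to M."""
    X = M / (np.linalg.norm(M) + 1e-7)   # scale so all singular values are <= 1
    for _ in range(n_steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # cubic Newton-Schulz step
    return X

def muon_style_step(W, grad, buf, lr=0.02, beta=0.95):
    """One optimizer step: momentum accumulation, then an orthogonalized update."""
    buf = beta * buf + grad
    W = W - lr * newton_schulz_orthogonalize(buf)
    return W, buf

# usage: a single step on a random weight matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16))
buf = np.zeros_like(W)
W, buf = muon_style_step(W, rng.standard_normal(W.shape), buf)
```

The design intuition: orthogonalizing the update equalizes the step across singular directions of the weight matrix, rather than letting a few dominant directions absorb most of the learning rate.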