Want to understand the convergence and stability of optimization?


[TO BE WRITTEN]

Some facts about deep learning are clear in practice but really hard to show mathematically!

One of these is the most basic question of all: why does optimization work at all? The training loss of a deep network is non-convex in the weights, so classical theory gives no guarantee that gradient descent finds anything better than a bad local minimum, and yet in practice it reliably drives the loss down.
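
To make that concrete, here's a minimal sketch (my own toy example, not anything canonical; the network sizes, data, and learning rate are arbitrary demo choices): plain gradient descent on a tiny two-layer tanh network. The loss is non-convex in the weights, yet the printed loss just keeps going down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer net: a genuinely non-convex loss in (W1, w2).
n, d, h = 64, 8, 16
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W1 = rng.normal(size=(d, h)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

def loss(W1, w2):
    return 0.5 * np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)

lr = 0.1  # arbitrary; chosen small enough to be stable for this toy
for step in range(3001):
    H = np.tanh(X @ W1)           # hidden activations
    r = H @ w2 - y                # residuals
    # Backprop by hand: gradients of the mean-squared loss.
    g_w2 = H.T @ r / n
    g_W1 = X.T @ ((r[:, None] * w2) * (1 - H ** 2)) / n
    W1 -= lr * g_W1
    w2 -= lr * g_w2
    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss(W1, w2):.4f}")
```

Nothing in classical non-convex optimization promises this behavior; the puzzle is why it happens so robustly, and at much larger scale.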

Okay, so what do we actually know about how optimization behaves in the real world?

And can we do better than plain gradient descent (GD)?
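
In at least one classical setting the answer is yes. Here's another hand-rolled sketch (the quadratic, step sizes, and momentum parameter below are the standard textbook tunings, not anything from this post): on an ill-conditioned quadratic with condition number kappa, Polyak's heavy-ball momentum converges at roughly a sqrt(kappa) rate where tuned plain GD gets a kappa rate.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x with kappa = L/mu = 100.
mu, L = 1.0, 100.0
A = np.diag([mu, L])

def run(lr, beta, steps=100):
    x = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = beta * v - lr * (A @ x)   # heavy-ball velocity (beta=0 gives GD)
        x = x + v
    return 0.5 * x @ A @ x            # final loss value

kappa = L / mu
gd = run(lr=2 / (L + mu), beta=0.0)   # optimally tuned plain GD
hb = run(lr=(2 / (np.sqrt(L) + np.sqrt(mu))) ** 2,   # classical heavy-ball tuning
         beta=((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2)
print(f"loss after 100 steps -- GD: {gd:.2e}, heavy-ball: {hb:.2e}")
```

Whether and why this kind of acceleration carries over from quadratics to deep network losses is exactly the sort of question this section is about.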


A Quickstart Guide to Learning Mechanics

  1. Introduction: what do you want to understand?
  2. ...the average size of hidden representations?
  3. ...hyperparameter selection (and why should theorists care)?
  4. 🚧 ...the convergence and stability of optimization?
  5. 🚧 ...feature learning and the final network weights?
  6. 🚧 ...generalization?
  7. 🚧 ...neuron-level sparsity?
  8. 🚧 ...the structure in the data?
  9. 🚧 Places to make a difference
