Want to understand feature learning and the final network weights?
[TO BE WRITTEN]
this one’s hard. it’s the subject of a lot of speculation, and a lot of people want to know the answer. it’s been the focus of a lot of mechinterp work, but it’s proven quite hard to get at analytically.
- deep linear nets
- single- / multi-index model theory
- recursive feature machines (RFMs)
- complicated calculational tools at large width: dynamical mean-field theory (DMFT), tensor programs (TP), etc.
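
to make the first bullet concrete, here's a minimal sketch of the kind of toy setup deep linear net theory studies: a two-layer linear net trained by plain gradient descent from small init to match a fixed linear map. everything here is an assumption for illustration (the function names `train_deep_linear` and `half_time`, the target matrix, the width, the learning rate, the init scale); it isn't taken from any particular paper's code. the thing to look at is how strongly each singular mode of the target shows up in the end-to-end map `W2 @ W1` over training.

```python
import numpy as np

def train_deep_linear(target, hidden=8, lr=0.01, steps=3000,
                      init_scale=1e-3, seed=0):
    """Gradient descent on 0.5 * ||W2 @ W1 - target||_F^2 from small init.

    Illustrative toy only: inputs are implicitly whitened, so the loss
    depends on the weights only through the end-to-end map W2 @ W1.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = target.shape
    W1 = init_scale * rng.standard_normal((hidden, d_in))
    W2 = init_scale * rng.standard_normal((d_out, hidden))

    # Singular modes of the target: target = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(target)

    history = []
    for _ in range(steps):
        err = W2 @ W1 - target   # gradient w.r.t. the end-to-end map
        grad_W2 = err @ W1.T     # chain rule through W2
        grad_W1 = W2.T @ err     # chain rule through W1
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1
        # Strength of each target mode in the current end-to-end map.
        strengths = np.diag(U.T @ (W2 @ W1) @ Vt.T)[: len(S)]
        history.append(strengths.copy())
    return W1, W2, np.array(history), S

def half_time(history, S, mode):
    """First step at which a mode reaches half of its target strength."""
    return int(np.argmax(history[:, mode] >= 0.5 * S[mode]))

if __name__ == "__main__":
    target = np.diag([3.0, 1.0])  # hypothetical target with two modes
    W1, W2, history, S = train_deep_linear(target)
    for mode in range(len(S)):
        print(f"mode {mode}: strength {S[mode]:.1f}, "
              f"half-learned at step {half_time(history, S, mode)}")
```

with these settings the stronger mode should get picked up well before the weaker one, each after a long plateau near zero. that's the sense in which this toy has "feature learning" you can actually track: the final weights are organized around the target's singular directions, and the order they show up in is set by the singular values.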