Hiroki Naganuma

Overall

Work in Progress

Until Friday, I was working from Google DeepMind Japan (under a work-from-anywhere arrangement) and attended the Masason Foundation's year-end party, so my participation in NeurIPS was limited to the two workshop days, Saturday and Sunday. This summary therefore focuses mainly on those two days.

Networking + Seeing Friends

Day1

Toyotaro Suzumura / UTokyo
Shinji Ito / RIKEN
Masatoshi Uehara / University of Wisconsin-Madison
Hao Chen / CMU
Bai Cong / Science Tokyo
Ryosuke Yamaki / URitsumeikan
Kotaro Yoshida / Science Tokyo
Sabyasachi Sahoo / ULaval+Mila
Mila
- Reyhane Askari Hemmat / Meta
- Mohammad Pezeshki / Meta

Day2

Shikai Qiu / NYU, GDM (work w. Atish)
Jack Min Ong / Prime Intellect (Author of OpenDiLoCo)
- We talked about my work at MSR (which is related to OpenDiLoCo)
Kazuto Fukuchi / UTsukuba
Mila
- Ryan D’Orazio / Mila
- Mehrnaz Mofakhami / Mila
- Charles Guille-Escuret / Mila

FITML Workshop

I will present a work "Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement" at the #NeurIPS2024 workshop FITML on 14th Dec.

We mitigate the interference between task vectors and coefficient tuning costs through the "τJp" regularization. pic.twitter.com/weaj9Ba75v
— Kotaro Yoshida @NeurIPS2024 (@katoro13___) December 12, 2024
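
For readers unfamiliar with the setting, here is a minimal sketch of plain task arithmetic, assuming the usual definition of a task vector as fine-tuned weights minus pre-trained weights. It does not implement the τJp regularization from the paper. The toy weights, the function names, and the coefficient values are all hypothetical and only illustrate what "task vectors" and "coefficient tuning" refer to.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, task_vectors, coefficients):
    """Add each task vector, scaled by its coefficient, to the pre-trained weights."""
    merged = {name: w.copy() for name, w in pretrained.items()}
    for tau, alpha in zip(task_vectors, coefficients):
        for name in merged:
            merged[name] += alpha * tau[name]
    return merged

# Toy weights (hypothetical values, for illustration only)
theta_pre = {"w": np.array([1.0, 2.0])}
theta_a = {"w": np.array([1.5, 2.0])}  # stand-in for a model fine-tuned on task A
theta_b = {"w": np.array([1.0, 1.0])}  # stand-in for a model fine-tuned on task B

tau_a = task_vector(theta_pre, theta_a)
tau_b = task_vector(theta_pre, theta_b)

# The coefficients are the quantities that normally need tuning per task.
merged = apply_task_vectors(theta_pre, [tau_a, tau_b], [1.0, 0.5])
print(merged["w"])
```

In this toy case the two task vectors touch different coordinates, so they do not interfere. With real models they overlap, which is where interference and the cost of tuning the coefficients come from.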

Papers

WIP

OPT Workshop

I will present our work on the efficient distributed training algorithm at the optimization workshop. Join us during our poster sessions from 15:00-16:00.#NeurIPS2024 pic.twitter.com/D4nA8hWX4u
— Hiroki Naganuma (@_Hiroki11x) December 15, 2024

Papers

Spotlight and Keynotes

SOAP: Improving and Stabilizing Shampoo using Adam
- Benchmark
  - CIFAR-5M, ImageNet
- Memo
  - SOAP has a better critical batch size
  - In language modeling tasks, decoupling the eigenvalues and the eigenbasis improves performance
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
- related work: VeLO: Training Versatile Learned Optimizers by Scaling Up
- github
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
- Mikhail (Misha) Belkin
- small batch -> more catapults -> AGOP alignment -> better generalization
- supported only by empirical evidence
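
As a reminder of what AGOP refers to, here is a rough sketch, assuming the common definition of the average gradient outer product as the mean of ∇f(x) ∇f(x)ᵀ over the inputs, and assuming alignment is measured as the cosine similarity between two such matrices. The finite-difference gradient, the function names, and the toy linear model are my own illustrative choices, not code from the paper.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at input x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def agop(f, X):
    """Average gradient outer product of f over the rows of X."""
    grads = np.stack([numerical_gradient(f, x) for x in X])
    return grads.T @ grads / len(X)

def alignment(A, B):
    """Cosine similarity between two matrices, treated as flat vectors."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))

# Toy "model": a linear function, whose input gradient is w everywhere,
# so its AGOP should be close to the outer product of w with itself.
w = np.array([1.0, -2.0, 0.5])
f_linear = lambda x: float(w @ x)

G = agop(f_linear, X)
print(alignment(G, np.outer(w, w)))
```

For a trained network, `f_linear` would be replaced by the model's scalar output and the second matrix by the AGOP of the target function, when that is known.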

WIP

Acknowledgements

I want to thank Microsoft Research (YVR->SFO) and the Masason Foundation (SFO->NRT->YVR) for supporting my participation in NeurIPS.
