SGD with large step sizes learns sparse features M Andriushchenko, AV Varre, L Pillaud-Vivien, N Flammarion International Conference on Machine Learning, 903-925, 2023 | 42 | 2023 |
Last iterate convergence of SGD for Least-Squares in the Interpolation regime. AV Varre, L Pillaud-Vivien, N Flammarion Advances in Neural Information Processing Systems 34, 21581-21591, 2021 | 32 | 2021 |
Variants of homomorphism polynomials complete for algebraic complexity classes P Chaugule, N Limaye, A Varre ACM Transactions on Computation Theory (TOCT) 13 (4), 1-26, 2021 | 8 | 2021 |
Why Do We Need Weight Decay in Modern Deep Learning? M Andriushchenko, F D'Angelo, A Varre, N Flammarion arXiv preprint arXiv:2310.04415, 2023 | 7 | 2023 |
Accelerated SGD for non-strongly-convex least squares A Varre, N Flammarion Conference on Learning Theory, 2062-2126, 2022 | 5 | 2022 |
On the spectral bias of two-layer linear networks AV Varre, ML Vladarean, L Pillaud-Vivien, N Flammarion Advances in Neural Information Processing Systems 36, 2024 | 2 | 2024 |
Why Do We Need Weight Decay for Overparameterized Deep Networks? F D'Angelo, A Varre, M Andriushchenko, N Flammarion NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning, 2023 | | 2023 |