Łukasz Kaiser
OpenAI & CNRS
Verified email at openai.com
Title
Cited by
Year
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
Advances in neural information processing systems 30, 2017
116698 2017
TensorFlow: Large-scale machine learning on heterogeneous systems
M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ...
24170* 2015
Google's neural machine translation system: Bridging the gap between human and machine translation
Y Wu, M Schuster, Z Chen, QV Le, M Norouzi, W Macherey, M Krikun, ...
arXiv preprint arXiv:1609.08144, 2016
8359 2016
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, A Gomez, ...
NIPS, 2017
3544* 2017
Reformer: The efficient transformer
N Kitaev, Ł Kaiser, A Levskaya
arXiv preprint arXiv:2001.04451, 2020
2225 2020
Evaluating large language models trained on code
M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, ...
arXiv preprint arXiv:2107.03374, 2021
1935 2021
Image transformer
N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran
International conference on machine learning, 4055-4064, 2018
1797 2018
Rethinking attention with performers
K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ...
arXiv preprint arXiv:2009.14794, 2020
1262 2020
Regularizing neural networks by penalizing confident output distributions
G Pereyra, G Tucker, J Chorowski, Ł Kaiser, G Hinton
arXiv preprint arXiv:1701.06548, 2017
1186 2017
Grammar as a foreign language
O Vinyals, Ł Kaiser, T Koo, S Petrov, I Sutskever, G Hinton
Advances in neural information processing systems 28, 2015
1109 2015
Training verifiers to solve math word problems
K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ...
arXiv preprint arXiv:2110.14168, 2021
1050 2021
Attention is all you need. arXiv 2017
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
arXiv preprint arXiv:1706.03762, 2023
1025 2023
Multi-task sequence to sequence learning
MT Luong, QV Le, I Sutskever, O Vinyals, L Kaiser
arXiv preprint arXiv:1511.06114, 2015
926 2015
Generating wikipedia by summarizing long sequences
PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer
arXiv preprint arXiv:1801.10198, 2018
906 2018
Universal transformers
M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser
arXiv preprint arXiv:1807.03819, 2018
887 2018
Model-based reinforcement learning for atari
L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, ...
arXiv preprint arXiv:1903.00374, 2019
874 2019
TensorFlow: Large-scale machine learning on heterogeneous systems, software available from tensorflow.org (2015)
M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ...
URL https://www.tensorflow.org, 2015
833 2015
Gpt-4 technical report
J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ...
arXiv preprint arXiv:2303.08774, 2023
683 2023
Tensor2tensor for neural machine translation
A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ...
arXiv preprint arXiv:1803.07416, 2018
609 2018
Adding gradient noise improves learning for very deep networks
A Neelakantan, L Vilnis, QV Le, I Sutskever, L Kaiser, K Kurach, J Martens
arXiv preprint arXiv:1511.06807, 2015
574 2015