Two-bit networks for deep learning on resource-constrained embedded devices W Meng, Z Gu, M Zhang, Z Wu arXiv preprint arXiv:1701.00485, 2017 | 42 | 2017 |
An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning W Meng, Q Zheng, Y Shi, G Pan IEEE Transactions on Neural Networks and Learning Systems 33 (5), 2223-2235, 2021 | 37 | 2021 |
Qualitative measurements of policy discrepancy for return-based deep Q-network W Meng, Q Zheng, L Yang, P Li, G Pan IEEE Transactions on Neural Networks and Learning Systems 31 (10), 4374-4380, 2019 | 29 | 2019 |
A unified approach for multi-step temporal-difference learning with eligibility traces in reinforcement learning L Yang, M Shi, Q Zheng, W Meng, G Pan arXiv preprint arXiv:1802.03171, 2018 | 22 | 2018 |
Off-policy proximal policy optimization W Meng, Q Zheng, G Pan, Y Yin Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9162-9170, 2023 | 1 | 2023 |
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline W Meng, Q Zheng, L Yang, Y Yin, G Pan arXiv preprint arXiv:2405.02572, 2024 | | 2024 |