Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. “Human-Level Control through Deep Reinforcement Learning.” Nature 518, no. 7540 (February 26, 2015): 529–33. https://doi.org/10.1038/nature14236.
Hasselt, Hado van, Arthur Guez, and David Silver. “Deep Reinforcement Learning with Double Q-Learning.” ArXiv:1509.06461 [Cs], December 8, 2015. http://arxiv.org/abs/1509.06461. DDQN
Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. “Prioritized Experience Replay,” February 25, 2016. http://arxiv.org/abs/1511.05952. PDQN
Deep Deterministic Policy Gradient (DDPG)
Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. “Continuous Control with Deep Reinforcement Learning.” arXiv, July 5, 2019. http://arxiv.org/abs/1509.02971.
Twin Delayed Deep Deterministic Policy Gradient (TD3)
Fujimoto, Scott, Herke van Hoof, and David Meger. “Addressing Function Approximation Error in Actor-Critic Methods.” arXiv, October 22, 2018. http://arxiv.org/abs/1802.09477.
Haarnoja, Tuomas, Haoran Tang, Pieter Abbeel, and Sergey Levine. “Reinforcement Learning with Deep Energy-Based Policies.” arXiv, July 21, 2017. http://arxiv.org/abs/1702.08165.
Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.” arXiv, August 8, 2018. http://arxiv.org/abs/1801.01290.
Haarnoja, Tuomas, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine. “Composable Deep Reinforcement Learning for Robotic Manipulation.” arXiv, March 18, 2018. http://arxiv.org/abs/1803.06773.
Haarnoja, Tuomas, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, et al. “Soft Actor-Critic Algorithms and Applications.” arXiv, January 29, 2019. http://arxiv.org/abs/1812.05905.
Fujimoto, Scott, Herke van Hoof, and David Meger. “Addressing Function Approximation Error in Actor-Critic Methods.” arXiv, October 22, 2018. http://arxiv.org/abs/1802.09477.
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal Policy Optimization Algorithms.” arXiv, August 28, 2017. http://arxiv.org/abs/1707.06347.
Engstrom, Logan, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry. “Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO.” arXiv, May 25, 2020. http://arxiv.org/abs/2005.12729.
Andrychowicz, Marcin, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, et al. “What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study.” arXiv, June 10, 2020. http://arxiv.org/abs/2006.05990.