Dr. Rina Dechter - University of California at Irvine
Revised on Nov. 14, 2019


CompSci 295 Reinforcement Learning, Winter 2018


  • Classroom: DBH 1429
  • Day: Monday
  • Time: 4:00 - 6:30 pm
  • Instructor: Rina Dechter - dechter@ics.uci.edu

The class will cover topics in Reinforcement Learning and in Planning Under Uncertainty and will run as a seminar. I will give the first few introductory classes. Students will then be required to read and present papers from the literature or book chapters to the class, and to do a project, which can be based on their selected papers. Some homework may also be assigned. The class is intended for PhD students in AI and Machine Learning, with courses 271 and 273 as prerequisites. If you are a second-year master's student who has already taken 271 and 273, talk to me to get approval.

Project Spreadsheet

Relevant sources (books or classes):

Papers:

  • Learning to Predict by the Methods of Temporal Differences [pdf]
    Richard S. Sutton
    Machine Learning, volume 3, pp 9-44, 1988.

  • An Upper Bound on the Loss from Approximate Optimal-Value Functions [pdf]
    Satinder P. Singh and Richard C. Yee
    Machine Learning, volume 16, pp 227-233, 1994.

  • Algorithms for Sequential Decision Making [pdf]
    Michael L. Littman
    Ph.D. Dissertation, Brown University, Providence, RI, USA, March 1996.

  • Reinforcement Learning: A Survey [pdf]
    Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore
    Journal of Artificial Intelligence Research, volume 4, pp 237-285, 1996.

  • Decision-Theoretic Planning: Structural Assumptions and Computational Leverage [pdf]
    Craig Boutilier, Thomas Dean and Steve Hanks
    Journal of Artificial Intelligence Research, volume 11, pp 1-94, 1999.

  • SPUDD: Stochastic Planning using Decision Diagrams [pdf]
    Jesse Hoey, Robert St-Aubin, Alan Hu and Craig Boutilier
    UAI-99. 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 1999.

  • Policy gradient methods for reinforcement learning with function approximation [pdf]
    Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour
    NIPS-99. 12th International Conference on Neural Information Processing Systems, Denver, Colorado, USA, December 1999.

  • Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms [pdf]
    Satinder Singh, Tommi Jaakkola, Michael L. Littman and Csaba Szepesvári
    Machine Learning, volume 39, pp 287–308, 2000.

  • Near-Optimal Reinforcement Learning in Polynomial Time [pdf]
    Michael Kearns and Satinder Singh
    Machine Learning, volume 49, pp 209-232, 2002.

  • R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning [pdf]
    Ronen I. Brafman and Moshe Tennenholtz
    Journal of Machine Learning Research, volume 3, pp 213-231, 2002.

  • Equivalence notions and model minimization in Markov decision processes [pdf]
    Robert Givan, Thomas Dean and Matthew Greig
    Artificial Intelligence, volume 147, pp 163-223, 2003.

  • Least-Squares Policy Iteration [pdf]
    Michail G. Lagoudakis and Ronald Parr
    Journal of Machine Learning Research, volume 4, pp 1107-1149, 2003.

  • Tree-Based Batch Mode Reinforcement Learning [pdf]
    Damien Ernst, Pierre Geurts and Louis Wehenkel
    Journal of Machine Learning Research, volume 6, pp 503-556, 2005.

  • An Analytic Solution to Discrete Bayesian Reinforcement Learning [pdf]
    Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan
    ICML-06. 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, June 2006.

  • Bandit based Monte-Carlo planning [pdf]
    Levente Kocsis, Csaba Szepesvári
    ECML-06. 17th European Conference on Machine Learning, Berlin, Germany, September 2006.

  • Knows What It Knows: A Framework For Self-Aware Learning [pdf]
    Lihong Li, Michael L. Littman, Thomas J. Walsh
    ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

  • An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning [pdf]
    Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
    ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

  • An analysis of model-based Interval Estimation for Markov Decision Processes [pdf]
    Alexander L. Strehl and Michael L. Littman
    Journal of Computer and System Sciences, volume 74, pp 1309-1331, 2008.

  • A Bayesian sampling approach to exploration in reinforcement learning [pdf]
    John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate
    UAI-09. 25th Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec, Canada, June 2009.

  • Fast gradient-descent methods for temporal-difference learning with linear function approximation [pdf]
    Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora
    ICML-09. 26th International Conference on Machine Learning, Montreal, Quebec, Canada, June 2009.

  • Reinforcement Learning and Simulation-Based Search in Computer Go [pdf]
    David Silver
    Ph.D. Dissertation, University of Alberta, Edmonton, Alberta, Canada, 2009.

  • Transfer Learning for Reinforcement Learning Domains: A Survey [pdf]
    Matthew E. Taylor and Peter Stone
    Journal of Machine Learning Research, volume 10, pp 1633-1685, 2009.

  • Toward Off-Policy Learning Control with Function Approximation [pdf]
    Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton
    ICML-10. 27th International Conference on Machine Learning, Haifa, Israel, June 2010.

  • Monte Carlo tree search in Kriegspiel [pdf]
    Paolo Ciancarini and Gian Piero Favini
    Artificial Intelligence, volume 174, pp 670-684, 2010.

  • Monte-Carlo tree search and rapid action value estimation in computer Go [pdf]
    Sylvain Gelly and David Silver
    Artificial Intelligence, volume 175, pp 1856-1875, 2011.

  • Greedy Algorithms for Sparse Reinforcement Learning [pdf]
    Christopher Painter-Wakefield, Ronald Parr
    ICML-12. 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, July 2012.

  • A Survey of Monte Carlo Tree Search Methods [pdf]
    Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton
    IEEE Transactions on Computational Intelligence and AI in Games, volume 4, pp 1-43, 2012.

  • Batch-iFDD for representation expansion in large MDPs [pdf]
    Alborz Geramifard, Thomas J. Walsh, Nicholas Roy, Jonathan P. How
    UAI-13. 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, Washington, USA, August 2013.

  • Offline policy evaluation across representations with applications to educational games [pdf]
    Travis Mandel, Yun-En Liu, Sergey Levine, Emma Brunskill, Zoran Popovic
    AAMAS-14. 2014 International Conference on Autonomous Agents and Multi-agent Systems, Paris, France, May 2014.

  • High-Confidence Off-Policy Evaluation [pdf]
    Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh
    AAAI-15. 29th AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 2015.

  • Policy evaluation using the Ω-return [pdf]
    Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris
    NIPS-15. 28th International Conference on Neural Information Processing Systems, Montreal, Canada, December 2015.

  • Mastering the game of Go without human knowledge [pdf]
    David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis
    Nature, volume 550, pp 354–359, 2017.

NIPS 2017 Papers:

  • Optimistic posterior sampling for reinforcement learning: worst-case regret bounds [pdf]
    Shipra Agrawal, Randy Jia
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Regret Analysis for Continuous Dueling Bandit [pdf]
    Wataru Kumagai
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Minimal Exploration in Structured Stochastic Bandits [pdf]
    Richard Combes, Stefan Magureanu, Alexandre Proutiere
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Shallow Updates for Deep Reinforcement Learning [pdf]
    Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning [pdf]
    Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Monte-Carlo Tree Search by Best Arm Identification [pdf]
    Emilie Kaufmann, Wouter M. Koolen
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Hybrid Reward Architecture for Reinforcement Learning [pdf]
    Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes [pdf]
    Taylor Killian, Samuel Daulton, George Konidaris, Finale Doshi-Velez
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Towards Generalization and Simplicity in Continuous Control [pdf]
    Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Inverse Reward Design [pdf]
    Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

  • Learning Combinatorial Optimization Algorithms over Graphs [pdf]
    Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song
    NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Deep Reinforcement Learning Symposium (NIPS 2017) Papers:

  • Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning [pdf]
    Anusha Nagabandi, Gregory Kahn, Ronald S. Fearing, Sergey Levine
    DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

  • Parameter Space Noise for Exploration [pdf]
    Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz
    DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [pdf]
    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
    DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

  • Time-Contrastive Networks: Self-Supervised Learning from Pixels [pdf]
    Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
    DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

Conferences, Symposia, Workshops:

  • DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, USA, December 2017.

  • NIPS-17. Advances in Neural Information Processing Systems, NIPS 2017, Long Beach, USA, December 2017.

  • DRLW-16. Deep Reinforcement Learning Workshop, NIPS 2016, Barcelona, Spain, December 2016.

  • EWRL-16. The 13th European Workshop on Reinforcement Learning, Barcelona, Spain, December 2016.

  • DRLW-15. Deep Reinforcement Learning Workshop, NIPS 2015, Montreal, Canada, December 2015.

Schedule

Week     Date          Topic                                       Readings and Files
Week 1   1/8                                                       Homework 1
                                                                   Slides 1
Week 2   1/15          Martin Luther King, Jr. Day - No class
Week 3   1/22                                                      Homework 2
                                                                   Slides 2
Week 4   1/29                                                      Homework 3
                                                                   Slides 3
Week 5   2/5           No class
Week 6   2/12                                                      Slides 4
                                                                   Pezeshki Slides | Paper
                                                                   Broka Slides | Paper
                                                                   Zou Slides | Paper
Week 7   2/19          Presidents' Day - No class                  Homework 4
Week 8   2/26                                                      Homework 5
                                                                   Xu Slides | Paper
                                                                   Praveen Slides | Paper
                                                                   Pandey Slides | Paper
Week 9   3/5                                                       Homework 6
                                                                   Blog Post
                                                                   McAleer Slides | Paper
                                                                   Dheeru Slides | Paper
                                                                   Lanier & Takashi Slides | Paper
Week 10  3/12                                                      Homework 7
                                                                   Logan Slides | Paper
                                                                   LaCroix Slides | Paper
                                                                   Lee Slides | Paper
Week 11  3/19 4-6:30p  Lecture to be held in the usual classroom.  Nelson Slides | Paper
                                                                   Chen Slides | Paper
                                                                   Moskvichev Slides | Paper