Dr. Rina Dechter - University of California at Irvine
Revised on Nov. 14, 2019

CompSci 295 Reinforcement Learning, Winter 2018

Classroom: DBH 1429

Day: Monday

Time: 4:00 - 6:30 pm

Instructor: Rina Dechter - dechter@ics.uci.edu

The class will cover topics in Reinforcement Learning and in Planning Under Uncertainty. The class will run as a seminar. I will give the first few introductory classes. Students will then be required to read papers from the literature or chapters from books and present them to the class, and to do a project, which can be based on their selected papers. Some homework may also be assigned. The class is intended for PhD students in the area of AI and Machine Learning, with courses 271 and 273 as prerequisites. If you are a second-year master's student who has already taken 271 and 273, you should talk to me to get approval.

Project Spreadsheet

Relevant sources (books or classes):

Algorithms for Reinforcement Learning
Csaba Szepesvári
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto
Learning and Sequential Decision Making
Michael L. Littman
UCL Course on Reinforcement Learning
David Silver
Planning with Markov Decision Processes: An AI Perspective
Mausam, Andrey Kolobov
A Concise Introduction to Models and Methods for Automated Planning
Hector Geffner and Blai Bonet
Reinforcement Learning: Wikipedia

Papers:

Learning to Predict by the Methods of Temporal Differences [pdf]
Richard S. Sutton
Machine Learning, volume 3, pp 9-44, 1988.

An Upper Bound on the Loss from Approximate Optimal-Value Functions [pdf]
Satinder P. Singh and Richard C. Yee
Machine Learning, volume 16, pp 227-233, 1994.

Algorithms for Sequential Decision Making [pdf]
Michael L. Littman
Ph.D. Dissertation, Brown University, Providence, RI, USA, March 1996.

Reinforcement Learning: A Survey [pdf]
Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore
Journal of Artificial Intelligence Research, volume 4, pp 237-285, 1996.

Decision-Theoretic Planning: Structural Assumptions and Computational Leverage [pdf]
Craig Boutilier, Thomas Dean and Steve Hanks
Journal of Artificial Intelligence Research, volume 11, pp 1-94, 1999.

SPUDD: Stochastic Planning using Decision Diagrams [pdf]
Jesse Hoey, Robert St-Aubin, Alan Hu and Craig Boutilier
UAI-99. 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 1999.

Policy gradient methods for reinforcement learning with function approximation [pdf]
Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour
NIPS-99. 12th International Conference on Neural Information Processing Systems, Denver, Colorado, USA, December 1999.

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms [pdf]
Satinder Singh, Tommi Jaakkola, Michael L. Littman and Csaba Szepesvári
Machine Learning, volume 39, pp 287-308, 2000.

Near-Optimal Reinforcement Learning in Polynomial Time [pdf]
Michael Kearns and Satinder Singh
Machine Learning, volume 49, pp 209-232, 2002.

R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning [pdf]
Ronen I. Brafman and Moshe Tennenholtz
Journal of Machine Learning Research, volume 3, pp 213-231, 2002.

Equivalence notions and model minimization in Markov decision processes [pdf]
Robert Givan, Thomas Dean and Matthew Greig
Artificial Intelligence, volume 147, pp 163-223, 2003.

Least-Squares Policy Iteration [pdf]
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, volume 4, pp 1107-1149, 2003.

Tree-Based Batch Mode Reinforcement Learning [pdf]
Damien Ernst, Pierre Geurts and Louis Wehenkel
Journal of Machine Learning Research, volume 6, pp 503-556, 2005.

An Analytic Solution to Discrete Bayesian Reinforcement Learning [pdf]
Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan
ICML-06. 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, June 2006.

Bandit based Monte-Carlo Planning [pdf]
Levente Kocsis, Csaba Szepesvári
ECML-06. 17th European Conference on Machine Learning, Berlin, Germany, September 2006.

Knows What It Knows: A Framework For Self-Aware Learning [pdf]
Lihong Li, Michael L. Littman, Thomas J. Walsh
ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning [pdf]
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

An analysis of model-based Interval Estimation for Markov Decision Processes [pdf]
Alexander L. Strehl and Michael L. Littman
Journal of Computer and System Sciences, volume 74, pp 1309-1331, 2008.

A Bayesian sampling approach to exploration in reinforcement learning [pdf]
John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate
UAI-09. 25th Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec, Canada, June 2009.

Fast gradient-descent methods for temporal-difference learning with linear function approximation [pdf]
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora
ICML-09. 26th International Conference on Machine Learning, Montreal, Quebec, Canada, June 2009.

Reinforcement Learning and Simulation-Based Search in Computer Go [pdf]
David Silver
Ph.D. Dissertation, University of Alberta, Edmonton, Alberta, Canada, 2009.

Transfer Learning for Reinforcement Learning Domains: A Survey [pdf]
Matthew E. Taylor and Peter Stone
Journal of Machine Learning Research, volume 10, pp 1633-1685, 2009.

Toward Off-Policy Learning Control with Function Approximation [pdf]
Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton
ICML-10. 27th International Conference on Machine Learning, Haifa, Israel, June 2010.

Monte Carlo tree search in Kriegspiel [pdf]
Paolo Ciancarini and Gian Piero Favini
Artificial Intelligence, volume 174, pp 670-684, 2010.

Monte-Carlo tree search and rapid action value estimation in computer Go [pdf]
Sylvain Gelly and David Silver
Artificial Intelligence, volume 175, pp 1856-1875, 2011.

Greedy Algorithms for Sparse Reinforcement Learning [pdf]
Christopher Painter-Wakefield, Ronald Parr
ICML-12. 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, July 2012.

A Survey of Monte Carlo Tree Search Methods [pdf]
Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton
IEEE Transactions on Computational Intelligence and AI in Games, volume 4, pp 1-43, 2012.

Batch-iFDD for representation expansion in large MDPs [pdf]
Alborz Geramifard, Thomas J. Walsh, Nicholas Roy, Jonathan P. How
UAI-13. 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, Washington, USA, August 2013.

Offline policy evaluation across representations with applications to educational games [pdf]
Travis Mandel, Yun-En Liu, Sergey Levine, Emma Brunskill, Zoran Popovic
AAMAS-14. 2014 International Conference on Autonomous Agents and Multi-agent Systems, Paris, France, May 2014.

High-Confidence Off-Policy Evaluation [pdf]
Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh
AAAI-15. 29th AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 2015.

Policy evaluation using the Ω-return [pdf]
Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris
NIPS-15. 28th International Conference on Neural Information Processing Systems, Montreal, Canada, December 2015.

Mastering the game of Go without human knowledge [pdf]
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis
Nature, volume 550, pp 354-359, 2017.

NIPS 2017 Papers:

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds [pdf]
Shipra Agrawal, Randy Jia
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Regret Analysis for Continuous Dueling Bandit [pdf]
Wataru Kumagai
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Minimal Exploration in Structured Stochastic Bandits [pdf]
Richard Combes, Stefan Magureanu, Alexandre Proutiere
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Shallow Updates for Deep Reinforcement Learning [pdf]
Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning [pdf]
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Monte-Carlo Tree Search by Best Arm Identification [pdf]
Emilie Kaufmann, Wouter M. Koolen
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Hybrid Reward Architecture for Reinforcement Learning [pdf]
Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes [pdf]
Taylor Killian, Samuel Daulton, George Konidaris, Finale Doshi-Velez
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Towards Generalization and Simplicity in Continuous Control [pdf]
Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Inverse Reward Design [pdf]
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Learning Combinatorial Optimization Algorithms over Graphs [pdf]
Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song
NIPS-17. 30th Annual Conference on Neural Information Processing Systems, Long Beach, California, USA, December 2017.

Reinforcement Learning Symposium (NIPS 2017) Papers:

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning [pdf]
Anusha Nagabandi, Gregory Kahn, Ronald S. Fearing, Sergey Levine
DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

Parameter Space Noise for Exploration [pdf]
Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz
DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [pdf]
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

Time-Contrastive Networks: Self-Supervised Learning from Pixels [pdf]
Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, California, USA, December 2017.

Conferences, Symposia, Workshops:

DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, USA, December 2017.

NIPS-17. Advances in Neural Information Processing Systems, NIPS 2017, Long Beach, USA, December 2017.

DRLW-16. Deep Reinforcement Learning Workshop, NIPS 2016, Barcelona, Spain, December 2016.

EWRL-16. The 13th European Workshop on Reinforcement Learning, Barcelona, Spain, December 2016.

DRLW-15. Deep Reinforcement Learning Workshop, NIPS 2015, Montreal, Canada, December 2015.

Schedule

Week	Date	Topic	Readings and Files
Week 1	1/8		Homework 1; Slides 1
Week 2	1/15	Martin Luther King, Jr. Day - No class
Week 3	1/22		Homework 2; Slides 2
Week 4	1/29		Homework 3; Slides 3
Week 5	2/5	No class
Week 6	2/12		Slides 4; Pezeshki Slides | Paper; Broka Slides | Paper; Zou Slides | Paper
Week 7	2/19	Presidents' Day - No class	Homework 4
Week 8	2/26		Homework 5; Xu Slides | Paper; Praveen Slides | Paper; Pandey Slides | Paper
Week 9	3/5		Homework 6; Blog Post; McAleer Slides | Paper; Dheeru Slides | Paper; Lanier & Takashi Slides | Paper
Week 10	3/12		Homework 7; Logan Slides | Paper; LaCroix Slides | Paper; Lee Slides | Paper
Week 11	3/19, 4:00 - 6:30 pm	Lecture to be held in the usual classroom.	Nelson Slides | Paper; Chen Slides | Paper; Moskvichev Slides | Paper