Reward Reports for Reinforcement Learning

Towards Documentation and Understanding of Dynamic Machine Learning Systems

HIRING! We are hiring an intern to help build out Reward Reports! If interested, please contact tg299 at cornell dot edu.

Join us at our workshop on Building Accountable and Transparent RL, at the Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), on June 11th, 2022.

Building on the documentation frameworks for “model cards” and “datasheets” proposed by Mitchell et al. and Gebru et al., we argue the need for Reward Reports for AI systems. In a whitepaper recently published by the Center for Long-Term Cybersecurity, we introduced Reward Reports as living documents for proposed RL deployments that demarcate design choices. However, many questions remain about the applicability of this framework to different RL applications, roadblocks to system interpretability, and the resonances between deployed supervised machine learning systems and the sequential decision-making utilized in RL. At a minimum, Reward Reports are an opportunity for RL practitioners to deliberate on these questions and begin the work of deciding how to resolve them in practice.

To learn more about Reward Reports, see the Reward Reports paper, the CLTC RL Risks Whitepaper, or the GitHub repo with a template.


Thomas Krendl Gilbert is a postdoctoral fellow at the Digital Life Initiative at Cornell Tech in New York City. He previously designed and received a Ph.D. in Machine Ethics and Epistemology at the University of California, Berkeley. Thomas researches the emerging political economy of AI, in particular reinforcement learning systems.
Tom Zick earned her PhD from UC Berkeley and is currently pursuing her JD at Harvard. Her research bridges AI ethics and law, with a focus on how to craft safe and equitable policy surrounding the adoption of AI in high-stakes domains. In the past, she has worked as a data scientist at the Berkeley Center for Law and Technology, evaluating the capacity of regulations to promote open government data. She has also collaborated with graduate students across social science and engineering to advocate for pedagogy reform focused on infusing social context into technical coursework. Outside of academia, Tom has crafted digital policy for the City of Boston as a fellow for the Mayor’s Office for New Urban Mechanics and helped early-stage startups develop responsible AI frameworks. Her current research centers on the near-term policy concerns surrounding reinforcement learning.
Nathan Lambert is a Research Scientist at HuggingFace. He received his PhD from the University of California, Berkeley working at the intersection of machine learning and robotics. He is a member of the Department of Electrical Engineering and Computer Sciences, advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab. Nathan has worked extensively with Roberto Calandra at Facebook AI Research and is joining DeepMind Robotics remotely for the summer of 2021. During his Ph.D., he was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism.
Sarah Dean is an Assistant Professor in the Computer Science Department at Cornell. She recently completed a PhD in EECS from UC Berkeley and was a postdoc at the University of Washington. Sarah is interested in the interplay between optimization, machine learning, and dynamics, and her research focuses on understanding the fundamentals of data-driven control and decision-making.
Aaron is a computer science research fellow in computational law at the Australian Research Council Centre of Excellence for Autonomous Decision Making and Society. With a background in cross-disciplinary mechatronic engineering, Aaron’s Ph.D. developed new theory and algorithms for Inverse Reinforcement Learning in the maximum conditional entropy and multiple intent settings. Aaron’s ongoing work investigates sociotechnical and legal-theoretic methods for AI accountability.