Reading list
(Note 2020-01-18: This reading list is outdated.)
Reading
- Booth et al.: The Craft of Research
Read
- 2019-08-08 Security amplification
- 2019-08-07 Meta-execution
- 2019-08-06 Reliability amplification
- 2019-08-02 Techniques for optimizing worst-case performance
- 2019-08-01 Thoughts on reward engineering
- 2019-07-30 Learning with catastrophes
- 2019-07-30 Capability amplification
- 2019-07-25 The reward engineering problem
- 2019-07-24 Directions and desiderata for AI alignment
- 2019-07-23 AlphaGo Zero and capability amplification
- 2019-07-22 Supervising strong learners by amplifying weak experts
- 2019-07-16 Benign model-free RL
- 2019-07-15 Unpublished draft of an article on quantum computing and AI alignment
- 2019-07-11 Iterated Distillation and Amplification
- 2019-07-10 Corrigibility
- 2019-07-09 Humans Consulting HCH
- 2019-07-09 Approval-directed bootstrapping
- 2019-07-05 Understanding Iterated Distillation and Amplification: Claims and Oversight
- 2019-06-27 Approval-directed agents
- 2019-06-18 Embedded World-Models and the corresponding section of Embedded Agency. Time-boxed, so I understood only part of it.
- 2019-06-14 Decision Theory and the corresponding section of Embedded Agency. I didn't understand everything, but stopped because it was taking too much time.
- 2019-06-12 Future directions for ambitious value learning
- 2019-06-12 Model Mis-specification and Inverse Reinforcement Learning
- 2019-06-11 Latent Variables and Model Mis-Specification
- 2019-06-10 Humans can be assigned any values whatsoever…
- 2019-06-07 The easy goal inference problem is still hard
- 2019-06-05 What is ambitious value learning?
- 2019-06-05 Preface to the sequence on value learning
- 2019-06-05 Embedded Agents and the corresponding section of Embedded Agency.
- 2019-06-03 Prosaic AI alignment
- 2019-05-31 An unaligned benchmark
- 2019-05-29 Clarifying "AI Alignment"
- 2019-05-28 The Steering Problem
- 2019-05-28 Preface to the sequence on iterated amplification