Imitating Optimal Control
This lecture covered further extensions to model-based RL in Berkeley's deep RL course. It provides methodologies for training a policy using model-based RL methods. In particular, it introduces the GPS (Guided Policy Search) algorithm, which somewhat resembles imitation learning applied to a model-based method, and the PLATO (Policy Learning using Adaptive Trajectory Optimization) algorithm, which uses a KL divergence penalty to make DAgger more feasible.
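To make the KL idea concrete, here is a minimal sketch (my own illustration, not code from the course) of PLATO's adaptive teacher in one dimension. The teacher picks an action by trading off a quadratic control cost against the KL divergence between its action distribution and the learner's Gaussian policy; with equal variances the KL term is also quadratic, so the minimizer has a closed form. The names `a_opt`, `mu_learner`, `sigma`, and `lam` are my hypothetical placeholders.

```python
import numpy as np

def plato_teacher_action(a_opt, mu_learner, sigma=1.0, lam=1.0):
    """Hypothetical 1-D sketch of PLATO's adapted teacher action.

    Minimizes: (a - a_opt)^2  +  lam * KL( N(a, sigma^2) || N(mu_learner, sigma^2) )

    For equal-variance Gaussians, KL = (a - mu_learner)^2 / (2 sigma^2),
    so the objective is quadratic and the minimizer is a
    precision-weighted average of the optimal-control action
    and the learner's mean action.
    """
    w = lam / (2.0 * sigma ** 2)        # weight of the KL penalty term
    return (a_opt + w * mu_learner) / (1.0 + w)

# lam = 0: no KL penalty, teacher acts purely optimally.
print(plato_teacher_action(a_opt=2.0, mu_learner=0.0, lam=0.0))   # 2.0
# Large lam: teacher stays close to the learner's policy,
# so the states it visits match what the learner will see.
print(plato_teacher_action(a_opt=2.0, mu_learner=0.0, lam=1e6))   # ~0.0
```

The point of the interpolation is exactly the DAgger fix mentioned above: the teacher still labels data with near-optimal actions, but it collects that data from states the learner's own policy would reach.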
At this point, all of the basic methods in the course (except for gradient-free methods) have been introduced, and I would say the course's illustrative examples establish a good baseline of RL knowledge. The remaining lectures cover more advanced methods, and I look forward to seeing them.
My writeup can be found below: