Policy Gradients


Today I watched lecture 4 from Berkeley’s deep rl course, which covered policy gradients. Policy gradients, which heavily rely on sampled data, actually have some really nice math behind them! In particular, there was one interesting trick that helped separate out the derivative for the policy, which made it actually workable! I include the notes below:

Policy Gradient Derivations