Today I watched lecture 6 of Berkeley's deep RL course, which covered various value function methods.
Most of the material wasn’t especially novel, as the derivations of the algorithms were straightforward and intuitive. However, I enjoyed the section on why the fitted algorithms do not converge in theory. I found the use of norms, contractions, and fixed points inspiring, especially because it ties into my goal of researching the math behind machine learning.
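The non-convergence argument can be made concrete with the classic two-state counterexample due to Tsitsiklis and Van Roy (my own illustration, not something taken from the lecture): the Bellman backup is a γ-contraction in the max norm, and least-squares projection is non-expansive in the L2 norm, but their composition need not contract in any norm, so fitted value iteration can diverge even on a trivial MDP whose true value function is zero.

```python
import numpy as np

# Two-state MDP (Tsitsiklis & Van Roy): state 1 -> state 2, state 2 -> state 2,
# all rewards are zero, so the true value function is V = 0 everywhere.
# Values are approximated linearly as V(s) = w * phi(s), with phi(1)=1, phi(2)=2.
phi = np.array([1.0, 2.0])       # feature of each state
phi_next = np.array([2.0, 2.0])  # feature of each state's successor
gamma = 0.95

# Fitted value iteration: regress w onto the Bellman targets gamma * w * phi(s').
# The least-squares solution is w' = (phi . targets) / (phi . phi),
# which works out to w' = (6 * gamma / 5) * w for this MDP.
w = 1.0
history = [w]
for _ in range(50):
    targets = gamma * w * phi_next
    w = float(phi @ targets / (phi @ phi))
    history.append(w)

# Each backup-then-project step multiplies w by 6*gamma/5 > 1 (for gamma > 5/6),
# so the iterates blow up even though the exact fixed point is w = 0.
print(history[-1])
```

The takeaway is that each operator is well-behaved in its own norm, but contraction arguments require a single norm shared across the whole composed update, and here no such norm exists.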
My full notes are below: