Questions on HRL Paper 6

Q1: In the present paper, which parts of an option are learned with policy gradient updates? How are these parts represented?

Q2: How would you explain the faster progress of the blue curve in Fig. 2c compared to the green and red ones?

Q3: How does the approach of the option-critic architecture differ from previous approaches using pseudo-rewards or subgoals?