hrl-q6
Questions on HRL Paper 6
Q1: In the present paper, which parts of an option are learned with policy gradient updates? How are these parts represented?
Q2: How would you explain the faster progress of the blue curve in Fig. 2c compared to the green and red ones?
Q3: How does the option-critic architecture differ from previous approaches that use pseudo-rewards or subgoals?