Questions on HRL Paper 6

Q1: In the present paper, which parts of an option are learned with policy gradient updates? How are these parts represented?

Q2: How would you explain the faster progress of the blue curve in Fig. 2c compared to the green and red ones?

Q3: How does the option-critic architecture differ from previous approaches that use pseudo-rewards or subgoals?