hrl-q7
Questions on HRL Paper 7
Q1: How do the stochastic neural networks used in the paper differ from a standard feed-forward policy architecture?
Q2: What is the information-theoretic regularizer and what is its role?
Q3: How does the paper handle termination conditions for the lower-level skills, i.e., for how long is a skill executed once it has been selected?