Questions on HRL Paper 7

Q1: How do the stochastic neural networks used in the paper differ from standard feed-forward policy architectures?

Q2: What is the information-theoretic regularizer and what is its role?

Q3: How does the paper handle termination conditions for the lower-level skills, i.e., how long is a skill executed once it has been selected?