
Questions on HRL Paper 4

Q1: With respect to what objective is the hierarchical policy trained, and which optimization scheme is used to learn it? Briefly outline the individual steps and compare them to the EM algorithm.

Q2: What is the purpose of the hybrid categorical-continuous high-level policy (compared to a purely categorical policy), and how do the authors demonstrate its superiority?

Q3: How do the authors choose the number of options k, and why would it be misleading to choose k according to the log-likelihood objective? Optional: Can you think of a similar problem in a different machine learning setting and, based on that, a different approach to choosing the number of options?