Multinomial sampling when joining a new tree in dynamic NUTS implementation

I was looking at the base_hmc.hpp implementation, because I’ve been trying to implement it in R, and I found something peculiar.

Say we have an old tree with weight w_{\mathrm{old}}, and we just built a new tree with weight w_{\mathrm{new}}. As I understand, the rule is that if w_{\mathrm{new}} > w_{\mathrm{old}} then we sample from the new tree.

This is what is checked at line 136 of the linked file: log_sum_weight_subtree is the log of w_{\mathrm{new}}, and log_sum_weight is the log of w_{\mathrm{old}}. Once this if-else statement finishes executing, then log_sum_weight is updated to be the log of the sum of both weights.

Now, I noticed that this contrasts with the analogous multinomial sampling that occurs in the build_tree(.) function at line 278. Here, log_sum_weight_right indeed represents the log of w_{\mathrm{new}}, but it is being compared to log_sum_weight_subtree which at this point in the code is set to \log(w_{\mathrm{new}} + w_{\mathrm{old}}). Should it not instead be compared to log_sum_weight_left which represents the log of w_{\mathrm{old}}?

I’m probably missing something, but I just can’t figure out what. Can anyone chime in?

1 Like

States are sampled according to multinomial weights within each subtree, but the subtrees themselves are not sampled uniformly but rather biased towards subtrees away from the initial point. The former uses w_{new} / (w_{new} + w_{old}) whereas the latter uses w_{new} / w_{old}.

See Section A.3.2 of https://arxiv.org/abs/1701.02434 for a more in depth discussion and proof of validity.

2 Likes

In the last paragraph of that paper it briefly mentions the modifications to NUTS that Stan implements. Is there a paper or wiki with a list of all the modifications implemented in Stan?

2 Likes

Thanks Mike!

Just to recapitulate to make sure I understand: the biased progressive sampling is only meant to be used after the original call to build_tree() that occurs in the trasition() function.

Would it be possible to also do biased progressive sampling within in the building of the subtree as well (as oppose to uniform sampling)? Would that break detailed balance?

From what I understand, and @betanalpha can correct me on this one, the two major changes are:

  1. Replacing the slice sampling mechanism with the multinomial sampling scheme.
  2. Replacing the U-Turn criteria with the generalized U-Turn criteria described on page 58 of @betanalpha’s intro paper.
2 Likes

I’m not sure off the top of my head.