I have a small question about the scaling of the likelihood value when it is used inside HMC. I was hoping somebody could shed some light on the issue described below.
Below is the code used to accept or reject the proposal.
H_initial = -Initial_U + Initial_KE;
H_proposed = -Proposed_U + Proposed_KE;
h_diff = H_proposed - H_initial;
alpha = minc(1|exp(-h_diff));
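For readers who want to check the arithmetic, here is a minimal Python sketch of the same accept/reject step. It assumes that Initial_U and Proposed_U hold the summed log-likelihood, so that the potential energy is its negative. The function name and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math

def acceptance_probability(log_lik_initial, ke_initial, log_lik_proposed, ke_proposed):
    """Metropolis acceptance probability for one HMC proposal.

    Assumption: the potential energy is the negative summed log-likelihood,
    so H = -log_lik + kinetic_energy, as in the lines above.
    """
    h_initial = -log_lik_initial + ke_initial
    h_proposed = -log_lik_proposed + ke_proposed
    h_diff = h_proposed - h_initial
    # If H did not increase, accept with probability 1. This also avoids
    # an overflow in exp() when h_diff is a large negative number.
    if h_diff <= 0.0:
        return 1.0
    return math.exp(-h_diff)

# Made-up values: a summed log-likelihood near -6900 and a kinetic energy near 10.
alpha_down = acceptance_probability(-6900.0, 10.0, -6890.0, 12.0)  # H falls by 8
alpha_up = acceptance_probability(-6890.0, 12.0, -6900.0, 10.0)    # H rises by 8
print(alpha_down, alpha_up)
```

Only the difference between the two Hamiltonians enters alpha, so the absolute size of the summed log-likelihood cancels out of this step.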
Let's say I am estimating a discrete choice model (MNL) with 1000 individuals. If I sum the log-likelihood values over all 1000 individuals, the total will be far larger in absolute magnitude than the kinetic energy. Next, if the proposed Hamiltonian is smaller than the initial Hamiltonian (even by, say, 10 points), alpha will be 1.
Next, we use the dual-averaging algorithm to tune the step-size (epsilon) using the equations mentioned below.
eta = 1 / (iteration_number + t0);
Hbar = (1 - eta) * Hbar + eta * (delta - alpha);
epsilon = exp(mu - ((sqrt(iteration_number)/gamma_value)*Hbar));
where delta is the target acceptance rate and t0, mu, and gamma_value are fixed parameters of the algorithm, set as suggested in your paper.
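The update above can be sketched in Python as follows. This is only an illustration of the three equations as written, not a full adaptation routine. The default values for delta, t0, gamma_value, and mu are placeholders that I am assuming for the example, and the helper names are hypothetical.

```python
import math

def dual_averaging_step(iteration_number, h_bar, alpha,
                        delta=0.65, t0=10.0, gamma_value=0.05, mu=math.log(10.0)):
    """One dual-averaging update of the step size, following the equations above.

    The default parameter values are assumptions made for this sketch.
    """
    eta = 1.0 / (iteration_number + t0)
    h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)
    epsilon = math.exp(mu - (math.sqrt(iteration_number) / gamma_value) * h_bar)
    return h_bar, epsilon

def run_adaptation(alphas):
    """Apply the update once per iteration, starting from h_bar = 0."""
    h_bar = 0.0
    epsilons = []
    for iteration_number, alpha in enumerate(alphas, start=1):
        h_bar, epsilon = dual_averaging_step(iteration_number, h_bar, alpha)
        epsilons.append(epsilon)
    return h_bar, epsilons

# Two extreme cases: every proposal accepted, and every proposal rejected.
h_bar_accept, eps_accept = run_adaptation([1.0] * 20)
h_bar_reject, eps_reject = run_adaptation([0.0] * 20)
print(eps_accept[0], eps_accept[-1])
print(eps_reject[0], eps_reject[-1])
```

When alpha is 1 at every iteration, delta - alpha is negative, so Hbar stays negative and epsilon grows from one iteration to the next. The update only pushes epsilon back down once alpha falls below delta.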
What happens is that, because alpha is 1, the step size (epsilon) explodes (it goes from 1 to 50 in 10 to 20 iterations) and, consequently, the model parameter values (the betas) explode as well.
I was wondering whether anyone has encountered such issues and whether there is some way to scale the log-likelihood (LL) value to avoid them.
I tried working with the average of the LL values, which works fine with few parameters (around 5, say). However, as the number of parameters increases (to around 20 or more), the average-LL approach stops working, because the total kinetic energy is far larger in magnitude than the average LL value.
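To show how the kinetic energy grows with the number of parameters, here is a small Python sketch. It assumes the momenta are drawn from independent standard normals (an identity mass matrix), which may not match my actual setup. Under that assumption the kinetic energy is 0.5 times the sum of squared momenta, and its expected value is half the number of parameters. The function names are hypothetical.

```python
import random

def kinetic_energy(momentum):
    """Kinetic energy 0.5 * p'p, assuming an identity mass matrix."""
    return 0.5 * sum(p * p for p in momentum)

def mean_kinetic_energy(n_params, n_draws=5000, seed=0):
    """Monte Carlo average of the kinetic energy over fresh momentum draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Assumption: each momentum component is an independent N(0, 1) draw.
        momentum = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
        total += kinetic_energy(momentum)
    return total / n_draws

mean_ke_5 = mean_kinetic_energy(5)    # expected value is 5 / 2
mean_ke_20 = mean_kinetic_energy(20)  # expected value is 20 / 2
print(mean_ke_5, mean_ke_20)
```

The kinetic energy therefore scales with the number of parameters, while the average LL per individual does not, which is consistent with the mismatch described above.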