Random walk prior on non-negative reals


#1

Hi, I would like to implement a random-walk prior for a parameter vector \gamma that is constrained to be non-negative. I came up with the following code:

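data {
  int<lower=1> m;  // length of the parameter vector
}
parameters {
  row_vector[m] gammas_raw;  // standardized innovations
  real<lower=0> tau;         // innovation scale
}
transformed parameters {
  row_vector<lower=0>[m] gammas;
  {
    // unconstrained random walk; local, so it is not saved in the output
    row_vector[m] loc_gammas;
    loc_gammas[1] = gammas_raw[1];
    for (i in 2:m)
      loc_gammas[i] = loc_gammas[i - 1] + gammas_raw[i] * tau;
    gammas = fabs(loc_gammas);  // fold the walk onto the non-negative reals
  }
}
model {
  gammas_raw ~ normal(0, 1);
  tau ~ cauchy(0, 1);
}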

This essentially constructs an unconstrained random walk and then takes the absolute value of each element.

Alternatively, you could construct a process that behaves like a random walk but is reflected whenever a step would take it across 0 into the negatives. In mathematical terms: suppose \gamma_i > 0 is the current value and the increment \delta_i < 0 is such that \gamma_i + \delta_i < 0. In this case set \gamma_{i+1} = -\delta_i - \gamma_i > 0. Thus in general \gamma_{i+1} = \vert \gamma_i + \delta_i \vert.
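Written out as a Stan program, this second variant might look like the sketch below. The data block declaring m is an assumption (it is not shown in the post), the first element is folded with fabs so that it matches the first model, and fabs follows the usage in this thread (newer Stan releases may expect abs instead).

```stan
data {
  int<lower=1> m;  // assumed: length of the walk
}
parameters {
  row_vector[m] gammas_raw;  // standardized innovations
  real<lower=0> tau;         // innovation scale
}
transformed parameters {
  row_vector<lower=0>[m] gammas;
  gammas[1] = fabs(gammas_raw[1]);
  for (i in 2:m) {
    // reflect at zero: gamma_{i+1} = |gamma_i + delta_i|
    gammas[i] = fabs(gammas[i - 1] + gammas_raw[i] * tau);
  }
}
model {
  gammas_raw ~ normal(0, 1);
  tau ~ cauchy(0, 1);
}
```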

Are there any arguments against or in favour of the two approaches? Has this been studied elsewhere and are there other choices of random walk priors constrained (conditioned) to be non-negative?


#2

The absolute value function is likely to mess up the sampling. I would do a random-walk on the log scale.
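A minimal sketch of what a random walk on the log scale could look like, reusing the names from the first post. The data block and the name log_gammas are assumptions made for illustration.

```stan
data {
  int<lower=1> m;  // assumed: length of the walk
}
parameters {
  row_vector[m] gammas_raw;  // standardized innovations
  real<lower=0> tau;         // innovation scale (now on the log scale)
}
transformed parameters {
  row_vector<lower=0>[m] gammas;
  {
    row_vector[m] log_gammas;  // hypothetical name: the walk on the log scale
    log_gammas[1] = gammas_raw[1];
    for (i in 2:m) {
      log_gammas[i] = log_gammas[i - 1] + gammas_raw[i] * tau;
    }
    gammas = exp(log_gammas);  // positive, and smooth in gammas_raw
  }
}
model {
  gammas_raw ~ normal(0, 1);
  tau ~ cauchy(0, 1);
}
```

Since exp is smooth everywhere, there is no kink for the sampler to cross, but gammas can only approach zero and tau now has a different meaning than in the fabs version.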


#3

Actually, it turns out that in my case the log-random-walk proposal leads to much worse NUTS characteristics (I still get hundreds of max-tree-depth-exceeded warnings with max_tree_depth=.99) than the fabs-based one. Why should the latter actually be so bad?


#4

Right now gammas isn’t involved in your log density calculations, so it doesn’t matter that there are absolute value calculations in there.

Once gammas gets involved in a sampling statement (or a target increment), though, Stan will need partial derivatives of the log density. That requires autodiffing through fabs, which has no derivative at zero, so things will probably get bad then. Stan requires the log density to be differentiable on the unconstrained space (because NUTS/HMC require it).


#5

Are there a lot of values near zero? Are you also seeing divergences?

Random-walk noise added on the log scale acts multiplicatively on the original scale, whereas noise added on the linear scale stays additive. That can have a big impact on fits.
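Spelled out for a single step, with \tau the innovation scale from the model above and \epsilon_i \sim \mathrm{N}(0, 1) (the symbol \epsilon_i is introduced here only for illustration):

```latex
\begin{align*}
\text{linear scale:} \quad & \gamma_{i+1} = \gamma_i + \tau \epsilon_i \\
\text{log scale:} \quad & \log \gamma_{i+1} = \log \gamma_i + \tau \epsilon_i
  \;\Longleftrightarrow\; \gamma_{i+1} = \gamma_i \, e^{\tau \epsilon_i}
\end{align*}
```

So on the log scale the size of a step is proportional to the current value, while on the linear scale it does not depend on it.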

We fudge the derivative of fabs in the implementation by taking it to be -1 for negative numbers, +1 for positive numbers, and 0 otherwise. Whether this is unstable will depend on the step sizes and whether there’s a lot of support around zero.

The main problem with the absolute value is that it leads to multimodality.
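One way to see where the extra modes come from, sketched for the fabs model in the first post: write \ell_i for the unconstrained walk (loc_gammas in the code; the symbol is introduced here). Flipping the sign of every element of gammas_raw flips the sign of every \ell_i, which leaves both the prior density and gammas unchanged:

```latex
\begin{align*}
\ell_i(-\gamma^{\text{raw}}, \tau) &= -\ell_i(\gamma^{\text{raw}}, \tau), \\
\gamma_i = \lvert \ell_i \rvert &= \lvert -\ell_i \rvert, \\
\mathrm{N}(-\gamma^{\text{raw}}_i \mid 0, 1) &= \mathrm{N}(\gamma^{\text{raw}}_i \mid 0, 1).
\end{align*}
```

Any likelihood that depends on the parameters only through gammas therefore gives a posterior that is symmetric under this flip, so every mode away from the origin has a mirror image.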

To keep things positive, the innovation at each step must be constrained to be greater than the negation of the previous step’s value. You can constrain a bunch of values separately; there’s a description in the manual.
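One possible way to express this constraint, sketched under assumptions: declare gammas directly as a non-negative parameter and truncate each step's normal distribution at zero. Requiring gammas[i] > 0 given gammas[i - 1] is the same as requiring the innovation to exceed -gammas[i - 1]. Note that this uses Stan's truncation syntax T[0, ] rather than the hand-written per-value transform the manual describes, and that the data block and the normal(0, 1) prior on the first element are assumptions carried over from the first post.

```stan
data {
  int<lower=1> m;  // assumed: length of the walk
}
parameters {
  row_vector<lower=0>[m] gammas;  // the walk itself, constrained non-negative
  real<lower=0> tau;              // innovation scale
}
model {
  // half-normal start: the lower=0 bound makes this a half-normal, and its
  // normalizing constant does not depend on any parameter
  gammas[1] ~ normal(0, 1);
  for (i in 2:m) {
    // truncation at zero; T[0, ] adds the normalizing term, which is needed
    // here because it depends on gammas[i - 1] and tau
    gammas[i] ~ normal(gammas[i - 1], tau) T[0, ];
  }
  tau ~ cauchy(0, 1);
}
```

This is a centered parameterization, so it may sample differently from the gammas_raw versions, and the prior it implies is not the same as the reflected walk: each step is conditioned to stay positive rather than folded back.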