Hi, I would like to implement a random-walk prior for a parameter vector \gamma that is constrained to be non-negative. I came up with the following code:

```
data {
  int<lower=1> m;  // length of the random walk
}
parameters {
  row_vector[m] gammas_raw;
  real<lower=0> tau;
}
transformed parameters {
  row_vector<lower=0>[m] gammas;
  {
    row_vector[m] loc_gammas;
    loc_gammas[1] = gammas_raw[1];
    for (i in 2:m)
      loc_gammas[i] = loc_gammas[i - 1] + gammas_raw[i] * tau;
    gammas = fabs(loc_gammas);  // take the absolute value of the whole walk
  }
}
model {
  gammas_raw ~ normal(0, 1);
  tau ~ cauchy(0, 1);
}
```

This essentially constructs a random-walk but then takes the absolute value.

Alternatively, you could construct a process that behaves like a random walk but, whenever it would cross “0” into the negatives, behaves differently. In mathematical terms: suppose \gamma_i > 0 is the current value and the increment is \delta_i < 0, such that \gamma_{i+1} = \gamma_i + \delta_i < 0. In this case set \gamma_{i+1} = -\delta_i - \gamma_i > 0. Thus in general \gamma_{i+1} = \vert \gamma_i + \delta_i \vert.
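This reflecting construction could be sketched in Stan roughly as follows (a sketch only, reusing `gammas_raw`, `tau`, and `m` from the model above; the difference from the first model is that the absolute value is applied at every step rather than once at the end):

```
transformed parameters {
  row_vector<lower=0>[m] gammas;
  gammas[1] = fabs(gammas_raw[1]);
  for (i in 2:m)
    // reflect at zero: gamma[i] = |gamma[i-1] + delta[i]|
    gammas[i] = fabs(gammas[i - 1] + gammas_raw[i] * tau);
}
```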

Are there any arguments for or against the two approaches? Has this been studied elsewhere, and are there other choices of random-walk priors constrained (conditioned) to be non-negative?

The absolute value function is likely to mess up the sampling. I would do a random-walk on the log scale.
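For example, a log-scale version of the model above might look like this (a sketch, not tested; exponentiating a random walk on the log scale keeps `gammas` strictly positive without any absolute values):

```
parameters {
  row_vector[m] log_gammas_raw;
  real<lower=0> tau;
}
transformed parameters {
  row_vector<lower=0>[m] gammas;
  {
    row_vector[m] log_gammas;
    log_gammas[1] = log_gammas_raw[1];
    for (i in 2:m)
      log_gammas[i] = log_gammas[i - 1] + log_gammas_raw[i] * tau;
    gammas = exp(log_gammas);  // smooth, always positive
  }
}
model {
  log_gammas_raw ~ normal(0, 1);
  tau ~ cauchy(0, 1);
}
```

Note that on the log scale the innovations act multiplicatively on `gammas` rather than additively, so this is not the same prior as the reflected walk.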


Actually, it turns out that in my case the log-random-walk proposal leads to much worse NUTS characteristics (I still get hundreds of max-tree-depth-exceeded warnings, even with adapt_delta = .99) than the `fabs`-based one. Why should the latter actually be so bad?

Right now `gammas` isn’t involved in your log density calculations, so it doesn’t matter that there are absolute-value calculations in there.

Once `gammas` gets involved in a sampling statement, though (or a target increment), Stan will need partial derivatives of the log density, which will require autodiffing through the `fabs`, which doesn’t have a derivative at zero. Things will probably get bad then. Stan requires the log density to be differentiable on the unconstrained space (because NUTS/HMC require it).


Are there a lot of values near zero? Are you also seeing divergences?

Noise is multiplicative on the log scale and additive on the linear scale, which can have a big impact on fits.

We fudge this in the implementation by taking the derivative to be -1 for negative numbers, +1 for positive numbers, and 0 at zero. Whether this is unstable will depend on the step sizes and whether there’s a lot of support around zero.

The main problem with the absolute value is that it leads to multimodality.

To keep things positive, the innovation at each iteration must be constrained to be greater than the negation of the previous iteration’s value. You can give each value its own lower bound separately—there’s a description of vectors with varying bounds in the manual.
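One way this could be sketched (an illustration, not from the thread: each innovation is given the varying lower bound -gammas[i-1] via a shifted exp transform, with the corresponding Jacobian adjustment and a truncated-normal prior on the innovation):

```
data {
  int<lower=2> m;
}
parameters {
  real<lower=0> gamma1;
  vector[m - 1] delta_raw;  // unconstrained innovations
  real<lower=0> tau;
}
transformed parameters {
  vector<lower=0>[m] gammas;
  gammas[1] = gamma1;
  for (i in 2:m)
    // delta = -gammas[i-1] + exp(delta_raw[i-1]) > -gammas[i-1],
    // so gammas[i] = gammas[i-1] + delta > 0 by construction
    gammas[i] = gammas[i - 1] - gammas[i - 1] + exp(delta_raw[i - 1]);
}
model {
  tau ~ cauchy(0, 1);
  for (i in 2:m) {
    real delta = gammas[i] - gammas[i - 1];
    delta ~ normal(0, tau) T[-gammas[i - 1], ];  // innovation truncated below
    target += delta_raw[i - 1];                  // log Jacobian of the exp shift
  }
}
```

This keeps the random-walk dependence between consecutive values while respecting the positivity constraint, at the cost of a truncated prior and an explicit Jacobian term.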