Straightforward implementation of a difference penalty?

Consider a scenario where I have a categorical age variable, and the outcome changes smoothly with age category. For context, we can create the following simulated data:

library(dplyr)  # for tibble(), mutate(), and %>%

set.seed(1)     # make the simulation reproducible
N <- 1000
d <- tibble(agecat = sample.int(8, N, replace = TRUE)) %>%
  mutate(mu = sin(pi * agecat / 7),   # outcome changes smoothly with age category
         Y = mu + rnorm(N, sd = .1))

I would like to fit a model that contains one parameter for each of the 8 age levels, but uses a first-order difference penalty on adjacent coefficients so that they change smoothly from one level to the next. That is, the penalty takes the form:

(b_1-b_2)^2 + (b_2 - b_3)^2 + \dots + (b_7-b_8)^2 = b'Db

where D can be computed in R as crossprod(diff(diag(8), differences = 1)).
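As a quick sanity check (in R, just for illustration), the quadratic form reproduces the sum of squared adjacent differences:

D <- crossprod(diff(diag(8), differences = 1))
b <- rnorm(8)
drop(t(b) %*% D %*% b)  # quadratic form b'Db
sum(diff(b)^2)          # sum of squared adjacent differences -- same value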

This should be equivalent to a “random walk” prior across the coefficients. I was hoping to implement this in Stan by placing a multi_normal_prec prior on the (standardized) coefficients. i.e.,

target += multi_normal_prec_lpdf(b | rep_vector(0, 8), D);

where D is supplied as data. This should work because the log of the MVN prior would be a constant plus -b'Db/2.
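For reference, a stripped-down model that reproduces the problem (just the prior, no likelihood) is:

data {
  matrix[8, 8] D;  // first-difference penalty matrix, supplied as data
}
parameters {
  vector[8] b;
}
model {
  // this is the line that gets rejected
  target += multi_normal_prec_lpdf(b | rep_vector(0, 8), D);
}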

However, I’m getting the following error messages:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 0.  (in 'model31904df71b5a_smooth' at line 48)

Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

It’s saying the problem is that D is not positive definite. Is this simply a limitation of Stan currently, or is there something conceptual that I’m missing? Do you have any suggestions?


[quote=“jgellar, post:1, topic:28516”]
It’s saying the problem is that D is not positive definite. Is this simply a limitation of Stan currently,
[/quote]

Stan requires a full-rank covariance (or precision) matrix, i.e., one that is positive definite, not just positive semidefinite. Your first-difference penalty matrix D only has rank 7: the constant vector is in its null space, which is also why the random-walk prior is improper on its own.
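You can check this directly in R:

D <- crossprod(diff(diag(8), differences = 1))
qr(D)$rank                           # 7, not 8: D is rank deficient
drop(rep(1, 8) %*% D %*% rep(1, 8))  # 0: the constant vector is in the null space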

Having said that, there’s a much, much easier way to do this in Stan:

parameters {
  vector[8] b;
}
model {
  b[2:8] - b[1:7] ~ normal(0, 1);
}

Stan also just lets you code the penalty directly:

for (n in 2:8)
  target += -0.5 * (b[n] - b[n-1])^2;

You could also make that normal(0, sigma) with sigma declared as a parameter and given a prior. That will just fit the scale of the random walk.
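For example (the half-normal prior on the scale is just one possible choice):

parameters {
  vector[8] b;
  real<lower=0> sigma;   // scale of the random walk
}
model {
  b[2:8] - b[1:7] ~ normal(0, sigma);
  sigma ~ normal(0, 1);  // half-normal, because of the lower bound
}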

A slicker way to do this, especially if you want to reuse code, is to define the first-order random walk distribution as a function.

functions {
  real rw1_lpdf(vector alpha, real sigma) {
    int N = num_elements(alpha);  // note: cols() of a vector is always 1
    return normal_lpdf(alpha[2:N] - alpha[1:(N - 1)] | 0, sigma);
  }
}
parameters {
   vector[8] b;
}
model {
  b ~ rw1(1);  // sets sigma = 1, but it can be a parameter if you want
}

This isn’t a proper prior on b yet as it only constrains 7 of the 8 degrees of freedom. That’s OK in Stan as long as the posterior is proper. You’d need a prior on b[1] or on sum(b) or something like that to make the prior proper.
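For example (the scales here are arbitrary):

model {
  b ~ rw1(1);
  // either anchor the first coefficient ...
  b[1] ~ normal(0, 1);
  // ... or use a soft sum-to-zero constraint instead:
  // sum(b) ~ normal(0, 0.001 * 8);
}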


Thanks @Bob_Carpenter, this is extremely helpful. I’ve only implemented the random walk in much clumsier ways in the past; these are much cleaner.

My long-term goal is to be able to code this up using brms. It appears that any of the three methods you suggested could be done in brms, with a little bit of added code via stanvars. I’ll probably go with the fancy function definition, since I really like the idea of having a rw1(sigma) prior on the coefficients.

And yes, I was planning on including a (soft) sum-to-zero constraint so it’s identifiable.
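Roughly, what I have in mind is something like this (an untested sketch; I’m assuming the coefficients from 0 + factor(agecat) line up so that adjacent entries of b are adjacent age categories):

library(brms)

rw1_code <- "
  real rw1_lpdf(vector alpha, real sigma) {
    int N = num_elements(alpha);
    return normal_lpdf(alpha[2:N] - alpha[1:(N - 1)] | 0, sigma);
  }
"

fit <- brm(Y ~ 0 + factor(agecat), data = d,
           prior = set_prior("rw1(1)", class = "b"),
           stanvars = stanvar(scode = rw1_code, block = "functions"))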


Thanks @Bob_Carpenter for showing the nice way to implement the rw1:

[image: the rw1 implementation shown above]

Could you please show how to code the second-order random walk (rw2), i.e.,

[image: the rw2 prior, i.e., second differences alpha[n] - 2*alpha[n-1] + alpha[n-2] ~ normal(0, sigma)]

Thanks

It would be pretty cool if Stan added a rw(sigma, order) prior to the language.


It wouldn’t be that hard to add. I think there’s already an issue requesting it.

I don’t know how to formulate a 3rd-order random walk, but the 2nd order one is just an AR(2) process with non-stationary coefficients 2 and -1.
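Written out, that AR(2) form is

alpha[n] = 2*alpha[n-1] - alpha[n-2] + epsilon[n],   epsilon[n] ~ normal(0, sigma),

or equivalently (alpha[n] - alpha[n-1]) - (alpha[n-1] - alpha[n-2]) ~ normal(0, sigma), i.e., the second differences get the normal prior.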

real rw2_lpdf(vector alpha, real sigma) {
  real lp = 0;  // "target" is a reserved word in Stan, so accumulate in a local variable
  for (n in 3:num_elements(alpha)) {
    lp += normal_lpdf(alpha[n] | 2 * alpha[n - 1] - alpha[n - 2], sigma);
  }
  return lp;
}

This is still improper as there’s no prior on alpha[1] or alpha[2].

We can optionally add a first-order random walk term,

if (num_elements(alpha) >= 2)
  lp += normal_lpdf(alpha[2] | alpha[1], 2 * sigma);

The factor of 2 on sigma is arbitrary, but I inserted it because we’re only using a first-order random walk to predict alpha[2] here. And it’s still improper without a prior on alpha[1].
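Putting the two pieces together, the whole function would look something like this (again, still improper without something pinning down alpha[1]):

functions {
  real rw2_lpdf(vector alpha, real sigma) {
    int N = num_elements(alpha);
    real lp = 0;
    if (N >= 2)
      lp += normal_lpdf(alpha[2] | alpha[1], 2 * sigma);   // first-order term for alpha[2]
    for (n in 3:N)
      lp += normal_lpdf(alpha[n] | 2 * alpha[n - 1] - alpha[n - 2], sigma);
    return lp;
  }
}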

Many thanks for sharing the code snippet.

One (quick) follow-up:

I am implementing the sum-to-zero constraint as

sum(b) ~ normal(0, 0.001 * 8);

However, when I do this I get LOTS of warning messages of the form:

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.

I know this is a warning and it doesn’t apply to my use case, since the sum is a linear transformation. But is there any way to get around printing the warning? The most annoying thing is that the warning gets repeated over and over again during sampling (it’s not clear to me how many times and why it’s repeated).

What if, to play it safe, we write:

real sum_b = sum(b);
sum_b ~ normal(0, 0.001 * 8);