# Straightforward implementation of a difference penalty?

Consider a scenario where I have a categorical age variable, and the outcome changes smoothly with age category. For context, we can create the following simulated data:

``````N <- 1000
d <- tibble(agecat = sample.int(8, N, replace = TRUE)) %>%
mutate(mu = sin(pi*agecat/7),
Y = mu + rnorm(N, sd = .1))
``````

I would like to fit a model that contains one parameter for each of the 8 age levels, but uses a first-order difference penalty to penalize the difference between the adjacent coefficients so that they change smoothly from one to the next. i.e., the penalty takes the form:

(b_1-b_2)^2 + (b_2 - b_3)^2 + \dots + (b_7-b_8)^2 = b'Db

where D can be computed in R as ` crossprod(diff(diag(8), differences = 1))`.

This should be equivalent to a â€śrandom walkâ€ť prior across the coefficients. I was hoping to implement this in Stan by placing a `multi_normal_prec` prior on the (standardized) coefficients. i.e.,

`target += multi_normal_prec_lpdf(b | rep_vector(0, 8), D);`

where `D` is supplied as data. This should work because the log of the MVN prior would be a constant plus `-b'Db/2`.

However, Iâ€™m getting the following error messages:

``````Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 0.  (in 'model31904df71b5a_smooth' at line 48)

Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler\$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
``````

Itâ€™s saying the problem is that `D` is not positive definite. Is this simply a limitation of Stan currently, or is there something conceptual that Iâ€™m missing? Do you have any suggestions?

[quote=â€śjgellar, post:1, topic:28516â€ť]
Itâ€™s saying the problem is that `D` is not positive definite. Is this simply a limitation of Stan currently,
[/quote]â€™

Stan requires a full-rank covariance matrix (i.e., positive definite, not just positive semidefinite).

Having said that, thereâ€™s a much much easier way to do this in Stan:

``````parameters {
vector[8] b;
}
model {
b[2:8] - b[1:7] ~ normal(0, 1);
}
``````

Stan also just lets you code the penalty directly:

``````for (n in 2:8)
target += -0.5 * (b[n] - b[n-1])^2;
``````

You could also make that `normal(0, sigma)` with `sigma` declared as a parameter and given a prior. That will just fit the scale of the random walk.

A slicker way to do this, especially if you want to reuse code, is to define the first order random walk distribution as a function.

``````functions {
real rw1_lpdf(vector alpha, real sigma) {
int N = cols(alpha);
return normal_lpdf(alpha[2:N] - alpha[1:N - 1] | 0, sigma);
}
}
parameters {
vector[8] b;
}
model {
b ~ rw1(1);  // sets sigma = 1, but it can be a parameter if you want
}
``````

This isnâ€™t a proper prior on `b` yet as it only constrains 7 of the 8 degrees of freedom. Thatâ€™s OK in Stan as long as the posterior is proper. Youâ€™d need a prior on `b[1]` or on `sum(b)` or something like that to make the prior proper.

Thanks @Bob_Carpenter , this is extremely helpful. Iâ€™ve only implemented the random walk in much clumsier ways in the past, these are cleaner.

My long-term goal is to be able to code this up using `brms`. It appears that any of the three methods you suggested could be done in `brms`, with a little bit of added code via `stanvars`. Iâ€™ll probably go with the fancy function definition, since I really like the idea of having a `rw1(sigma)` prior on the coefficients.

And yes, I was planning in including a (soft) sum-to-zero constraint so itâ€™s identifiable.

Thanks @Bob_Carpenter for showing the nice way to implement the rw1:

Could you please show how to code the second order random walk (rw2) , i.e.,

Thanks

It would be pretty cool if Stan added a `rw(sigma, order)` prior to the language.

1 Like

It wouldnâ€™t be that hard to add. I think thereâ€™s already an issue requesting it.

I donâ€™t know how to formulate a 3rd-order random walk, but the 2nd order one is just an AR(2) process with non-stationary coefficients 2 and -1.

``````real rw2_lpdf(vector alpha, real sigma) {
real target = 0;
for (n in 2:cols(alpha)) {
target += normal_lpdf(alpha[n] | 2 * alpha[n-1] - alpha[n - 2], sigma);
}
return target;
}
``````

This is still improper as thereâ€™s no prior on `alpha[1]` or `alpha[2]`.

We can optionally add a first-order random walk term,

``````if (cols(alpha) >= 2)
target += normal_lpdf(alpha[2] | alpha[1], 2 * sigma);
``````

The multiplier is arbitrary, but I inserted it because weâ€™re only using a first-order random walk to predict `alpha[2]` here. And itâ€™s still improper without a prior on `alpha[1]`.

Many thanks for sharing the code snippet.

One (quick) follow-up:

I am implementing the sum-to-zero constraint as

``````sum(b) ~ normal(0, 0.001 * 8)
``````

However, when I do this I get LOTS of warning messages of the form:

``````Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
``````

I know this is a warning and it doesnâ€™t apply to my use case, since the sum is a linear transformation. But is there any way to get around printing the warning? The most annoying thing is that the warning gets repeated over and over again during sampling (itâ€™s not clear to me how many times and why itâ€™s repeated).