Consider a scenario where I have a categorical age variable, and the outcome changes smoothly with age category. For context, we can create the following simulated data:
N <- 1000
d <- tibble(agecat = sample.int(8, N, replace = TRUE)) %>%
mutate(mu = sin(pi*agecat/7),
Y = mu + rnorm(N, sd = .1))
I would like to fit a model that contains one parameter for each of the 8 age levels, but uses a first-order difference penalty to penalize the difference between the adjacent coefficients so that they change smoothly from one to the next. i.e., the penalty takes the form:
(b_1-b_2)^2 + (b_2 - b_3)^2 + \dots + (b_7-b_8)^2 = b'Db
where D can be computed in R as crossprod(diff(diag(8), differences = 1))
.
This should be equivalent to a “random walk” prior across the coefficients. I was hoping to implement this in Stan by placing a multi_normal_prec
prior on the (standardized) coefficients. i.e.,
target += multi_normal_prec_lpdf(b | rep_vector(0, 8), D);
where D
is supplied as data. This should work because the log of the MVN prior would be a constant plus -b'Db/2
.
However, I’m getting the following error messages:
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite. last conditional variance is 0. (in 'model31904df71b5a_smooth' at line 48)
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
It’s saying the problem is that D
is not positive definite. Is this simply a limitation of Stan currently, or is there something conceptual that I’m missing? Do you have any suggestions?