I have a HDLM that looks as follows:
y_t \sim \mathcal{N}(X_t \cdot \beta_t, V),
\beta_t \sim \mathcal{N}(\beta_{t-1}, W),
\beta_0 \sim \mathcal{N}(m_0, W),
with t \in \{1,\ldots,N\}, y_t being a vector of length m, X_t being a matrix of size m \times mp and \beta_t – a column vector of length mp.
I have implemented it successfully using JAGS, setting the prior on W to be inverse Wishart with the expected value being some random positive-definite matrix. I’m getting a very satisfying fit with that.
Now, looking for a speed-up, I’ve tried to implement the model using STAN. Here, I’m providing the covariance matrix V on input. Following STAN documentation, to describe W I’m doing
parameters {
corr_matrix[m*p] Omega; // prior correlation
vector<lower=0>[m*p] tau; // prior scale
...
}
model {
matrix[m*p, m*p] W;
tau ~ cauchy(0, 2.5);
Omega ~ lkj_corr(2);
W = quad_form_diag(Omega, tau);
...
The whole model is
data {
int<lower=1> N; // maximal time
int<lower=1> m; // the number of cells
int<lower=1> p; // number of covariates, including intercept
cov_matrix[m] V; // covariance matrix of sales
matrix[m, N] Y; // the matrix of sales: Y[k,t] = sales in cell k at time t; Y[:,t] = sales at time t
real X[N,m,m*p]; // this is an array; X[t,:,:] = covariates at time t
vector[m*p] m0; // expected coefficients at time 0
}
parameters {
matrix[m*p, N+1] beta; // beta[:,t] = coefficients at time t
// for prior covariance matrix
corr_matrix[m*p] Omega; // prior correlation
vector<lower=0>[m*p] tau;// prior scale
}
transformed parameters {
matrix[m, N] ExpY;
matrix[N, m] ExpYtransposed; // can assign values only to rows, and not columns of matrices
for(t in 1:N){
ExpYtransposed[t] = to_row_vector(to_matrix(X[t,:,:]) * beta[:,t]);
}
ExpY = ExpYtransposed';
}
model {
matrix[m*p, m*p] W;
tau ~ cauchy(0, 2.5);
Omega ~ lkj_corr(2);
W = quad_form_diag(Omega, tau);
//the coefficients
col(beta,1) ~ multi_normal(m0, W);
for(t in 2:N){
col(beta,t) ~ multi_normal(col(beta,t-1), W);
}
//the sales
for(t in 1:N){
col(Y,t) ~ multi_normal(col(ExpY,t), V);
}
}
When trying to fit it, I end up with some warnings (“There were 1911 divergent transitions after warmup”, “The largest R-hat is NA, indicating chains have not mixed.”, etc.). When plotting traceplots of ‘tau’, I can see that not only have the chains not mixed, but that each of the three chains is constant.
Then I set W to be the identity matrix and provided it on input. I got a small number of divergent transitions, and a very poor fit, but the chains somewhat mixed. So I’m guessing (or hoping) that the above problems are due to the specification of W, and not the model being bad.
I’m new to STAN. I’d be grateful for tips on improving the model in any way, but especially with regards to the covariance matrix W.