Difficulty specifying non-centered varying slopes model

I am trying to learn working directly in Stan by iteratively building out a hierarchical model.

I was able to get what look like reasonable results through non-centered group-level specification as discussed in this thread.

I am now trying to add in an additional covariate with group-level varying slopes for that covariate.

The following model runs, but with a lot of divergences and some NA R-hats:

data {
        int<lower=1> N;  // total number of observations
        vector[N] y ;  // response variable
        int<lower=1> N_Grp;  // number of groups
        int<lower=1> Grp[N];  // Group indicator
        int<lower=1> N_Cols; // 
        matrix[N, N_Cols] X1;
        matrix[N, N_Cols] X2; 
        vector[N] X3;
        vector[N] X4;  // 
parameters {
        real beta1;  
        vector[N_Grp] beta2; // 
        vector[N_Grp] mu_grp;  // standardized group means
        real<lower=0> kappa;  // precision parameter
        real a;
        real b;
        vector<lower=0>[2] sd_grp;  // group-level standard deviation
        corr_matrix[2] Rho; //correlation matrix
transformed parameters {
        vector[N_Grp] grp_int;  
        grp_int = (sd_grp[2] * (mu_grp)); // Group intercepts
model {
        vector[N] mu;
        // varying slopes and varying intercepts component
        // Vector for intercept mean and slope mean
        vector[2] YY[N_Grp];
        vector[2] MU;
        MU = [a, b]';
        for (j in 1:N_Grp) YY[j] = [mu_grp[j], beta2[j]]';
          YY ~ multi_normal(MU, quad_form_diag(Rho, sd_grp));
        // Linear predictor
        for (n in 1:N) {
          mu[n] = grp_int[Grp[n]] + ((sum(row(X1, n) .* row(X2, n))  / X3[n]) * beta1 + beta2[Grp[n]] * X4[n]);
        // likelihood 
        y ~ beta(inv_logit(mu) * kappa, (1 - inv_logit(mu)) * kappa);
        // priors 
        beta1 ~ normal(10,10);
        beta2 ~ normal(0,5);
        kappa ~ gamma(0.01, 0.01);
        sd_grp ~ student_t(3, 0, 2.5);
        mu_grp ~ std_normal();
        Rho ~ lkj_corr(1);
generated quantities {
        vector[N] log_lik;
        vector[N] y_pred;
        vector[N] mu_pred;
                for (n in 1:N) {
                mu_pred[n] = grp_int[Grp[n]] + ((sum(row(X1, n) .* row(X2, n))  / X3[n]) * beta1 + beta2[Grp[n]] * X4[n]);
                for (n in 1:N) {
                y_pred[n] = beta_rng(inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);
                log_lik[n] = beta_lpdf(y_pred[n] | inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);

Is there anything obviously incorrect in here that I’m missing?

1 Like

the model appears to be a bit weird version where the variables used in the model block don’t match with parameters and I cannot compile it. Did you manage to resolve the problem in the meantime? Could you post an updated version? Some of those problems could also be data-dependent, so if you shared your data, it could also help a bit.

The only (minor) thing I can say with some confidence regards the multivariate normal distribution where you have

it is often more efficient to work with the cholesky_factor_corr type, where you would have an equivalent model with:

cholesky_factor_corr[2] L_rho;


YY ~ multi_normal_cholesky(MU, diag_pre_multiply(sd_grp, L_Rho))

See 1.15 Multivariate outcomes | Stan User’s Guide for a bit more details

Best of luck with your model!

Thank you @martinmodrak!

I corrected the model block in the original post, my apologies for the typos!

I’ll try the cholesky_factor_corr and see how that helps.

Unfortunately, the model still does not compile (there is a missing semicolon and something is off in the generated quantities block). You seem to put two priors on beta2 and mu_grp - one directly and another via the YY ~ multi_normal(MU, quad_form_diag(Rho, sd_grp)). That’s usually not what you want. When using non-centered parametrization (which you probably should be using for both beta2 and mu_grp) the idea is that you put std_normal() priors on your actual parameters and then compute the centered versions (but you don’t need to put priors on those computed versions). 1.13 Multivariate priors for hierarchical models | Stan User’s Guide shows how that might look like. Would that make sense?

Thanks again @martinmodrak!

I found the errors and the model compiles now, but you’ve pointed out what I think is a significant issue.

I was attempting to adapt the varying slopes model from this blog post.

The approach in the blog post is quite different from that in the User Guide - I’ll try to adapt the User Guide approach to my model.

1 Like

Unfortunately, it appears the blog post is not following best practices. From a quick skim, it appears their implementation of the varying slopes model is correct, but likely inefficient and somewhat hard to read. I would surely recommend starting from the approach in the User’s guide.