Initialization Issues with Cholesky Factorization of Covariance Matrix

Hey y’all! I am trying to run a hierarchical Bernoulli logit model with a multivariate normal prior on the coefficients with priors on the mean and covariance of the multivariate normal. I am running into initialization issues (“Initialization failed” in pystan) when using the Cholesky factorization of the covariance matrix (yet I was not running into these issues when using the inverse wishart prior, but that has been extremely slow so far). Additionally, I am not entirely sure if the model is specified correctly as I am fairly new to stan. Any help would be greatly, greatly appreciated as I am working on my masters thesis and these computational issues are really getting in the way and stressing me out!! Cheers

data { 
    int<lower=1> D; //Number of individual predictors
    int<lower=0> N; //Number of data points
    int<lower=1> P; //Number of levels 
    int<lower=0, upper =1> y[N]; //Classification
    int<lower=0, upper = P> ll[N]; //levels of each data point
    row_vector[D] x[N]; //Matrix of individual predictors
    matrix[D,D] identity; //Identity matrix used as a prior
    vector[D] zeros; // Vector of zeros used as a prior
}
parameters { 
    vector[D] mu; //Mean vector of multivariate normal
    vector[D] beta[P]; //Coeficients for each of the various levels
    cholesky_factor_corr[D] L_Omega; //Cholesky correlation 
    vector<lower=0, upper=pi()/2>[D] tau_unif; //cholesky scale
    
}
transformed parameters { 
    vector<lower=0>[D] tau; //cholesky scale
    matrix[D,D] Sigma_beta; //Covariance matrix defined from cholesky
    for (d in 1:D) { 
        tau[d] = 2.5 * tan(tau_unif[d]);
    }
    Sigma_beta = diag_pre_multiply(tau, L_Omega); //Calculation of covariance from cholesky

    
}

model { 
    mu ~ multi_normal(zeros, identity); //Prior on multivariate normal
    L_Omega ~ lkj_corr_cholesky(3); 
    tau_unif ~ normal(0,1);
    
    for (p in 1:P){
        beta[p] ~ multi_normal(mu, Sigma_beta);
    }
    
    {
        vector[N] x_beta_ll;
        for (n in 1:N) { 
            x_beta_ll[n] = x[n] * beta[ll[n]];
        }
        y ~ bernoulli_logit(x_beta_ll);
    }
}

Thank you so much in advance!

I think there are two main issues here. The first is that you don’t need to construct the standard deviations on the cholesky scale, you can just estimate them directly and use them to construct the covariance matrix.

The second is that you’re passing a cholesky factor to a distribution that expects a full covariance matrix (since diag_pre_multiply(tau, L_Omega) returns a cholesky factor). If you change the prior for beta[p] to multi_normal_cholesky, that should help.

There are also some other optimisations that you can look at here, which will speed things up a bit for you.

First, the multivariate prior for mu is actually fairly inefficient, and would perform better with a univariate prior, thanks to Stan’s vectorisation, this can be equivalently specified as:

mu ~ std_normal();

Similarly, the loop isn’t needed in the prior for beta and can be specified as:

beta ~ multi_normal_cholesky(mu, Sigma_beta);

If you find issues (performance or quality) in the sampling for beta, you could also try a non-centered reparameterisation (see this section of the manual for more info)

Hope some of that helps!

Cheers,
Andrew