Exceptions returned: "wishart_lpdf: random variable is not symmetric" and "gamma_lpdf: Random variable is inf, but must be positive finite!"

I am currently trying to translate and fit a SEM using cmdstanr (see below), based on a model originally written for WinBUGS (code at the bottom of page). The current model will run to completion, however, I get the following exceptions returned in the output, repeatedly, when fitting the model (lines causing exceptions are marked with comments in the code sample below):

Chain 1 Exception: wishart_lpdf: random variable is not symmetric. random variable[1,2] = -inf, but random variable[2,1] = -inf (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 48, column 4 to column 24)

Chain 1 Exception: gamma_lpdf: Random variable[1] is inf, but must be positive finite! (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 46, column 4 to column 22)

Chain 3 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 47, column 4 to column 22)

All messages are preceded and followed by the following messages, respectively:

Chain X Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:

and

Chain X If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,

Chain X but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Questions:

  1. I am fairly new to Stan, and could use some help understanding what’s causing these errors. How can I avoid these messages?
  2. The messages gives me the notion that the code might not optimal. I’d like some input on how to write the code with improved efficiency in mind.
  3. With regards to improved efficiency, I was hoping running the model on the GPU would speedup the computations. In reality, running the model below on the GPU is extremely slow: I.e. I interrupted after >4 mins, as it had not completed the first 500 iterations yet, while the model fit finished on the CPU after only 26 seconds! (In contrast, the model from the OpenCL example vignette on the official pages will run ~3.4 times faster on my GPU.) I am trying to understand whether this unexpected discrepancy is caused by poorly optimized or erroneous code, or if the CPU is simply better suited for fitting the current model.

Stan model:

data {
    int<lower=0> N; // number of observations
    int<lower=1> P; // number of variables
    array[N] vector[P] y; // observed data
}

transformed data {
    vector[2] u = rep_vector(0, 2); // prior mean for xi
    matrix[2, 2] R = diag_matrix(rep_vector(1, 2)); // prior covariance matrix for xi
}

parameters {
    
    vector[P] alp; // intercepts
    array[6] real lam; // loadings
    vector[N] eta; // latent variable
    array[N] vector[2] xi; // latent variables
    // matrix[N, 2] xi; // latent variables
    row_vector[2] gam; // coefficients for nu[i]
    vector<lower=0>[P] psi; // precisions of y
    real<lower=0> psd; // precision of eta
    cov_matrix[2] phi;

}

transformed parameters {
    vector[P] sgm = 1/psi;
    real sgd = 1/psd;
    matrix[2, 2] phx = inverse(phi);   
}

model {
    // Local variables
    array[N] vector[P] mu;
    array[N] vector[P] ephat;
    vector[N] nu;
    vector[N] dthat; 
    
    // Priors on intercepts
    alp ~ normal(0, 1);

    // Priors om precisions
    psi ~ gamma(9, 4);                              // line 46
    psd ~ gamma(9, 4); // precision of eta          // line 47
    phi ~ wishart(5, R);                            // line 48
    
    // Priors on loadings and coefficients
    lam[1] ~ normal(0.8, psi[2]);
    lam[2] ~ normal(0.8, psi[3]);
    lam[3] ~ normal(0.8, psi[5]);
    lam[4] ~ normal(0.8, psi[6]);
    lam[5] ~ normal(0.8, psi[8]);
    lam[6] ~ normal(0.8, psi[9]);
    
    gam ~ normal(0.5, psd);

    // Measurement equation model
    
    for (i in 1:N) {
        
        mu[i,1] = eta[i] + alp[1];
        mu[i,2] = lam[1] * eta[i] + alp[2];
        mu[i,3] = lam[2] * eta[i] + alp[3];
        mu[i,4] = xi[i, 1] + alp[4];
        mu[i,5] = lam[3] * xi[i, 1] + alp[5];
        mu[i,6] = lam[4] * xi[i, 1] + alp[6];
        mu[i,7] = xi[i, 2] + alp[7];
        mu[i,8] = lam[5] * xi[i, 2] + alp[8];
        mu[i,9] = lam[6] * xi[i, 2] + alp[9];

        ephat[i] = y[i] - mu[i];
        y[i] ~ normal(mu[i], psi);
    
    // Structural equation model
        
        nu[i] = gam * xi[i];
    }

    dthat = eta - nu;

    xi ~ multi_normal(u, phi);
    eta ~ normal(nu, psd);
}

The original BUGS model:

model  <- function() {
    for(i in 1:N){
        #measurement equation model
        for(j in 1:P){
            y[i,j] ~ dnorm(mu[i,j],psi[j])
            ephat[i,j] <- y[i,j] - mu[i,j]
        }
        mu[i,1] <- eta[i]+alp[1]
        mu[i,2] <- lam[1] * eta[i] + alp[2]
        mu[i,3] <- lam[2] * eta[i] + alp[3]
        mu[i,4] <- xi[i,1] + alp[4]
        mu[i,5] <- lam[3] * xi[i,1] + alp[5]
        mu[i,6] <- lam[4] * xi[i,1] + alp[6]
        mu[i,7] <- xi[i,2] + alp[7]
        mu[i,8] <- lam[5] * xi[i,2] + alp[8]
        mu[i,9] <- lam[6] * xi[i,2] + alp[9]

        #structural equation model
        xi[i,1:2] ~ dmnorm(u[1:2], phi[1:2,1:2])
        eta[i] ~ dnorm(nu[i], psd)
        nu[i] <- gam[1] * xi[i,1] + gam[2] * xi[i,2]
        dthat[i] <- eta[i] - nu[i]
    } #end of i

    #priors on intercepts
    for(j in 1:9){alp[j]~dnorm(0.0, 1.0)}

    #priors on loadings and coefficients
    lam[1] ~ dnorm(0.8, psi[2])
    lam[2] ~ dnorm(0.8, psi[3])
    lam[3] ~ dnorm(0.8, psi[5])
    lam[4] ~ dnorm(0.8, psi[6])
    lam[5] ~ dnorm(0.8, psi[8])
    lam[6] ~ dnorm(0.8, psi[9])
    for(j in 1:2){ gam[j] ~ dnorm(0.5, psd) }
    
    #priors on precisions
    for(j in 1:P){
        psi[j] ~ dgamma(9.0, 4.0)
        sgm[j] <- 1/psi[j]
    }
    psd ~ dgamma(9.0, 4.0)
    sgd <- 1/psd
    phi[1:2,1:2] ~ dwish(R[1:2,1:2], 5)
    phx[1:2,1:2] <- inverse(phi[1:2, 1:2])

} #end of model

LISREL model to be estimated:

(From: Lee, S. Y. (2007). Structural equation modeling: A Bayesian approach`,. ch. 4.5, pp. 99 & 101. John Wiley & Sons.)



System info

R: 4.3.1
CmdStan: 2.33.0
cmdstanr: 0.6.1

OS: Arch Linux x86_64 
CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 4.463GHz [45.8°C] 
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q 
Memory: 13.87GiB / 31.20GiB 
Kernel: Linux 6.5.3-arch1-1 
GPU Driver: NVIDIA 535.104.05 

I didn’t immediately see a problem with your Stan model. I sometimes see those messages near the start of a chain, when I don’t supply initial values for the parameters. If you only see the messages during warmup, then there may not be anything to worry about (as suggested by the last message that you pasted).

The major way to improve efficiency here is to integrate the xi and eta latent variables out of the model likelihood. It is easiest to express this integrated (“marginal”) likelihood using matrices. This is what the blavaan package does, some details are available in this paper: Efficient Bayesian Structural Equation Modeling in Stan | Journal of Statistical Software