I am currently trying to translate and fit a SEM using `cmdstanr`

(see below), based on a model originally written for WinBUGS (code at the bottom of page). The current model will run to completion, however, I get the following exceptions returned in the output, repeatedly, when fitting the model (lines causing exceptions are marked with comments in the code sample below):

```
Chain 1 Exception: wishart_lpdf: random variable is not symmetric. random variable[1,2] = -inf, but random variable[2,1] = -inf (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 48, column 4 to column 24)
Chain 1 Exception: gamma_lpdf: Random variable[1] is inf, but must be positive finite! (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 46, column 4 to column 22)
Chain 3 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in '/tmp/RtmpJyfIeK/model-a0b05229fa6a6.stan', line 47, column 4 to column 22)
```

All messages are preceded and followed by the following messages, respectively:

```
Chain X Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
```

and

```
Chain X If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain X but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
```

#### Questions:

- I am fairly new to Stan, and could use some help understanding what’s causing these errors. How can I avoid these messages?
- The messages gives me the notion that the code might not optimal. I’d like some input on how to write the code with improved efficiency in mind.
- With regards to improved efficiency, I was hoping running the model on the GPU would speedup the computations. In reality, running the model below on the GPU is extremely slow: I.e. I interrupted after >4 mins, as it had not completed the first 500 iterations yet, while the model fit finished on the CPU after only 26 seconds! (In contrast, the model from the OpenCL example vignette on the official pages will run ~3.4 times faster on my GPU.) I am trying to understand whether this unexpected discrepancy is caused by poorly optimized or erroneous code, or if the CPU is simply better suited for fitting the current model.

#### Stan model:

```
data {
int<lower=0> N; // number of observations
int<lower=1> P; // number of variables
array[N] vector[P] y; // observed data
}
transformed data {
vector[2] u = rep_vector(0, 2); // prior mean for xi
matrix[2, 2] R = diag_matrix(rep_vector(1, 2)); // prior covariance matrix for xi
}
parameters {
vector[P] alp; // intercepts
array[6] real lam; // loadings
vector[N] eta; // latent variable
array[N] vector[2] xi; // latent variables
// matrix[N, 2] xi; // latent variables
row_vector[2] gam; // coefficients for nu[i]
vector<lower=0>[P] psi; // precisions of y
real<lower=0> psd; // precision of eta
cov_matrix[2] phi;
}
transformed parameters {
vector[P] sgm = 1/psi;
real sgd = 1/psd;
matrix[2, 2] phx = inverse(phi);
}
model {
// Local variables
array[N] vector[P] mu;
array[N] vector[P] ephat;
vector[N] nu;
vector[N] dthat;
// Priors on intercepts
alp ~ normal(0, 1);
// Priors om precisions
psi ~ gamma(9, 4); // line 46
psd ~ gamma(9, 4); // precision of eta // line 47
phi ~ wishart(5, R); // line 48
// Priors on loadings and coefficients
lam[1] ~ normal(0.8, psi[2]);
lam[2] ~ normal(0.8, psi[3]);
lam[3] ~ normal(0.8, psi[5]);
lam[4] ~ normal(0.8, psi[6]);
lam[5] ~ normal(0.8, psi[8]);
lam[6] ~ normal(0.8, psi[9]);
gam ~ normal(0.5, psd);
// Measurement equation model
for (i in 1:N) {
mu[i,1] = eta[i] + alp[1];
mu[i,2] = lam[1] * eta[i] + alp[2];
mu[i,3] = lam[2] * eta[i] + alp[3];
mu[i,4] = xi[i, 1] + alp[4];
mu[i,5] = lam[3] * xi[i, 1] + alp[5];
mu[i,6] = lam[4] * xi[i, 1] + alp[6];
mu[i,7] = xi[i, 2] + alp[7];
mu[i,8] = lam[5] * xi[i, 2] + alp[8];
mu[i,9] = lam[6] * xi[i, 2] + alp[9];
ephat[i] = y[i] - mu[i];
y[i] ~ normal(mu[i], psi);
// Structural equation model
nu[i] = gam * xi[i];
}
dthat = eta - nu;
xi ~ multi_normal(u, phi);
eta ~ normal(nu, psd);
}
```

#### The original BUGS model:

```
model <- function() {
for(i in 1:N){
#measurement equation model
for(j in 1:P){
y[i,j] ~ dnorm(mu[i,j],psi[j])
ephat[i,j] <- y[i,j] - mu[i,j]
}
mu[i,1] <- eta[i]+alp[1]
mu[i,2] <- lam[1] * eta[i] + alp[2]
mu[i,3] <- lam[2] * eta[i] + alp[3]
mu[i,4] <- xi[i,1] + alp[4]
mu[i,5] <- lam[3] * xi[i,1] + alp[5]
mu[i,6] <- lam[4] * xi[i,1] + alp[6]
mu[i,7] <- xi[i,2] + alp[7]
mu[i,8] <- lam[5] * xi[i,2] + alp[8]
mu[i,9] <- lam[6] * xi[i,2] + alp[9]
#structural equation model
xi[i,1:2] ~ dmnorm(u[1:2], phi[1:2,1:2])
eta[i] ~ dnorm(nu[i], psd)
nu[i] <- gam[1] * xi[i,1] + gam[2] * xi[i,2]
dthat[i] <- eta[i] - nu[i]
} #end of i
#priors on intercepts
for(j in 1:9){alp[j]~dnorm(0.0, 1.0)}
#priors on loadings and coefficients
lam[1] ~ dnorm(0.8, psi[2])
lam[2] ~ dnorm(0.8, psi[3])
lam[3] ~ dnorm(0.8, psi[5])
lam[4] ~ dnorm(0.8, psi[6])
lam[5] ~ dnorm(0.8, psi[8])
lam[6] ~ dnorm(0.8, psi[9])
for(j in 1:2){ gam[j] ~ dnorm(0.5, psd) }
#priors on precisions
for(j in 1:P){
psi[j] ~ dgamma(9.0, 4.0)
sgm[j] <- 1/psi[j]
}
psd ~ dgamma(9.0, 4.0)
sgd <- 1/psd
phi[1:2,1:2] ~ dwish(R[1:2,1:2], 5)
phx[1:2,1:2] <- inverse(phi[1:2, 1:2])
} #end of model
```

#### LISREL model to be estimated:

(From: Lee, S. Y. (2007). *Structural equation modeling: A Bayesian approach*`,. ch. 4.5, pp. 99 & 101. John Wiley & Sons.)

#### System info

```
R: 4.3.1
CmdStan: 2.33.0
cmdstanr: 0.6.1
OS: Arch Linux x86_64
CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 4.463GHz [45.8°C]
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q
Memory: 13.87GiB / 31.20GiB
Kernel: Linux 6.5.3-arch1-1
GPU Driver: NVIDIA 535.104.05
```