MCMC approximation of posterior for a simple model of human attention (Divergent transitions)

Hey friends! I’m working on a probabilistic model of human attention. The model is defined as follows:

We want to learn the mean \mu and standard deviation \sigma of a Gaussian random variable y via Bayesian inference. Our prior beliefs are \mu ~ Normal(mu_mean, mu_sd) and \sigma ~ Gamma(sigma_alpha, sigma_beta), and we want to update them given observations. However, we cannot observe y directly. Instead, we can only take noisy samples z ~ N(y, noise), where noise (the standard deviation of the observation noise) itself has a Gamma(noise_alpha, noise_beta) prior.
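The generative process just described can be simulated directly, which is a quick way to see what data the priors imply (a prior predictive check). Below is a NumPy sketch; the function name simulate_prior_predictive is hypothetical, and it assumes Stan's shape/rate convention for the gamma priors (NumPy's gamma takes shape and scale, so scale = 1 / rate) and 1-based exemplar_idx as in the Stan data block.

```python
import numpy as np


def simulate_prior_predictive(F, K, M, exemplar_idx, mu_mean, mu_sd,
                              sigma_alpha, sigma_beta, noise_alpha, noise_beta,
                              rng):
    """Draw one synthetic dataset from the hierarchical model.

    Gamma priors use Stan's (shape, rate) convention; NumPy's gamma takes
    (shape, scale), so scale = 1 / rate. exemplar_idx is 1-based, as in Stan.
    """
    idx = np.asarray(exemplar_idx) - 1          # convert to 0-based indices
    if idx.shape != (M,):
        raise ValueError("exemplar_idx must have length M")

    mu = rng.normal(mu_mean, mu_sd, size=F)                   # mu[f]
    sigma = rng.gamma(sigma_alpha, 1.0 / sigma_beta, size=F)  # sigma[f]
    noise = rng.gamma(noise_alpha, 1.0 / noise_beta)          # shared noise sd

    # y[f, k] ~ normal(mu[f], sigma[f])
    y = rng.normal(mu[:, None], sigma[:, None], size=(F, K))
    # z[f, m] ~ normal(y[f, exemplar_idx[m]], noise)
    z = rng.normal(y[:, idx], noise)
    return mu, sigma, noise, y, z


rng = np.random.default_rng(0)
mu, sigma, noise, y, z = simulate_prior_predictive(
    F=1, K=1, M=1, exemplar_idx=[1], mu_mean=0, mu_sd=0.5,
    sigma_alpha=2, sigma_beta=2, noise_alpha=7.5, noise_beta=1, rng=rng)
print(z.shape)  # (1, 1)
```

With the example hyperparameters used in this post, the prior mean of noise is 7.5 / 1 = 7.5, whereas each sigma has prior mean 2 / 2 = 1, so in expectation the priors attribute most of the variance of z to the noise term rather than to y.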

Stan Model

data {
    int<lower=1> F; // number of features
    int<lower=1> M; // total number of noisy samples
    int<lower=1> K; // number of y's (exemplars)
    matrix[F, M] z; // noisy samples (rows are features, columns are samples)

    array[M] int<lower=1, upper=K> exemplar_idx; // exemplar index of each sample

    // hyperpriors
    real mu_mean;
    real<lower=0> mu_sd;

    real<lower=0> sigma_alpha;
    real<lower=0> sigma_beta;

    real<lower=0> noise_alpha; // check what priors to provide
    real<lower=0> noise_beta;
}

parameters {
    vector[F] mu;
    vector<lower=0>[F] sigma;
    matrix[F, K] y;
    real<lower=0> noise;
}

model {
    noise ~ gamma(noise_alpha, noise_beta);

    // loop through features
    for (f in 1:F) {
        mu[f] ~ normal(mu_mean, mu_sd);
        sigma[f] ~ gamma(sigma_alpha, sigma_beta);

        // loop through y's
        for (k in 1:K) {
            y[f, k] ~ normal(mu[f], sigma[f]);
        }

        // multiple z observations
        for (m in 1:M) {
            z[f, m] ~ normal(y[f, exemplar_idx[m]], noise);
        }
    }
}

generated quantities {
    vector[F] z_rep; // replicated noisy sample for the last exemplar (K)

    for (f in 1:F) {
        z_rep[f] = y[f, K] + normal_rng(0, noise);
    }
}
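As a cross-check of what the model block computes, here is a NumPy sketch of the same log joint density (log priors plus log likelihood). The helper names normal_lpdf, gamma_lpdf and log_joint are hypothetical Python functions that only mirror Stan's naming. Two caveats: Stan's ~ statements drop constant terms, and because sigma and noise are declared with lower=0, Stan samples them on the log scale and adds Jacobian terms, so the lp__ that Stan reports will not equal this value.

```python
import math

import numpy as np


def normal_lpdf(x, loc, scale):
    """Sum of Normal(loc, scale) log densities, constants included."""
    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale
    return float(np.sum(-0.5 * z**2 - np.log(scale) - 0.5 * math.log(2 * math.pi)))


def gamma_lpdf(x, alpha, beta):
    """Sum of Gamma(shape=alpha, rate=beta) log densities, as in Stan's gamma."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(alpha * math.log(beta) - math.lgamma(alpha)
                        + (alpha - 1) * np.log(x) - beta * x))


def log_joint(mu, sigma, y, noise, data):
    """Log prior + log likelihood of the model block, on the constrained scale."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.asarray(data["exemplar_idx"]) - 1  # 1-based -> 0-based
    lp = gamma_lpdf(noise, data["noise_alpha"], data["noise_beta"])
    lp += normal_lpdf(mu, data["mu_mean"], data["mu_sd"])
    lp += gamma_lpdf(sigma, data["sigma_alpha"], data["sigma_beta"])
    lp += normal_lpdf(y, mu[:, None], sigma[:, None])   # y[f, k] ~ normal(mu[f], sigma[f])
    lp += normal_lpdf(data["z"], y[:, idx], noise)      # z[f, m] ~ normal(y[f, idx[m]], noise)
    return lp


example = {"mu_mean": 0, "mu_sd": 0.5, "sigma_alpha": 2, "sigma_beta": 2,
           "noise_alpha": 7.5, "noise_beta": 1,
           "z": np.array([[1.04543573]]), "exemplar_idx": [1]}
print(log_joint(mu=[0.0], sigma=[1.0], y=[[1.0]], noise=0.4, data=example))
```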


And an example data dictionary (where z is the observation):

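import numpy as np

data = {
    'mu_mean': 0,
    'mu_sd': 0.5,
    'sigma_alpha': 2,
    'sigma_beta': 2,
    'epsilon_alpha': 1,  # not declared in the Stan data block
    'epsilon_beta': 1,   # not declared in the Stan data block
    'noise': 0.4,        # not declared in the Stan data block (noise is a parameter)
    'F': 1,
    'noise_alpha': 7.5,
    'noise_beta': 1,
    'M': 1,
    'K': 1,
    'z': np.array([[1.04543573]]),
    'exemplar_idx': [1],
}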
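Because the data dictionary has to line up with the Stan data block, a small check before sampling can catch shape and indexing mistakes early. A sketch, assuming the variable names of the data block above; check_data and DECLARED are hypothetical names, not part of CmdStanPy.

```python
import numpy as np

# Variables declared in the data block of the Stan model above.
DECLARED = {"F", "M", "K", "z", "exemplar_idx", "mu_mean", "mu_sd",
            "sigma_alpha", "sigma_beta", "noise_alpha", "noise_beta"}


def check_data(data):
    """Raise ValueError on missing names or inconsistent shapes/indices.

    Returns the sorted list of keys that the data block does not declare.
    """
    missing = DECLARED - set(data)
    if missing:
        raise ValueError(f"missing data variables: {sorted(missing)}")

    F, M, K = data["F"], data["M"], data["K"]
    z = np.asarray(data["z"])
    if z.shape != (F, M):
        raise ValueError(f"z has shape {z.shape}, expected (F, M) = {(F, M)}")

    idx = np.asarray(data["exemplar_idx"])
    if idx.shape != (M,) or idx.min() < 1 or idx.max() > K:
        raise ValueError("exemplar_idx needs length M and values in 1..K")

    return sorted(set(data) - DECLARED)
```

On the example dictionary above it returns ['epsilon_alpha', 'epsilon_beta', 'noise'], the three keys that the data block does not declare and that the model therefore never uses.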

I’m using MCMC in CmdStanPy to draw samples from the posteriors of mu and sigma. I’m getting many divergent transitions (dozens per chain), but I’m not sure what needs to be adjusted. Is there an issue with how the Stan model is defined or with the parameter settings, or do I need to try a different MCMC implementation? Thank you!
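Divergences in hierarchical models are often attributed to the funnel-shaped posterior between a scale parameter (here sigma) and the values it scales (here y), and a commonly suggested remedy is a non-centered parameterization: declare a standard-normal parameter (y_raw, a hypothetical name) with a std_normal() prior and compute y = rep_matrix(mu, K) + diag_pre_multiply(sigma, y_raw) in a transformed parameters block. Whether that removes the divergences for this particular model is not established here; with a single z per feature in the example data (F = M = K = 1), y, sigma and noise may be weakly identified regardless. The NumPy sketch below only checks that the two parameterizations imply the same distribution for y given mu and sigma.

```python
import numpy as np


def draw_y_centered(mu, sigma, K, rng):
    """Centered: y[f, k] ~ Normal(mu[f], sigma[f]) drawn directly."""
    return rng.normal(mu[:, None], sigma[:, None], size=(mu.size, K))


def draw_y_noncentered(mu, sigma, K, rng):
    """Non-centered: y_raw[f, k] ~ Normal(0, 1), then y = mu + sigma * y_raw."""
    y_raw = rng.standard_normal(size=(mu.size, K))  # hypothetical y_raw
    return mu[:, None] + sigma[:, None] * y_raw


rng = np.random.default_rng(1)
mu = np.array([0.3, -1.0])
sigma = np.array([1.7, 0.2])
yc = draw_y_centered(mu, sigma, 200_000, rng)
yn = draw_y_noncentered(mu, sigma, 200_000, rng)
print(yc.mean(axis=1), yn.mean(axis=1))  # both close to mu
print(yc.std(axis=1), yn.std(axis=1))    # both close to sigma
```

The point of the change is geometric rather than statistical: the sampler then explores y_raw, whose prior scale does not depend on sigma, instead of y directly.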