Hey friends! I’m working on a probabilistic model of human attention. The model is defined by the following problem:
We want to learn the \mu and \sigma of a Gaussian random variable y via Bayesian inference. We have prior beliefs about them, P(\mu) (a normal distribution) and P(\sigma) (a gamma distribution), which we want to update given observations. However, we cannot observe y directly. Instead, we can only take noisy samples z ~ N(y, noise), where the noise standard deviation itself has a gamma prior.
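Written out per feature, this is roughly the model I have in mind. The indices f, k, m and the symbol e_m (the exemplar that sample m belongs to) are my own notation here, and the second argument of N is a standard deviation, as in Stan's normal:

```latex
\begin{aligned}
\mu_f &\sim \mathcal{N}(\text{mu\_mean},\ \text{mu\_sd}), & f &= 1,\dots,F \\
\sigma_f &\sim \mathrm{Gamma}(\text{sigma\_alpha},\ \text{sigma\_beta}) \\
y_{f,k} &\sim \mathcal{N}(\mu_f,\ \sigma_f), & k &= 1,\dots,K \\
\text{noise} &\sim \mathrm{Gamma}(\text{noise\_alpha},\ \text{noise\_beta}) \\
z_{f,m} &\sim \mathcal{N}(y_{f,e_m},\ \text{noise}), & m &= 1,\dots,M
\end{aligned}
```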
Stan Model
data {
  int<lower=1> F; // number of features
  int<lower=1> M; // total number of noisy samples
  int<lower=1> K; // number of y's (exemplars)
  matrix[F, M] z; // noisy samples (rows are features, columns are samples)
  int<lower=1> exemplar_idx[M]; // list of indices of size M
  // hyper priors
  real mu_mean;
  real<lower=0> mu_sd;
  real<lower=0> sigma_alpha;
  real<lower=0> sigma_beta;
  real<lower=0> noise_alpha; // check what priors to provide
  real<lower=0> noise_beta;
}
parameters {
  vector[F] mu;
  vector<lower=0>[F] sigma;
  matrix[F, K] y;
  real<lower=0> noise;
}
model {
  noise ~ gamma(noise_alpha, noise_beta);
  // loop through features
  for (f in 1:F) {
    mu[f] ~ normal(mu_mean, mu_sd);
    sigma[f] ~ gamma(sigma_alpha, sigma_beta);
    // loop through y's
    for (k in 1:K) {
      y[f, k] ~ normal(mu[f], sigma[f]);
    }
    // multiple z observations
    for (m in 1:M) {
      z[f, m] ~ normal(y[f, exemplar_idx[m]], noise);
    }
  }
}
generated quantities {
  vector[F] z_rep;
  for (f in 1:F) {
    z_rep[f] = y[f, K] + normal_rng(0, noise);
  }
}
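In case it helps with reproducing this, here is a minimal sketch of simulating data from the same generative process using only the Python standard library. The function name `simulate_data`, the uniform assignment of samples to exemplars, and the sizes in the usage lines are assumptions for illustration, not part of my actual pipeline. Note that Stan's gamma takes a rate while `random.gammavariate` takes a scale, hence the `1 / beta`.

```python
import random


def simulate_data(F, K, M, hyper, seed=0):
    """Draw one synthetic data set from the model's generative process."""
    rng = random.Random(seed)
    # Stan's gamma(alpha, beta) uses a rate beta; gammavariate uses a scale.
    noise = rng.gammavariate(hyper["noise_alpha"], 1.0 / hyper["noise_beta"])
    # Assumption: each sample is assigned to an exemplar uniformly at random.
    # Indices are 1-based because Stan indexes from 1.
    exemplar_idx = [rng.randint(1, K) for _ in range(M)]
    z = []
    for _ in range(F):
        mu = rng.gauss(hyper["mu_mean"], hyper["mu_sd"])
        sigma = rng.gammavariate(hyper["sigma_alpha"], 1.0 / hyper["sigma_beta"])
        y = [rng.gauss(mu, sigma) for _ in range(K)]
        # One row of z per feature, one column per noisy sample.
        z.append([rng.gauss(y[k - 1], noise) for k in exemplar_idx])
    data = dict(hyper, F=F, K=K, M=M, z=z, exemplar_idx=exemplar_idx)
    return data, noise


hyper = {
    "mu_mean": 0, "mu_sd": 0.5,
    "sigma_alpha": 2, "sigma_beta": 2,
    "noise_alpha": 7.5, "noise_beta": 1,
}
data, true_noise = simulate_data(F=2, K=3, M=10, hyper=hyper)
```

The returned dictionary has the same keys as the data block above, and `true_noise` is kept separately so it can be compared against the posterior for noise.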
And an example data dictionary (where z is the observation):
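import numpy as np

data = {
    'mu_mean': 0,
    'mu_sd': 0.5,
    'sigma_alpha': 2,
    'sigma_beta': 2,
    'epsilon_alpha': 1,  # not declared in the Stan data block
    'epsilon_beta': 1,   # not declared in the Stan data block
    'noise': 0.4,        # not declared in the Stan data block (noise is a parameter there)
    'F': 1,
    'noise_alpha': 7.5,
    'noise_beta': 1,
    'M': 1,
    'K': 1,
    'z': np.array([[1.04543573]]),
    'exemplar_idx': [1],
}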
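A quick way to check that a dictionary like this matches the data block is sketched below. The helper name `check_data` is hypothetical, and the expected keys are simply copied from the Stan data block; the example uses plain lists so it runs without NumPy.

```python
# Keys declared in the Stan data block.
EXPECTED_KEYS = {
    "F", "M", "K", "z", "exemplar_idx",
    "mu_mean", "mu_sd", "sigma_alpha", "sigma_beta",
    "noise_alpha", "noise_beta",
}


def check_data(data):
    """Return a list of problems found; the list is empty if none are found."""
    problems = []
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if extra:
        problems.append(f"keys not declared in the data block: {sorted(extra)}")
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems  # shape checks below need every key
    z, idx = data["z"], data["exemplar_idx"]
    if len(z) != data["F"] or any(len(row) != data["M"] for row in z):
        problems.append("z is not F x M")
    if len(idx) != data["M"]:
        problems.append("exemplar_idx does not have length M")
    if any(not 1 <= k <= data["K"] for k in idx):
        problems.append("exemplar_idx has entries outside 1..K")
    return problems


example = {
    "mu_mean": 0, "mu_sd": 0.5, "sigma_alpha": 2, "sigma_beta": 2,
    "epsilon_alpha": 1, "epsilon_beta": 1, "noise": 0.4,
    "F": 1, "noise_alpha": 7.5, "noise_beta": 1, "M": 1, "K": 1,
    "z": [[1.04543573]], "exemplar_idx": [1],
}
for problem in check_data(example):
    print(problem)
```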
I’m using MCMC in CmdStanPy to draw samples that approximate the posteriors of mu and sigma. I’m getting many divergent transitions (dozens per chain), but I’m not sure what needs to be adjusted. Is there an issue with how the Stan model is defined or with the parameter settings, or do I need to try a different implementation of MCMC? Thank you!
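For reference, this is roughly how the per-chain divergence counts can be obtained. It is a sketch under assumptions: the helper names are hypothetical, the Stan file path is whatever the model above is saved as, and `adapt_delta=0.8` is just Stan's default, shown here only because it is the usual knob to raise when divergences appear. CmdStanPy is imported inside the function so the counting helper can be tried without it installed.

```python
def divergences_per_chain(divergent):
    """Count divergent transitions per chain.

    `divergent` is a draws x chains table of 0/1 flags
    (a nested list or a 2-D array).
    """
    # zip(*rows) regroups the flags chain by chain.
    return [int(sum(chain)) for chain in zip(*divergent)]


def sample_and_count(stan_file, data, adapt_delta=0.8):
    """Fit the model with CmdStanPy and return divergences per chain."""
    from cmdstanpy import CmdStanModel  # requires CmdStanPy and CmdStan

    model = CmdStanModel(stan_file=stan_file)
    fit = model.sample(data=data, adapt_delta=adapt_delta)
    # draws() is draws x chains x columns; pick the divergent__ column.
    col = fit.column_names.index("divergent__")
    return divergences_per_chain(fit.draws()[:, :, col])


# Toy flags for 3 draws and 2 chains, only to show the counting step.
toy_flags = [[0, 1], [1, 1], [0, 0]]
counts = divergences_per_chain(toy_flags)
```

With a saved model file, `sample_and_count("model.stan", data, adapt_delta=0.95)` would rerun the sampler with a higher target acceptance rate; the file name is a placeholder.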