Hi everybody!
I have been working for less than a month with Stan and Bayesian workflow in general, I apologize if I am asking something absurd or if my model is ill-posed. I am trying to build a model to infer the grades s that a teacher would give to S different students. Some of these grades s are known and some are not (a.k.a partially known parameters).
I attach a figure of the model as a graph together with the code I have so far.
The equations of my model are as follows:
s_{j} - z_{j}^{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i}) \forall j
b_{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i})
\beta_{k} variables are sampled from a mixture of gaussians, where the mixing probabilities are the inverse absolute value of b_{i} (after being normalized):
\beta_{k} \sim \sum_{i} \frac{1}{|\tilde{b}_{i}|} N(\mu_{ik} + \mu_{\varepsilon i}, \sigma_{ik})
Where \tilde{b}_{i} = \frac{b_{i}}{\sum_{k \in \{ik\}}b_{k}}, \{ik\} being all the pairs of subindices for data variables \mu_{ik} and \sigma_{ik}
The variables that I input as data are:
- z_{j}^{i}
- some s_{j}
- \mu_{i k}
- \sigma_{i k}
Finally, the correspondence of the names of the variables in my code and the variables above is:
-
z_{j}^{i} :
z[]
-
s_{j} :
s[]
-
\mu_{ik} :
mu_data[]
-
\sigma_{ik} :
sigma_data[]
-
\mu_{\varepsilon i} :
mu[]
-
\sigma_{\varepsilon i } :
sigma[]
-
b_{i} :
b[]
-
\beta_{k} :
beta[]
I am aware that my code is not clean, and I wonder if improving it would boost the sampler. Right now it shows the message ‘Gradient evaluation took 0.000277 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.77 seconds.
Adjust your expectations accordingly!’. As a side note, the only optimizer that won’t run into
a runtime error is Newton.
Could someone please share a hint on how to better code my model?
I really appreciate any help you can provide.
data{
int<lower=1> N; // number of z
int<lower=0> N_pairs; // number of observations of mu_data and sigma_data
int<lower=0> Ns_miss; // number of missing s
int<lower=0> Ns_obs; // number of observed s
int ii_obs_p[Ns_obs]; // location of observed s
int ii_miss_p[Ns_miss]; // location of non-observed s
int<lower=1> g1_mu[N_pairs]; // first subindex of mu_data and sigma_data
int<lower=1> g2_mu[N_pairs]; // second subindex of mu_data and sigma_data
int<lower=1> uu[N]; // index for gradee of each observed grade
int<lower=1> vv[N]; // index for grader of each observed grade
real<lower=0> z[N]; // z
real<lower=0> s_obs[Ns_obs]; // observed s
real<lower=-10,upper=10> mu_data[N_pairs];
real<lower=0> sigma_data[N_pairs];
}
transformed data{
int<lower=1> S = Ns_miss + Ns_obs;
}
parameters{
real s_miss_p[Ns_miss];
real b[S];
real beta[S];
real mu[S]; //
real<lower=0> sigma[S];
}
transformed parameters{
vector[S] norm_constant = rep_vector(0, S); // normalize b
real s[S];
if (Ns_obs > 0)
s[ii_obs_p] = s_obs;
s[ii_miss_p] = s_miss_p;
for (m in 1:N_pairs){
int i = g1_mu[m];
int k = g2_mu[m];
norm_constant[k] += fabs(b[i]);
}
}
model {
matrix[S,S] contributions;
real log_sum = 0.0;
mu ~ normal(0, 1.);
sigma ~ inv_gamma(.01, .01);
for (n in 1:N){
int i = vv[n];
int j = uu[n];
target += normal_lpdf(s[j] - z[n]| mu[i], sigma[i]);
}
b ~ normal(mu, sigma);
for (n in 1:N_pairs){
int k = g2_mu[n];
int i = g1_mu[n];
if (fabs(b[i]) != 0.0)
contributions[k] += log( norm_constant[k] / fabs(b[i])) + normal_lpdf(beta[k] | mu_data[n] + mu[i], sigma_data[n]);
}
target += log_sum_exp(contributions);
}
}