How could I improve this model?

Hi everybody!

I have been working for less than a month with Stan and Bayesian workflow in general, I apologize if I am asking something absurd or if my model is ill-posed. I am trying to build a model to infer the grades s that a teacher would give to S different students. Some of these grades s are known and some are not (a.k.a partially known parameters).
I attach a figure of the model as a graph together with the code I have so far.
The equations of my model are as follows:

s_{j} - z_{j}^{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i}) \forall j
b_{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i})

\beta_{k} variables are sampled from a mixture of gaussians, where the mixing probabilities are the inverse absolute value of b_{i} (after being normalized):

\beta_{k} \sim \sum_{i} \frac{1}{|\tilde{b}_{i}|} N(\mu_{ik} + \mu_{\varepsilon i}, \sigma_{ik})
Where \tilde{b}_{i} = \frac{b_{i}}{\sum_{k \in \{ik\}}b_{k}}, \{ik\} being all the pairs of subindices for data variables \mu_{ik} and \sigma_{ik}

The variables that I input as data are:

  • z_{j}^{i}
  • some s_{j}
  • \mu_{i k}
  • \sigma_{i k}

Finally, the correspondence of the names of the variables in my code and the variables above is:

  • z_{j}^{i} : z[]
  • s_{j} : s[]
  • \mu_{ik} : mu_data[]
  • \sigma_{ik} : sigma_data[]
  • \mu_{\varepsilon i} : mu[]
  • \sigma_{\varepsilon i } : sigma[]
  • b_{i} : b[]
  • \beta_{k} : beta[]

I am aware that my code is not clean, and I wonder if improving it would boost the sampler. Right now it shows the message ‘Gradient evaluation took 0.000277 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.77 seconds.
Adjust your expectations accordingly!
’. As a side note, the only optimizer that won’t run into
a runtime error is Newton.

Could someone please share a hint on how to better code my model?

I really appreciate any help you can provide.

        int<lower=1> N; // number of  z 
        int<lower=0> N_pairs; // number of observations of mu_data and sigma_data
        int<lower=0> Ns_miss; // number of missing s
        int<lower=0> Ns_obs; // number of observed s
        int ii_obs_p[Ns_obs]; // location of observed s
        int ii_miss_p[Ns_miss]; // location of non-observed s
        int<lower=1> g1_mu[N_pairs]; // first subindex of mu_data and sigma_data
        int<lower=1> g2_mu[N_pairs]; // second subindex of mu_data and sigma_data
        int<lower=1> uu[N]; // index for gradee of each observed grade
        int<lower=1> vv[N]; // index for grader of each observed grade
        real<lower=0> z[N];  // z
        real<lower=0> s_obs[Ns_obs]; // observed s
        real<lower=-10,upper=10> mu_data[N_pairs]; 
        real<lower=0> sigma_data[N_pairs]; 

    transformed data{
        int<lower=1> S = Ns_miss + Ns_obs;
        real s_miss_p[Ns_miss];
        real b[S]; 
        real beta[S];
        real mu[S]; // 
        real<lower=0> sigma[S]; 

    transformed parameters{
        vector[S] norm_constant = rep_vector(0, S); // normalize b
        real s[S];

        if (Ns_obs > 0)
          s[ii_obs_p] = s_obs;
        s[ii_miss_p] = s_miss_p;

        for (m in 1:N_pairs){
            int i = g1_mu[m];
            int k = g2_mu[m];
            norm_constant[k] += fabs(b[i]);

    model {
        matrix[S,S] contributions;
        real log_sum = 0.0;
        mu ~ normal(0, 1.);
        sigma ~ inv_gamma(.01, .01);    

        for (n in 1:N){
            int i = vv[n];
            int j = uu[n];
            target += normal_lpdf(s[j] - z[n]| mu[i], sigma[i]);

        b ~ normal(mu, sigma);

        for (n in 1:N_pairs){
            int k = g2_mu[n];
            int i = g1_mu[n];
            if (fabs(b[i]) != 0.0)
                contributions[k] += log( norm_constant[k] / fabs(b[i])) + normal_lpdf(beta[k] | mu_data[n] + mu[i], sigma_data[n]);
        target += log_sum_exp(contributions);        


I can’t help with this specific model but I can offer some approaches to figure out what is going on.

There is a lot going on in this model and it will be difficult to troubleshoot. I would recommend coding up the simplest partial known parameters model first. Then using simulated data with known parameters see if the model returns those parameters and generally runs ok. Once that is all sorted, I would start to complicate the model as needed but in stages, checking each time to make sure the parameters are recovered.

1 Like

Also may not be very specific to your model, if you started working with Stan recently some general pointers may be helpful.

Before trying to fix anything, do you think it actually needs improving? Are you getting sensible results, or able to reproduce estimates from other inference approaches you have used before (or as @Ara_Winter mentioned, can you recover parameter values from simulated data)? If it’s working in general improving may be a matter of efficiency.

About efficiency, often in HMC the most costly step is the gradient calculation (done via auto-differentiation), so small improvements to the Stan code tend to be negligible. One exception is when analytical gradients are available for certain models and parameterizations (e.g. using multi_normal_cholesky in some models may be better than computing the Cholesky decomposition to obtain the actual covariance matrix and compute the likelihood from that).
I don’t keep track of all models I use where analytical gradients or more efficient implementations are available, so my rule of thumb (again) is if it’s working and it’s not too slow, I don’t try to fix it.

Finally, is there a reason you are using optimization instead of MCMC sampling? Does the model into trouble that way as well?

1 Like