How could I improve this model?

aloaberasturi · March 20, 2021, 12:21am

Hi everybody!

I have been working for less than a month with Stan and Bayesian workflow in general, I apologize if I am asking something absurd or if my model is ill-posed. I am trying to build a model to infer the grades s that a teacher would give to S different students. Some of these grades s are known and some are not (a.k.a partially known parameters).
I attach a figure of the model as a graph together with the code I have so far.
The equations of my model are as follows:

s_{j} - z_{j}^{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i}) \forall j
b_{i} \sim N(\mu_{\varepsilon i}, \sigma_{\varepsilon i})

\beta_{k} variables are sampled from a mixture of gaussians, where the mixing probabilities are the inverse absolute value of b_{i} (after being normalized):

\beta_{k} \sim \sum_{i} \frac{1}{|\tilde{b}_{i}|} N(\mu_{ik} + \mu_{\varepsilon i}, \sigma_{ik})
Where \tilde{b}_{i} = \frac{b_{i}}{\sum_{k \in \{ik\}}b_{k}}, \{ik\} being all the pairs of subindices for data variables \mu_{ik} and \sigma_{ik}

The variables that I input as data are:

z_{j}^{i}
some s_{j}
\mu_{i k}
\sigma_{i k}

Finally, the correspondence of the names of the variables in my code and the variables above is:

z_{j}^{i} : z[]
s_{j} : s[]
\mu_{ik} : mu_data[]
\sigma_{ik} : sigma_data[]
\mu_{\varepsilon i} : mu[]
\sigma_{\varepsilon i } : sigma[]
b_{i} : b[]
\beta_{k} : beta[]

I am aware that my code is not clean, and I wonder if improving it would boost the sampler. Right now it shows the message ‘Gradient evaluation took 0.000277 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.77 seconds.
Adjust your expectations accordingly!’. As a side note, the only optimizer that won’t run into
a runtime error is Newton.

Could someone please share a hint on how to better code my model?

I really appreciate any help you can provide.

    data{
        int<lower=1> N; // number of  z 
        int<lower=0> N_pairs; // number of observations of mu_data and sigma_data
        int<lower=0> Ns_miss; // number of missing s
        int<lower=0> Ns_obs; // number of observed s
        int ii_obs_p[Ns_obs]; // location of observed s
        int ii_miss_p[Ns_miss]; // location of non-observed s
        int<lower=1> g1_mu[N_pairs]; // first subindex of mu_data and sigma_data
        int<lower=1> g2_mu[N_pairs]; // second subindex of mu_data and sigma_data
        int<lower=1> uu[N]; // index for gradee of each observed grade
        int<lower=1> vv[N]; // index for grader of each observed grade
        real<lower=0> z[N];  // z
        real<lower=0> s_obs[Ns_obs]; // observed s
        real<lower=-10,upper=10> mu_data[N_pairs]; 
        real<lower=0> sigma_data[N_pairs]; 
    }

    transformed data{
        int<lower=1> S = Ns_miss + Ns_obs;
    }
    parameters{
        real s_miss_p[Ns_miss];
        real b[S]; 
        real beta[S];
        real mu[S]; // 
        real<lower=0> sigma[S]; 
    }

    transformed parameters{
        vector[S] norm_constant = rep_vector(0, S); // normalize b
        real s[S];

        if (Ns_obs > 0)
          s[ii_obs_p] = s_obs;
        s[ii_miss_p] = s_miss_p;

        for (m in 1:N_pairs){
            int i = g1_mu[m];
            int k = g2_mu[m];
            norm_constant[k] += fabs(b[i]);
        }
    }

    model {
        matrix[S,S] contributions;
        real log_sum = 0.0;
        mu ~ normal(0, 1.);
        sigma ~ inv_gamma(.01, .01);    

        for (n in 1:N){
            int i = vv[n];
            int j = uu[n];
            target += normal_lpdf(s[j] - z[n]| mu[i], sigma[i]);
        }

        b ~ normal(mu, sigma);

        for (n in 1:N_pairs){
            int k = g2_mu[n];
            int i = g1_mu[n];
            if (fabs(b[i]) != 0.0)
                contributions[k] += log( norm_constant[k] / fabs(b[i])) + normal_lpdf(beta[k] | mu_data[n] + mu[i], sigma_data[n]);
        }
        target += log_sum_exp(contributions);        
        }
    }

Ara_Winter · March 22, 2021, 2:27pm

I can’t help with this specific model but I can offer some approaches to figure out what is going on.

There is a lot going on in this model and it will be difficult to troubleshoot. I would recommend coding up the simplest partial known parameters model first. Then using simulated data with known parameters see if the model returns those parameters and generally runs ok. Once that is all sorted, I would start to complicate the model as needed but in stages, checking each time to make sure the parameters are recovered.

caesoma · March 22, 2021, 6:32pm

Also may not be very specific to your model, if you started working with Stan recently some general pointers may be helpful.

Before trying to fix anything, do you think it actually needs improving? Are you getting sensible results, or able to reproduce estimates from other inference approaches you have used before (or as @Ara_Winter mentioned, can you recover parameter values from simulated data)? If it’s working in general improving may be a matter of efficiency.

About efficiency, often in HMC the most costly step is the gradient calculation (done via auto-differentiation), so small improvements to the Stan code tend to be negligible. One exception is when analytical gradients are available for certain models and parameterizations (e.g. using multi_normal_cholesky in some models may be better than computing the Cholesky decomposition to obtain the actual covariance matrix and compute the likelihood from that).
I don’t keep track of all models I use where analytical gradients or more efficient implementations are available, so my rule of thumb (again) is if it’s working and it’s not too slow, I don’t try to fix it.

Finally, is there a reason you are using optimization instead of MCMC sampling? Does the model into trouble that way as well?

Topic		Replies	Views
Deterministic and random variables in Stan Modeling mixed-model	3	884	March 19, 2021
Bayesian Mixture Model Modeling pystan , rstan , fitting-issues , stan , mixture	2	560	December 29, 2022
Updating Gaussian multiple regression code and building a robust model Modeling rstan	4	425	May 12, 2021
Help with Bayesian Modelling Modeling rstan , prior-choice , priors , initialization	6	262	July 4, 2024
Beta binomial mixture model Modeling mixture	5	1264	March 4, 2021

How could I improve this model?

Related topics