Improving Hierarchical Model Efficiency

Hi everyone!

I started to dig deeper into hierarchical models with STAN and got to a version of my model which successfully runs. However, on the full dataset, (200 binary decisions per subject; 300 subjects), the model takes around 15 hours to run.

My solution is a combination of searching similar problems and trial-and-error, so I’m sure there is something to optimize here. Im glad for any feedback! Thank you!

data {
  int<lower=0> N;//number of observations
  int<lower=1> n_subject;//number of subjects
  int<lower=1,upper=n_subject> subject_id[N];//vector of subject indeces

  int<lower=1,upper=2> n_conditions;//number of conditions
  int<lower=1,upper=n_conditions> condition_id[N];//vector of condition indeces

  array[N] int y;
  vector[N] x;
}
parameters {
  vector<lower=0, upper=5>[n_subject] nu;
  vector<lower=0, upper=5>[n_subject] sigma_e;
  vector<lower=0, upper=5>[n_subject] sigma_o;
  vector<lower=0, upper=0.5>[n_subject] soc_pref;

  vector<lower=0>[n_conditions] nu_mu;
  vector<lower=0>[n_conditions] nu_sigma;

  real<lower=0> sigma_e_mu;
  real<lower=0> sigma_e_sigma;

  real<lower=0> sigma_o_mu;
  real<lower=0> sigma_o_sigma;

  real<lower=0> soc_pref_mu;
  real<lower=0> soc_pref_sigma;
}

transformed parameters{

  vector[N] ratio = soc_pref[subject_id] ./ (1 - soc_pref[subject_id]);
  vector[N] gamma = (pow(sigma_e[subject_id],2)) ./(pow(sigma_e[subject_id],2) + pow(nu[subject_id],2));
  vector[N] alpha = (pow(sigma_o[subject_id],2)) ./(pow(sigma_o[subject_id],2) + pow(nu[subject_id],2));
}


model {
  //priors

  for (c in 1:n_conditions){
    nu_mu[c] ~ normal(0, 0.1);
    nu_sigma[c] ~ normal(0, 0.1);
  }

  sigma_e_mu ~ normal(0, 0.1);
  sigma_e_sigma ~ normal(0, 0.1);

  sigma_o_mu ~ normal(0, 0.1);
  sigma_o_sigma ~ normal(0, 0.1);

  soc_pref_mu ~ normal(0, 0.1);
  soc_pref_sigma ~ normal(0, 0.1);

  for (n in 1:N) {
    nu[subject_id[n]] ~ normal(nu_mu[condition_id[n]], nu_sigma[condition_id[n]]);
  }

  sigma_e ~ normal(sigma_e_mu, sigma_e_sigma);
  sigma_o ~ normal(sigma_o_mu, sigma_o_sigma);
  soc_pref ~ normal(soc_pref_mu, soc_pref_sigma);


  vector[N] projection = ((alpha[subject_id] .* log(x)) - (gamma[subject_id] .* log(ratio))) ./ (nu[subject_id] .* sqrt(pow(gamma[subject_id],2) + pow(alpha[subject_id],2)));
  y ~ bernoulli(Phi(projection));
}

Without going into the nitty gritty (or maybe going into the model at all), I’m assuming you are modeling each subject with their own parameter and (partially-)pooling them using a group-level distribution.

If you are getting started with that kind of model one didactic way (in my opinion) of building up to that is implementing an individual subject, then a hierarchical model with two subjects, and then scale up, it will probably give you good sense of model (in)efficiency and whether the individual subject performance is doing as expected in the hierarchical version.

I’d could also suggest that yo don’t constrain the upper bound of the parameters in the first runs so you get a better view of the posterior. Finally, there’s an faster approximation for the probit model that may also be helpful.

2 Likes

I think something is off with the indexing in the transformed parameters. The model has one ratio, gamma and alpha for each observation and it probably needs to be one per subject.

transformed parameters{

  vector[n_subject] ratio = soc_pref ./ (1 - soc_pref);
  vector[n_subject] gamma = (pow(sigma_e,2)) ./(pow(sigma_e,2) + pow(nu,2));
  vector[n_subject] alpha = (pow(sigma_o,2)) ./(pow(sigma_o,2) + pow(nu,2));
}

So that in this line

vector[N] projection = ((alpha[subject_id] .* log(x)) - (gamma[subject_id] .* log(ratio))) ./ (nu[subject_id] .* sqrt(pow(gamma[subject_id],2) + pow(alpha[subject_id],2)));

Stan picks out the alpha, gamma, and ratio for the subject i and not for observation i.

I am also not sure what the goal is of

  for (n in 1:N) {
    nu[subject_id[n]] ~ normal(nu_mu[condition_id[n]], nu_sigma[condition_id[n]]);
  }

It looks to me that you are putting 200 (#observations per subject) priors on nu[i] where i is a subject.

These might be two things that you actually want so my apologies if the comments are off the mark. I just mention them because I think they will not lead to errors or warnings but in the most common experimental setups and with a simple hierarchical structure they are not what I would expect. The issues also create a lot of extra parameters, 200*300 instead of 300, which might cause the slow runs.

Hi! Thank you for the feedback, I’ll implement the Phi_approx! Thanks!
N

Hei! .No need to apologize! I just started learning how to use STAN, so I’m grateful for any feedback. So thank you!

About your comments:

  1. You’re absolutely right; I want ratio, gamma and alpha to be one per subject, not one per observation. So thanks for spotting this mistake.

  2. The idea behind nu[subject_id[n]] is that I want one prior for each subject (so in total 300 priors), but -importantly- the means differ between the conditions. So ,the fix here would be to do

for j in 1:n_subject {
nu[j] ~ normal(nu_mu[condition_id[j]], nu_sigma[condition_id[j]]);
}

Right? Or do I miss something ?

Thanks again!
N

For 2. Is this a between subject design with each subject only in one condition?

Exactly

Then I think your approach works if you change the condition_id in the data section so that it is vector with one observation per subject.

int<lower=1,upper=n_conditions> condition_id[n_subject]

And if you do that than you can simplify your approach to

nu ~ normal(nu_mu[condition_id], nu_sigma[condition_id]);

There might be typos in my code but I hope I get the idea across