Hierarchical Model - behavioural data: should param standard deviation priors be group level or inside per subject loop?


I’m a physics postdoc, very grateful to be doing a side project on behavioural data modelling. I have a question about hierarchical Bayesian modelling in Stan. Here’s the code my PI has given me (a Rescorla–Wagner model):

```stan
// Defining the data that is input into the model
data {
  int<lower=1> nSubjects; // number of subjects
  int<lower=1> nTrials; // number of trials  
  int<lower=0,upper=4> choice[nSubjects, nTrials]; // choices made by subjects in trials    
  real<lower=0, upper=100> reward[nSubjects, nTrials]; // reward for each subject in trials 
}

// Defining transformed data
transformed data {
  real<lower=0, upper=100> v1; // prior belief mean reward value trial 1
  
  v1 = 50.0;
}

// Defining parameters of the model
parameters {
  real<lower=0,upper=1> alpha_mu; // mean of learning rate
  real<lower=0,upper=3> beta_mu; // mean of inverse temperature

  real<lower=0> alpha_sd; // standard deviation of learning rate
  real<lower=0> beta_sd; // standard deviation of inverse temperature
  
  real<lower=0,upper=1> alpha[nSubjects]; // learning rate for each subject
  real<lower=0,upper=3> beta[nSubjects]; // inverse temperature for each subject
}

// Model specifications
model {

  for (s in 1:nSubjects) {
    vector[4] v; // value (mu)
    real pe; // prediction error

    v = rep_vector(v1, 4); // initialize value vector with prior belief mean reward
  
    // Prior distributions for hyperparameters
    alpha_sd ~ cauchy(0,1);
    beta_sd ~ cauchy(0,1);
  
    // Prior distributions for parameters
    alpha[s] ~ normal(alpha_mu, alpha_sd);  
    beta[s] ~ normal(beta_mu, beta_sd);        

    for (t in 1:nTrials) {
      if (choice[s,t] != 0) {
        // Calculate action probabilities
        choice[s,t] ~ categorical_logit(beta[s] * (v));
  
        // Calculate prediction error 
        pe = reward[s,t] - v[choice[s,t]];
  
        // Value/mu updating (learning)
        v[choice[s,t]] = v[choice[s,t]] + alpha[s] * pe;
      }
    }
  }
}
```

I am convinced the priors for alpha_sd and beta_sd should be specified at the top, outside the per-subject loop, but I don’t have a deep enough understanding of why to be able to discuss it with the team.

I think the way it’s set up in the code above means each prior is applied nSubjects times, making it stronger: pulling the sd estimates towards zero and thereby making the prior on each subject’s parameters really narrow.

I also noticed we don’t have any priors for e.g. alpha_mu and beta_mu; is this ok? I understand behavioural data is very noisy. I’ve just found this and will give it a read: Prior Choice Recommendations · stan-dev/stan Wiki · GitHub

Thanks so much in advance, and if anyone has any comments or can let me know where I’m mistaken, much appreciated. Thanks again!

Your intuition is correct here. In Stan, the distribution statement `alpha_sd ~ cauchy(0, 1)` works by incrementing the model’s log probability by the log probability density of that distribution (dropping constant terms).

So if you have a simple model $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$ and you repeat the prior’s distribution statement in a loop $N$ times, as above, the model you are actually fitting is $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)^N$. This is pretty straightforward to test out for a simple model.
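Here is a minimal sketch of that test (a toy model, not the Rescorla–Wagner one): we estimate a normal scale $\sigma$ from a handful of observations on a grid, with a half-Cauchy(0, 1) prior stated once versus repeated $N$ times. Repeating the prior multiplies its log density by $N$, so the posterior mode gets pulled toward zero, exactly the effect described above. The data values and grid are arbitrary choices for illustration.

```python
import math

def log_half_cauchy(x, scale=1.0):
    # log density of a half-Cauchy(0, scale), up to an additive constant
    return -math.log(1.0 + (x / scale) ** 2)

def log_normal_lik(data, sigma):
    # log likelihood of zero-mean normal observations with sd sigma
    return sum(-math.log(sigma) - 0.5 * (y / sigma) ** 2 for y in data)

data = [1.2, -0.8, 2.1, 0.4, -1.5]
grid = [0.05 * i for i in range(1, 200)]  # candidate sigma values 0.05..9.95

def posterior_mode(n_prior_copies):
    # grid-search the mode of log p(y|sigma) + N * log p(sigma)
    return max(grid, key=lambda s: log_normal_lik(data, s)
               + n_prior_copies * log_half_cauchy(s))

mode_once = posterior_mode(1)       # prior stated once, as it should be
mode_repeated = posterior_mode(30)  # prior restated 30 times, like a 30-subject loop
print(mode_once, mode_repeated)     # the repeated prior pulls the mode toward zero
```

Running this shows the mode shrinking substantially once the prior is applied 30 times, which is the same distortion the in-loop `alpha_sd ~ cauchy(0,1)` causes.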

As a note on $\alpha_\mu$ and $\beta_\mu$: since they are declared with lower and upper bounds but given no explicit priors, the implied priors are $\alpha_\mu \sim \mathrm{Uniform}(0, 1)$ and $\beta_\mu \sim \mathrm{Uniform}(0, 3)$. Whether this makes sense for the problem, I can’t say, but it is generally not good practice to place hard constraints on parameters in the prior like this; it can cause various computational/inferential issues. This prior also communicates “we are 100% a priori certain that the parameter lies in this range”, which is rarely the case (excluding, obviously, parameters that are constrained by definition, e.g. probabilities between 0 and 1 or strictly positive scales).

What typically works better are soft constraints. If your prior belief is that a parameter lies in the range (0, 3), then you place a prior that has roughly 95% of its mass within that region. This communicates a similar prior belief but avoids many of the issues that come with hard constraints.
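As a quick sketch of picking such a soft constraint: a Normal(1.5, 0.75) prior (centre and scale chosen here purely for illustration) puts about 95% of its mass in (0, 3), which you can verify with the normal CDF:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # standard normal CDF evaluated at the standardized value of x
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma = 1.5, 0.75  # candidate soft prior for a parameter believed to lie in (0, 3)
mass = normal_cdf(3.0, mu, sigma) - normal_cdf(0.0, mu, sigma)
print(f"P(0 < beta_mu < 3) = {mass:.4f}")  # about 0.95
```

In Stan this would be `beta_mu ~ normal(1.5, 0.75);` with the hard `<lower=0,upper=3>` bound dropped.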

This is likely to cause issues in this model, because you have $\alpha_s \sim \mathcal{N}(\alpha_\mu, \sigma_\alpha)$, where $\sigma_\alpha$ is given that Cauchy prior, so it is very wide, yet both $\alpha_\mu$ and $\alpha_s$ have hard uniform bounds placed on them. This means your priors are probably going to be fighting each other quite a bit.

Another note: this Stan model uses the older array syntax, which is deprecated. For example:

```stan
int<lower=0,upper=4> choice[nSubjects, nTrials];
real<lower=0, upper=100> reward[nSubjects, nTrials];
```

can be rewritten in the more modern style:

```stan
array[nSubjects, nTrials] int<lower=0,upper=4> choice;
array[nSubjects, nTrials] real<lower=0, upper=100> reward;
```

Hi @amas0, thank you so much for your reply. I ran the model with the priors pulled out of the loop and got a slight change in the results (and, for good measure, repeated the prior lines nSubjects times outside the loop too, which indeed reproduced the original results), so thank you very much!

Much appreciated!
