Determining values of prior distribution parameters

Hello,

I would like to define the values of the prior distribution parameters in a hierarchical Bayesian model of reaction times. I am using the brms package.

I don’t understand how to set values for the parameters of the priors. The online guide Prior Choice Recommendations suggests using a normal(0, 10), but the range of reaction-time values in this study goes from 200 to 750 ms.
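
A quick check in R shows why this confuses me: a normal(0, 10) prior puts essentially no mass on that range.

# prior mass that normal(0, 10) assigns to RTs between 200 and 750 ms
pnorm(750, mean = 0, sd = 10) - pnorm(200, mean = 0, sd = 10)
#> [1] 0    # numerically zero: 200 is already 20 prior SDs above the prior mean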

  1. How do I formalise this information in the prior, given that I want a very weakly informative prior?
  2. If I understood correctly, I need to define priors on the parameters of the outcome variable distribution (here, an ex-Gaussian). How do I choose values for the different parameters of the outcome distribution?
  3. Another measure is accuracy, for which the model is the same, but I use a different family (family = bernoulli(link = "logit")). How does setting the prior parameters change in this case?

Thank you very much!

Here is the model I am using:

RTs1.bmodel = brm(RTs  ~  conditionStimuli * sequenceTrials + (conditionStimuli * sequenceTrials | Num_part)
                  , data = data_RTs_go
                  , family = exgaussian(link = "identity")
                  , warmup = 500
                  , iter = 2000                         
                  , chains = 3
                  , inits = "random"
                  , cores = 2
                  , seed = 123
                  , save_pars = save_pars(all = TRUE)
                  , prior = prior1_rt
                  , sample_prior = "only"  #for prior predictive checks
)

And here is the Stan code that brms generates for this model:
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  int<lower=1> NC_1;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector<lower=250,upper=750>[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
  real<lower=0> beta;  // scale parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_3;
  vector[N_1] r_1_4;
  vector[N_1] r_1_5;
  vector[N_1] r_1_6;
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_3 = r_1[, 3];
  r_1_4 = r_1[, 4];
  r_1_5 = r_1[, 5];
  r_1_6 = r_1[, 6];
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n] + r_1_3[J_1[n]] * Z_1_3[n] + r_1_4[J_1[n]] * Z_1_4[n] + r_1_5[J_1[n]] * Z_1_5[n] + r_1_6[J_1[n]] * Z_1_6[n];
    }
    target += exp_mod_normal_lpdf(Y | mu - beta, sigma, inv(beta));
  }
  // priors including constants
  target += normal_lpdf(b | 0, 3)
    - 5 * log_diff_exp(normal_lcdf(750 | 0, 3), normal_lcdf(250 | 0, 3));
  target += normal_lpdf(Intercept | 0, 3);
  target += student_t_lpdf(sigma | 3, 0, 118.6)
    - 1 * student_t_lccdf(0 | 3, 0, 118.6);
  target += gamma_lpdf(beta | 1, 0.1);
  target += normal_lpdf(sd_1 | 0, 3)
    - 6 * normal_lccdf(0 | 0, 3);
  target += std_normal_lpdf(to_vector(z_1));
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}

Here is an example of the data (the first, unlabelled column is the R row number):

Num_part trial_type  Go_type conditionStimuli ITI_ms response RTs correctResponse order_pres sequenceTrials sdt
2        1         Go     Bent           leaves    819        1 301               1          1            NGG   1
3        1         Go     Bent           leaves    771        1 237               1          1             GG   1
4        1         Go     Bent           leaves   1086        1 393               1          1             GG   1
5        1         Go Straight           leaves    652        1 331               1          1             GG   1
7        1         Go     Bent           leaves    919        1 372               1          1            NGG   1
9        1         Go Straight           leaves    802        1 359               1          1            NGG   1

The best thing to do here is to generate prior predictive data and iterate on the priors until the prior predictive data starts to match up with what your domain expertise suggests. This procedure is generally called a prior-predictive check (PPC).
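
In brms, that loop might look something like the sketch below. The specific prior values are placeholders to iterate on, not recommendations:

# Sketch of a prior predictive check; all prior values are placeholders
priors_try <- c(
  set_prior("normal(450, 100)",  class = "Intercept"),  # centre of plausible RTs (ms)
  set_prior("normal(0, 50)",     class = "b"),          # condition effects of tens of ms
  set_prior("normal(0, 50)",     class = "sd"),         # between-participant variation
  set_prior("exponential(0.02)", class = "sigma"),      # residual SD, prior mean 50 ms
  set_prior("exponential(0.02)", class = "beta")        # ex-Gaussian tail, prior mean 50 ms
)

ppc_fit <- brm(
  RTs ~ conditionStimuli * sequenceTrials + (conditionStimuli * sequenceTrials | Num_part),
  data = data_RTs_go,
  family = exgaussian(),
  prior = priors_try,
  sample_prior = "only"   # ignore the likelihood; draws come from the priors alone
)

yrep <- posterior_predict(ppc_fit)  # draws x observations matrix of simulated RTs
hist(yrep, breaks = 100)            # does the bulk of simulated RTs fall in a plausible range?
pp_check(ppc_fit, ndraws = 50)      # overlay simulated and observed RT densities

If the simulated RTs include lots of negative or multi-second values, tighten the priors (or change their families) and repeat.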

I’m pretty familiar with RT data, but even the single added complexity of your exgaussian family would leave me feeling that I might make mistakes if I tried to set the priors without doing PPCs.

The one caveat on the above is that you’ll want to take care to modify the init values so that they scale with your prior parameter values. For example, if you had a simple normal likelihood and were trying to do a PPC with a prior on the intercept of normal(400, 100), you’d want the inits to be something like init = 800, so the randomly drawn starting values can land near the bulk of the prior. If you stuck with the default, init = 2 (starting values drawn uniformly from (-2, 2)), the sampler would start in a range that is very far from the typical set implied by your prior and could have trouble getting to the typical set during adaptation.
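
Besides passing a single number, you can pass an init function that targets specific parameters. A rough sketch: the name Intercept matches the generated Stan code above, the 300-600 range is just a placeholder, older brms versions spell the argument inits rather than init, and I'm assuming the backend fills in random inits for any parameters the list omits.

# Start each chain's intercept near the bulk of the prior instead of the
# default uniform(-2, 2) range; unlisted parameters should still get random inits.
init_fun <- function() {
  list(Intercept = runif(1, 300, 600))   # placeholder range of plausible RT intercepts (ms)
}

fit <- brm(
  RTs ~ conditionStimuli * sequenceTrials + (conditionStimuli * sequenceTrials | Num_part),
  data = data_RTs_go,
  family = exgaussian(),
  prior = prior1_rt,
  init = init_fun,    # older brms versions: inits = init_fun
  chains = 3
)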


Thank you mike-lawrence!

I planned to run a PPC, but I am not sure how to run it. What I understood so far is that the model has to include sample_prior = "only"; after running the model, I check the simulated data (visually, with conditional_effects() or other plots). Is there a more appropriate procedure for running a PPC?

Also, how do I modify the parameter values to reflect my prior expectations? For example, I know that the range of RT values is between 250 and 750 ms: as the parameters of a possible uniform distribution, do I simply use these values, or do they need to be transformed/adapted? I don’t understand the meaning of the parameter values used in many publications…
EDIT: if I set a prior with set_prior("uniform(200, 750)", class = "b", lb = 200, ub = 750), I still get impossible values (negative RTs). To which class should I assign a prior to avoid these impossible values?
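
For reference, I can list the classes that brms defines for this model with get_prior(), but I am not sure which of them need to be constrained to keep the simulated RTs positive:

get_prior(
  RTs ~ conditionStimuli * sequenceTrials + (conditionStimuli * sequenceTrials | Num_part),
  data = data_RTs_go,
  family = exgaussian()
)
# the classes listed include: b, Intercept, sd, cor, sigma, beta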

Regarding the caveat, I see only now that it is an important point (I found this link too: Don't forget your inits | A. Solomon Kurz).
Would you suggest a general rule for setting the inits?

Are there other aspects I need to improve regarding the model?

Thank you!
