Hierarchical model with probability parameter


#1

Hi,

I have a hierarchical model with a beta prior for a vector of subject-level parameters, with hyperparameters w1 and w2.
I am trying with 4 chains and 15000 iterations (14000 for warmup) and still Rhat ~ 1.3 for w2.
Which is the recommended prior for the probability parameter xi, bounded between 0 and 1, to avoid sampling problems in a hierarchical model?

Best regards, Benjamin

data {
  int<lower=0> N_SUBJECTS;    
  int<lower=0> N_TRIALS;      
  int<lower=0> choice[N_TRIALS, N_SUBJECTS]; 
  real k[N_TRIALS, N_SUBJECTS];
}


parameters {
  
  vector<lower=0, upper=1>[N_SUBJECTS] xi; 
  real<lower=0> w1;
  real<lower=0> w2;
  
  ...  
  
}

model {
  
  w1 ~ cauchy(0, 10);
  w2 ~ cauchy(0, 10);
  
  xi ~ beta(1/w1  + 1, 1/w2 + 1);
  
  for (t in 1:N_TRIALS) 
    for (s in 1:N_SUBJECTS)
      choice[t, s] ~ bernoulli(Phi_approx(k[t, s]-m[s])*
                                 (1-xi[s]) + xi[s]/2);
  
  ...
  
    
}

#2

Hm I just took a quick look. So you’re modeling the choice probability as being a function of the k. What is m[s] and do you want a scalar parameter in front of the k[t,s]? Can you explain your choice on the probability parameter that goes in to the Bernoulli Phi_approx(k[t, s]-m[s])*(1-xi[s]) + xi[s]/2. Looking at the manual the way the Phi_appox is used is y[n] ~ bernoulli(Phi_approx(alpha + beta * x[n])); so I’m just wondering what the xi multiplication and addition term is doing.

Also just wondering why are you parameterizing the beta distribution in that way instead of the usual alpha and beta? That could affect the posterior geometry and make it harder to sample.


#3

Hi Arya,
I did not write the full model to keep things clear, but the problem seems to be the xi parameter. The chains mix well when I remove it.
xi is a ‘lapse’ parameter, quite standard in models of choice, accounting for noise in the choice probability - the part not explained by the predictors.
I don’t think the manual implies that Phi_approx can be used only in that way, it is just an example. I use it to squash the bernoulli parameter between 0 and 1 (could use inv_logit as well). The multiplication with (1 -xi) and addition of xi/2 keeps the parameter within [0, 1] as long as xi is also in that range.
I initially tried beta(w1 + 1, w2 + 1) but it did not work either. Any other thoughts?