Latent discrete parameter: can it be hierarchical?

I have been working on a model with two success probabilities and a change point. The model recovers well when I marginalize out the change point as a latent discrete parameter. However, in the final model I would like it if I don’t have to assume one population change point, but rather change points that can vary between subjects. I have read some examples of random change point models, but the one implementation I found in Stan does not model the change point as a discrete parameter. This brings me to my question: is it possible to have a hierarchical discrete latent parameter or can it only be hierarchical if the change point is modeled as a continuous parameter?

// data:
data{

  int<lower=1> L; // number of participants
  int<lower=1> N; // number of datapoints
  int<lower=0,upper=1> Y[N]; // outcome (cancelled or not)
  int<lower=1,upper=L> ll[N]; // cases going from int 1 to L
  vector[N] X; // X-coordinates

}

// transformed data: (necessary for latent discrete parameter)
transformed data{
  real boundary[7];
  // possible boundary values are -0.75, -0.50, -0.25, 0, 0.25, 0.50, 0.75
  for(i in 1:7){
    boundary[i] = -0.75 + (0.25 * (i-1));
  }
}

// parameters:
parameters{
  
  // population-level theta's in log odds:
  real theta_left_logit;  
  real theta_right_logit;
  
  // participant-level theta's in log odds:
  vector[L] theta_left_l;
  vector[L] theta_right_l;
  
  // population sd of theta by participant:
  real<lower=0> sigma_l;
  real<lower=0> sigma_r;
}

// transformed parameter:
// RECORD Log probability FOR EACH BOUNDARY LEVEL:
transformed parameters{
  
  vector[7] lp; // lp = log probability
  lp = rep_vector(-log(7), 7); // uniform prior on all boundary values

  // for each possible boundary:  
  for(b in 1:7){
    for(n in 1:N){
      lp[b] = lp[b] + bernoulli_logit_lpmf(Y[n]|X[n]<boundary[b] ? theta_left_l[ll[n]] : theta_right_l[ll[n]]);
    }
  }
}

// model:
model{
  
  // priors for population level: (hyperpriors)
  theta_left_logit ~ normal(0, 2);
  theta_right_logit ~ normal(0, 2);
  
  // priors for population-level sigma
  // (by lb they exclude values lower than zero):
  sigma_l ~ normal(0.05, 0.025);
  sigma_r ~ normal(0.05, 0.025);
  
  // priors for participant level parameters:
  // should be drawn with population parameter as the mean:
  // and population sigma as sd:
  theta_left_l ~ normal(theta_left_logit, sigma_l);
  theta_right_l ~ normal(theta_right_logit, sigma_r);

  target += log_sum_exp(lp);
}

// extra code that allows to infer the probability of the discrete parameter:
generated quantities{
  
  vector[7] pState;
  pState = exp(lp - log_sum_exp(lp));

}
1 Like

Hi, sorry for not getting to you earlier.

I think defining some sort of hierarchy for a marginalized-out discrete parameter should be possible (although not sure it will work well in practice). A simple example would be to take a binomial or beta-binomial with an unknown success probability as the prior for the discrete parameter.

So in your code instaed of lp = rep_vector(-log(7), 7); you would have something like:

for(b in 1:7) {
  lp[b] = binomial_lpmf(b | theta);
}

where theta is a new parameter representing the location (scaled to [0,1]) where you roughly expect the “average” changepoint.

To what extent is the binomial hyperprior sensible is not clear - you could add more flexibility by using a beta-binomial or maybe use a different distribution completely. Maybe running a prior predictive check (simulating from the distributions to see how big of a correlation in the change point locations you can get) could clarify stuff.

Hope that helps at least a bit!