Prior choice for ordered inverse transformed parameters

Hello!

I am fitting a model that contains ordered parameters using an ordered inverse transformation. There is a principled assumption for ordering the parameters and the model fits give me more reasonable parameters when I order them.

But the efficiency and the quality (in terms of parameter recovery using simulated data or actual data) of the fits depends significantly on the priors that I use for the transformed parameter.

I couldn’t find anything about prior distributions on transformed parameters of this kind, so are there typical recommendations? Apologies if this has been addressed and I just failed to find it.

I am fitting the parameters hierarchically, so that session level parameters are constrained by individual level hyperparameters. Here is the relevant part of the code:

data{
  // number of sessions
  int<lower=1> N;
}
parameters {
  // hyperparameters (individual level)
  vector[2] mu_p;
  vector<lower=0>[2] sigma;
 
  // raw session level parameters
  vector[N] theta1_raw;
  vector[N] theta2_raw;
}
transformed parameters {
  // session level parameters
  vector<lower=0, upper=1>[N] theta1;
  vector<lower=0, upper=1>[N] theta2;

  for (n in 1:N) {
   // constrain the parameters to (0,1)
    theta1[n] = Phi_approx(mu_p[1] + sigma[1] * theta1_raw[n]);
    theta2[n] = Phi_approx(mu_p[2] + sigma[2] * (exp(theta2_raw[n]) + theta1_raw[n]));
  }
}
model {
  // hyperparameters
  mu_p  ~ normal(0, 1);
  sigma ~ cauchy(0, 5);

  // individual parameters
  theta1_raw  ~ normal(0, 1);
  theta2_raw ~ normal(0, 3);  //this is the prior in question

  ...

} 

Thanks in advance for any advice!

Hi there! Welcome to the Stan forum! Sorry it took us some time to reply.

Usually those _raw parameters have a standard normal distribution (you can use theta1_raw ~ std_normal()). The scaling should only be governed by the sigmas.

I’m afraid I can’t really help you any more with this one. Did you try using Phi or maybe an inv_logit transformation? They might be more numerically stable (especially inv_logit). I guess you could run into identifiability problems with a setup like this.

Let me try to summon the master of principled assumptions, priors and a whole lot more, @betanalpha. :)

Cheers,
Max

2 Likes

In general priors for ordered parameters are tricky and have not really been researched enough even in the unconstrained case.

The implicit prior for a model like

parameters {
  ordered[10] x;
}

model {
}

is uniform on the smallest value and then uniform on the differences between the remaining values. Placing a prior on the ordered parameters, and not their differences, is non-generative and typically introduces some weird behavior. For example, while one might expect

parameters {
  ordered[10] x;
}

model {
  x ~ normal(0, 1);
}

to simply contain the parameters in the interval [-3, 3] or thereabouts, it actually interacts with the constraint to enforce a sort of uniform repulsion between the interior points, resulting in very rigid differences.
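
One quick way to see this behavior is to sample that prior on its own and look at the pushforward of the gaps between neighbouring components; a minimal sketch (no data needed, just fit the model and inspect the diffs draws):

parameters {
  ordered[10] x;
}
model {
  x ~ normal(0, 1);
}
generated quantities {
  // pushforward of the gaps between neighbouring ordered values;
  // the interaction with the constraint makes these much more rigid
  // than one might naively expect from independent normal(0, 1) draws
  vector[9] diffs;
  for (n in 1:9)
    diffs[n] = x[n + 1] - x[n];
}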

Really the challenge here is figuring out the context where one’s domain expertise manifests, and this is usually a generative context. For example let’s consider ordered parameters that are additionally constrained to the unit interval, as in the first post.

Now we need to maintain ordering and the interval constraint in addition to incorporating principled domain expertise, which is awkward unless we can find some generative model that naturally incorporates these constraints. Fortunately we have one for this case – the stick-breaking process.

data {
  int<lower=2> N; // number of ordered probabilities
}
parameters {
  real<lower=0, upper=1> cond_probs[N];
}
transformed parameters {
  real rev_ordered_probs[N];
  rev_ordered_probs[1] = cond_probs[1];
  for (n in 2:N)
    rev_ordered_probs[n] = cond_probs[n] * rev_ordered_probs[n - 1];
}
model {
  cond_probs ~ beta(1, 1);
}

Here we allocate some of the unit interval to the first ordered probability, then we allocate some of the remaining unit interval to the second ordered probability, and then recurse. We can also think of this as

\mathbb{P}[n] = \mathbb{P}[n | n - 1] \cdot \mathbb{P}[n - 1],

hence the variable names. Because the conditional probabilities are less than one, the ordered probabilities will monotonically decrease, creating a reverse ordering which can readily be reversed to give the monotonically increasing ordering. What really makes this approach so useful, however, is that we can often reason about conditional probabilities much more easily, and hence build principled priors for them without too much trouble.
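
If the increasing ordering is what the rest of the model needs, the flip is just an index reversal; a minimal sketch extending the transformed parameters block above:

transformed parameters {
  real rev_ordered_probs[N];
  real ordered_probs[N];

  rev_ordered_probs[1] = cond_probs[1];
  for (n in 2:N)
    rev_ordered_probs[n] = cond_probs[n] * rev_ordered_probs[n - 1];

  // flip the indices to turn the decreasing sequence into an increasing one
  for (n in 1:N)
    ordered_probs[n] = rev_ordered_probs[N - n + 1];
}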

3 Likes

Thanks a ton for the thorough and very clear reply. I think this is what I am looking for. I just had a couple of questions:

Should that for loop run through 2:N instead of 1:N? And does constraining these ordered parameters hierarchically violate the process? I’m not quite sure of the best way to implement the hierarchical constraints here while maintaining the (0,1) bounds on the parameters (rev_ordered_probs).

Thanks! I’ve been lurking for a while so it’s about time I made an account. Don’t worry at all about the delay and thank you for your suggestions!

1 Like

I just realized that I didn’t reply directly to your comment:

Yes.

Not sure what you mean by “hierarchical constraints”.

1 Like

Thanks again for the reply!

In the model I posted, there are ordered parameters for each individual session that are governed by individual-level hyperparameters (each individual has multiple sessions, so this structure allows for some pooling across days under the assumption that individuals are relatively consistent). Sorry, my use of “constrain” was confusing/wrong, and the session level parameters are mislabeled in the model block.

In your example there is only one layer of parameters and it’s not clear to me how to govern them with hyperparameters while maintaining the [0,1] value constraints. Apologies if there is an obvious solution!

Sorry, for some reason when I add a quote from you to a reply, it doesn’t count the post as a reply.

One of the benefits of generative modeling is that it makes it clear what models are well-defined. In this case it will be awkward to try to model heterogeneity in the ordered parameters directly in a mathematically (not to mention statistically) self-consistent way. Modeling heterogeneity in the conditional probabilities, however, is no problem. Just replace each cond_prob parameter with a generative function, for example

cond_prob[n] = inv_logit(alpha0 + alpha[n]).
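
Concretely, a minimal sketch of how this might look with N sessions and K ordered probabilities per session (the names alpha0 and alpha are just illustrative, and the likelihood is omitted):

data {
  int<lower=1> N; // number of sessions
  int<lower=2> K; // number of ordered probabilities per session
}
parameters {
  vector[K] alpha0;   // individual-level locations
  matrix[N, K] alpha; // session-level deviations
}
transformed parameters {
  matrix<lower=0, upper=1>[N, K] rev_ordered_probs;
  for (n in 1:N) {
    rev_ordered_probs[n, 1] = inv_logit(alpha0[1] + alpha[n, 1]);
    for (k in 2:K)
      rev_ordered_probs[n, k] = inv_logit(alpha0[k] + alpha[n, k])
                                * rev_ordered_probs[n, k - 1];
  }
}
model {
  alpha0 ~ normal(0, 1);
  to_vector(alpha) ~ normal(0, 1);
  // likelihood omitted
}

Session-level scale parameters (like the sigmas in the original model) could be added on the alpha deviations to control how much pooling there is across sessions.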
2 Likes

Thanks so much! I really appreciate it.

Here is my implementation of your suggestions. Given the logistic transform, I’m assuming the priors for the hyperparameters and raw session level parameters can be normal distributions. I ditched the for loop and separated the raw parameters since I only have 2 ordered parameters.

data{
  // number of sessions
  int<lower=1> N;
}
parameters {
  // hyperparameters (individual level)
  vector[2] theta0;
 
  // raw session level parameters
  vector[N] theta1_raw;
  vector[N] theta2_raw;
}
transformed parameters {
  // session level parameters
  vector<lower=0, upper=1>[N] theta1;
  vector<lower=0, upper=1>[N] theta2;

  for (n in 1:N) {
   // constrain the parameters to (0,1) and order them, so that theta2 < theta1
    theta1[n] = inv_logit(theta0[1] + theta1_raw[n]);
    theta2[n] = theta1[n] * inv_logit(theta0[2] + theta2_raw[n]);
  }
}
model {
  // hyperparameters
  theta0  ~ normal(0, 1);

  // individual parameters
  theta1_raw ~ normal(0, 1);
  theta2_raw ~ normal(0, 1);

  ...

}

Looks good. The normal prior densities are fine, but the scale of 1 is relatively extreme once the link function is applied. At the very least you should do some careful prior pushforward checks on theta1 and theta2 to ensure that the induced priors are reasonable.
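
For reference, one way to do such a pushforward check is to simulate from the priors and apply the same transform, for example with a small standalone Stan program run with algorithm=fixed_param (the _sim names are just illustrative):

generated quantities {
  // draw the hyperparameters and one session's raw parameters from their priors
  real theta0_1 = normal_rng(0, 1);
  real theta0_2 = normal_rng(0, 1);
  real theta1_raw_sim = normal_rng(0, 1);
  real theta2_raw_sim = normal_rng(0, 1);

  // push forward through the same transform as the model
  real theta1_sim = inv_logit(theta0_1 + theta1_raw_sim);
  real theta2_sim = theta1_sim * inv_logit(theta0_2 + theta2_raw_sim);
}

Histograms of theta1_sim and theta2_sim then show the induced priors directly.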

1 Like