Ordering constraint for hierarchical generalised ordered probit model


For an ordered probit model (with just an intercept \beta), likelihood contributions have the form

\Phi(c_{k+1} - \beta) - \Phi(c_{k} - \beta)

For the model to be well defined, we set ordering constraints on the cutpoints so that

c_{k} \le c_{k+1}

Which we can do by defining an ordered vector and setting priors e.g. @betanalpha 's induced dirichlet

For my hierarchical model, I have a hierarchical structure on the intercepts, with cutpoints fixed between studies, so that

\Phi(c_{k+1} - \beta_{s}) - \Phi(c_{k} - \beta_{s})

With between-study model

\beta_{s} \sim N(\mu_{\beta}, \sigma_{\beta} )

This works fine in Stan. Note that the model’s I am using are multivariate mixtures ( for which I used and extended bgoori’s parameterisation for ). The model I have specified here is a little simpler than the one I am programming to simplify notation for this post. The extra detail is not necessary for the specific issue I am having at the moment (i.e. ordering constraints).

Now I am trying to extend to a generalised ordered probit model to relax the proportional odds / parallel lines assumumption - so that the \beta's themselves vary between each threshold k , so the model now looks like this

\Phi( c_{k+1} - \beta_{s, k+1} ) - \Phi(c_{k} - \beta_{s, k})

With between-study model

\beta_{s,k} \sim N(\mu_{\beta} + \nu_{k} , \sigma_{\beta} )

So we have shared s.d. (\sigma_{\beta}) between thresholds k, but different means, with \mu_{\beta} a mean parameter across the thresholds

Now, we need a more complicated ordering constraint:

c_{k} - \beta_{s, k} \le c_{k+1} - \beta_{s, k+1}

Would anybody have any suggestions as to how to enforce such a constraint in Stan (whilst still being able to interpret the \beta_{s,k}'s and the means \mu_{\beta} + \nu_{k} ), given that the \beta_{s,k}'s are hierarchical?

Something I was thinking was keeping the same constraint as before for the cutpoint parameters ( c_{k} \le c_{k+1} ), and then setting \beta_{s, k} \ge \beta_{s, k+1} for the , however I am not sure how to do the latter for a non-centred parameterisation (each \beta_{s, k} is hierarchical )

If you ask me, I use a parameter transformation as follows.

dc_k:=c_{k+1} -c_k,
d \beta_{s,k}:=\beta_{s,k+1}-\beta_{s,k},
d \gamma_{s,k}:=dc_{s,k+1}-d\beta_{s,k},

and define priors for new transformed parameters as follows
dc_k \sim \text{Uniform}(0,10),
d\gamma_{s,k}\sim \text{Uniform}(-10,0),

where I use uniform priors for simplicity.

If we define the priors as above, then d\gamma_{s,k} is negative and this implies the constraints c_{k} - \beta_{s, k} \le c_{k+1} - \beta_{s, k+1}.

Similarly, the prior defined above guarantees the positivity of dc_{k} and it gives the desired monotonicity of c_k.

I guess your priors on \beta_{s,k} contradicts with the constraints \beta_{s,k} \geq \beta_{s,k+1} .
Does the quantity \Phi(c_{k+1} - \beta_{s}) - \Phi(c_{k} - \beta_{s}) mean hit rate in the signal detection theory using some latent Gaussian random variable.
I also implemented such a model for signal detection. I am also interested how to accomplish these monotonicity constraints.

1 Like


Just to update @Jean_Billie , I have managed to get a *centred * version working, since

\beta_{s, k} \ge \beta_{s, k+1} + c_{k} - c_{k+1}

We can specify (example for k = 2 so 2 thresholds) :

parameters {
              vector<lower=0>[2] sd; 
              vector[2]  mu;
              vector[2]  nu;
              ordered[2] C; // cutpoints
              vector[NS] beta_2; 
              vector<lower= beta_2 + C[1] - C[2]>[NS]  beta_1; // sets beta_{s, k}  >=    beta_{s, k+1} +   c_{k} - c_{k+1} constant 
model { 
    beta_1  ~ normal(nu[1] + mu[1] , sd) ; 
    beta_2  ~ normal(nu[2] + mu[2] , sd) ; 

Here is a screenshot using a model with the implementation above, good R-hat and improved fit compared to one assuming proportional odds (i.e. standard ordered regression with fixed \beta's across thresholds k - despiteit being a centered param. it seems to work fine for this dataset


Now time to get it working with non-centred parameterisation

Also, another issue is that the code above is fine for a few thresholds, but for models I have with more I’ll need a way to make the constraints a loop, otherwise the code will have a lot of copy-pasted code and will need editing every time we have a different dataset with different k which is quite annoying

I gotcha!

Using the code <lower= beta_2 + C[1] - C[2]>, also the desired constraints are obtained, and your code may be essentially as same as my above suggestion.

I also want to know what is the traditional methods for the inequality constraints on parameters.

To introduce many thresholds, I used vectors. in my model.
Because a few thresholds leads good sampling, that’s sounds nice!!

1 Like


I’m not sure how to implement your method. Defining priors on transformed parameters does not restrict the range of the parameters it is essentially just a check. If you define the differences (gammas) in the parameters block itself and construct the beta’s out of those and the cutpoints, it would work, but im not sure it would with my model where the beta’s are hierarchical and require a pooled mean at each cut point (as well as an overall mean parameter which pools across the cutpoints) to be estimated. Also, for the cutpoints themselves better to use induced Dirichlet priors than setting Unif(0,10) on the pairwise differences.

Consider the following model:

X_i \sim \text{Normal}(t_i,1)

where i=1,...,111.

Furthermore, we assume t_1 \leq t_2\leq \cdots \leq t_{111} which is accomplished by the following priors.

dt_i := t_{i+1} - t_{i} \sim \text{Exponential(1)}

The reason why I use the above prior is that its support is contained in the set of all positive real numbers. and this will lead us to the constraints dt_i>0 or equivalently t_{i+1}>t_i. Note that, fortunately, the determinant of the Jacobian of the parameter transformation from (t_1,t_2,...,t_{111}) into (t_1,dt_1,dt_2...,dt_{110}) is 1 and hence the Jacobian adjustment is redundant for the calculation of precise log likelihoods.

NNN <- 111
   # NNN <- 3

m <- rstan::stan_model(model_code = '
                                      int NNN;
                                      real x[NNN];
                                parameters {
                                            real t1;
                                            real<lower=0>  dt[NNN-1];
                              transformed parameters {
                                            real   t[NNN];
                                            for(i in 2:NNN)  t[i]=t[i-1]+dt[i-1]; 


                                model {
                                for(i in 1:NNN) x[i] ~ normal(t[i],1);

                                for(i in 1:(NNN-1)) dt[i] ~ exponential(1);

f <- rstan::sampling(m, data=list(NNN=NNN,x=1:NNN), iter = 333,chains=1)

# Check the monotonicity, i.e., t_1  <  t_2 <  t_3 < ......
extract(f)$t[,2] > extract(f)$t[,1]
extract(f)$t[,3] > extract(f)$t[,2]
1 Like