Bivariate ordinal (ordered) probit

biorprob.stan (2.6 KB)
biorprob.r (1008 Bytes)

Hello everyone,

I’ve long wanted to estimate a bivariate, ordinal probit with Stan (or any other sensible program, really…), but haven’t been able to due to lack of the bivariate normal CDF. But @Bob_Carpenter pointed to Owens_T function being implemented, and therefore it is possible to calculate the biv. CDF. Nifty! :-)

I have used Bob’s code, and written a Stan-program for a bivariate probit. The likelihood is defined in section 10.2 in this book by Greene and Henscher.

The model recovers simulated parameters and correlation coefficients nicely - and runs surprisingly fast! I was honestly expecting more numerical bugs.

Appreciate any comments you might have for improving the simple model. Lots of If’s in defining the likelihood.

But otherwise a great thanks to the Stan-devs for making adding new and exciting stuff!

Ole-Petter

functions{
  real binormal_cdf(real z1, real z2, real rho) {
  if (z1 != 0 || z2 != 0) {
    real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
    real a1 = (z2 / z1 - rho) / denom;
    real a2 = (z1 / z2 - rho) / denom;
    real product = z1 * z2;
    real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
    return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
}
return 0.25 + asin(rho) / (2 * pi());
}


real bi_ord_ll(real mu1, real mu2, vector c1, vector c2, 
  int y1, int y2, int d1, int d2,real rho) {
real tmp;
  if(y1>1 && y2>1 && y1<d1 && y2<d2){
    tmp = binormal_cdf(c1[y1  ]-mu1,c2[y2  ]-mu2,rho)-
	      binormal_cdf(c1[y1  ]-mu1,c2[y2-1]-mu2,rho)-
          binormal_cdf(c1[y1-1]-mu1,c2[y2  ]-mu2,rho)+
		  binormal_cdf(c1[y1-1]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y1==1 && y2>1 && y2<d2){
    tmp = binormal_cdf(c1[y1  ]-mu1,c2[y2  ]-mu2,rho)-
	      binormal_cdf(c1[y1  ]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y2==1 && y1>1 && y1<d1){
    tmp = binormal_cdf(c1[y1  ]-mu1,c2[y2  ]-mu2,rho)-
	      binormal_cdf(c1[y1-1]-mu1,c2[y2  ]-mu2,rho);
  }
  if(y1==1 && y2==1){
    tmp = binormal_cdf(c1[y1  ]-mu1,c2[y2  ]-mu2,rho);
  }
  if(y2>1 && y2<d2 && y1==d1 ){
    tmp = Phi(c2[y2]-mu2)-
	      Phi(c2[y2-1]-mu2)-
          binormal_cdf(c1[y1-1]-mu1,c2[y2  ]-mu2,rho)+
		  binormal_cdf(c1[y1-1]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y1>1 && y1<d1 && y2==d2 ){
    tmp = Phi(c1[y1]-mu1)-
	      Phi(c1[y1-1]-mu1)-
          binormal_cdf(c2[y2-1]-mu2,c1[y1  ]-mu1,rho)+
		  binormal_cdf(c1[y1-1]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y1==d1 && y2==d2){
    tmp = 1-
	      Phi(c2[y2-1]-mu2)-
		  Phi(c1[y1-1]-mu1)+
		  binormal_cdf(c1[y1-1]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y1==1 && y2==d2){
    tmp = Phi(c1[y1  ]-mu1)-
	      binormal_cdf(c1[y1  ]-mu1,c2[y2-1]-mu2,rho);
  }
  if(y2==1 && y1==d1){
    tmp = Phi(c2[y2  ]-mu2)-
	      binormal_cdf(c2[y2  ]-mu2,c1[y1-1]-mu1,rho);
  }
return log(tmp);
}
}

data{
int N;               // Number of obs .
vector[N] x1;        // Cov. outcome. 1
vector[N] x2;        // Cov. outcome. 2
int<lower=1> Y1[N];  // Observed response 1
int<lower=1> Y2[N];  // Observed response 1
}

transformed data{
int d1;    // Number of poss. responses outcome. 1
int d2;    // Number of poss. responses outcome. 1
d1 = max(Y1);
d2 = max(Y2);
}

parameters{
real<lower=-1,upper=1> rho;
real beta1;
real beta2;
ordered[d1-1] c1;
ordered[d2-1] c2;
}

model{
for(i in 1:N){
target += bi_ord_ll(beta1*x1[i],beta2*x2[i],c1,c2,Y1[i],Y2[i],d1,d2,rho);
}
}

I see from the book that is a bivariate modeling approach that only works for probit. It would be nice to use a more general approach that would also work for the proportional odds ordinal logistic model, using copulas.

The R brms package makes fitting univariate ordinal models so easy that I’m hoping the package will add multivariate copula capabilities for such problems.

Yes, need the bi/multivariate CDF to estimate this, and think the multivariate normal CDF is hard enough. Don’t know much about copulas in Stan.

Why would you want an logistic model instead, though?

The effects in the logit link (proportional odds model) have ready interpretations as odds ratios, and the proportional odds model is a special case of the much-liked Wilcoxon test. If not using latent variables you can evaluate the copula directly, without using any complex CDFs.