Bivariate probit in Stan

James_Savage · September 27, 2017, 1:17am

Hi all –

I hadn’t seen any implementation of bivariate probit/IV (a la the Stata command biprobit) in Stan. So made one. Just thought I’d plonk it here as it’ll be more searchable when someone comes looking.

Jim

Stan code:

R code/test:

bbbales2 · October 1, 2017, 5:57pm

Just thought I’d plonk it here as it’ll be more searchable when someone comes looking.

Good stuff! Thanks for sharing

Bob_Carpenter · October 2, 2017, 10:39pm

Thanks!

I alwasy get confused about how these things relate. There’s a chapter on multivariate probit regression in the manual, but it doesn’t need binomal CDFs. Ben posted a binormal CDF implementation that I mean to put into the manual.

real q1;
real q2;
real w1;
real w2;
real rho1;
// from Greene's econometrics book
    q1 = 2*Y[1] - 1.0;
    q2 = 2*Y[2] - 1.0;
    w1 = q1*mu1;
    w2 = q2*mu2;
    rho1 = q1*q2*rho;

You can use compound declare-define and vectorize rather than defining explicitly enumerated variables, as in:

vector[2] q = 2 * Y - 1;
vector[2] w = q .* mu;
real rho1 = q[1] * q[2] * rho;

Also, you don’t need the .0 on reals (the only exception is for division).

It’d also all be a bit more efficient if you vectorized the lpdf—youl could save some redundant calculations involving rho1 and also vectorize most of the rest, I think.

James_Savage · October 3, 2017, 7:08pm

Thanks Bob – I’ll update this later in the week .

Binormal probit is a special case, as you don’t need to have a parameter for each observation. So this should give you the same estimates far more quickly. There aren’t many options once you move to higher dimensions, sadly.

Bob_Carpenter · October 3, 2017, 8:49pm

Cool—makes sense given that the cdf can do the marginalization integral.

jeremy.koster · October 4, 2017, 1:42am

Thanks. I’ll be watching for updates.

If you are feeling ambitious, James, it would be beneficial to see a version with random effects as well.

For instance, imagine that your outcomes, Y1 and Y2, represent whether students passed their reading and math tests, respectively. And imagine that the students are clustered in schools. It would good to see a covariance matrix indicating the school-level variances of passing reading and math and the school-level covariance.

James_Savage · November 27, 2017, 7:36pm

Hi Jeremy,

I did this (individual random effects) a while back but forgot to post it. Tested it against a standard individual fixed effects linear probability model and it gave qualitatively similar results.

functions {
    real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
  }
  real biprobit_lpdf(row_vector Y, real mu1, real mu2, real rho) {
    real q1;
    real q2;
    real w1;
    real w2;
    real rho1;
    
    // from Greene's econometrics book
    q1 = 2*Y[1] - 1.0;
    q2 = 2*Y[2] - 1.0;
    
    w1 = q1*mu1;
    w2 = q2*mu2;
    
    rho1 = q1*q2*rho;
    return log(binormal_cdf(w1, w2, rho1));
  }
}
data {
  int N;
  int P;
  int P2; // the number of instruments
  int I; // number of individuals
  int individual[N]; // index of individuals
  matrix[N,2] Y; // outcomes, binary
  matrix[N, P] X; // first column is a column of ones
  matrix[N, P2] Z; // matrix of instruments
}
transformed data {
  matrix[N, P+P2] X1;
  matrix[N, P + 1] X2;
  X1 = append_col(X, Z);
  X2 = append_col(Y[:,1], X);
}
parameters {
  vector[P + P2] beta;
  vector[P + 1] gamma;
  real<lower = -1, upper = 1> rho;
  real<lower = 0> sigma_1;
  real<lower = 0> sigma_2;
  matrix[I, 2] mu;
}
model {
  // priors
  beta ~ normal(0, 1);
  gamma ~ normal(0, 1);
  
  mu[:,1] ~ normal(0, sigma_1);
  mu[:,2] ~ normal(0, sigma_2);
  sigma_1 ~ normal(0, 1);
  sigma_2 ~ normal(0, 1);
  // likelihood
  for(i in 1:N) {
    Y[i] ~ biprobit(mu[individual[i], 1] + X1[i] * beta, mu[individual[i], 2] + X2[i] * gamma, rho);
  }
}

jeremy.koster · November 27, 2017, 7:43pm

Very cool. By chance, did you use simulated data when testing the model? Although I think I can infer the data structure from the data block, it would be ideal to walk through your modeling step by step. Thanks.

James_Savage · November 27, 2017, 7:44pm

Will try to write a post up on it before long and put a link here.

jeremy.koster · November 27, 2017, 7:48pm

There is a lot of potential use among ecologists for bivariate probits. (And multivariate probit responses, too, for that matter.) Particularly with correlated random effects across the outcome equations. The Stan forum has relatively few examples that go into greater depth than the Stan manual. Your post, James, is probably the most useful so far.

James_Savage · November 27, 2017, 7:48pm

Thanks, that’s very encouraging :-)

Bob_Carpenter · November 27, 2017, 8:24pm

The lack of depth in the manual is intentional. I decided to stick to the simplest possible examples of things for the manual.

The case studies are where you want to go for more complex models, but we don’t have anything on multivariate probits (I actually wrote the initial multivariate probit in the manual at the urging of Michael McCarthy, a stat ecologists in Melbourne—but it’s hard to fit).

We do accept user submissions for case studies if you (or Jim) want to put something together.

jeremy.koster · October 8, 2023, 3:52pm

@James_Savage to what extent can you envision a version in which the correlation varies as a function of covariates in the model?

P.S. We’re implementing your function in an upcoming analysis. Thanks again.

Topic		Replies	Views
Bivariate ordinal (ordered) probit Modeling	3	1086	May 21, 2018
Simulating from Multivariate Probit Modeling	9	824	March 16, 2021
Mixed Process Model: Probit and Linear Regression Modeling	1	602	June 8, 2018
Multivariate normal CDF General multivariate-normal	4	1880	April 7, 2021
Feedback request: Multivariate Probit Regression with GP Modeling rstan , fitting-issues , specification , loo	14	1156	April 9, 2021

Bivariate probit in Stan

Related topics