Feature request: copulas for multivariate responses of mixed types

Ben’s code for the multivariate probit can be found here.

I also provide an example of how Ben’s code may be used in a mixed discrete-continuous Gaussian copula. Note that I’ve hacked the indices slightly because I have both binary and polychotomous categories.

This can be used much like the marginals highlighted here.

  array[] matrix ordinal_marginal(array[,] int y, matrix mu_glm, array[,] real u, array[] int n_thr, array[] vector tau, matrix L){
    int N = rows(mu_glm);
    int K = cols(mu_glm);
    array[2] matrix[N, K] rtn;
    
    for(n in 1:N){
      vector[K] z;
      real prev = 0;
      for(k in 1:K){
        if(n_thr[k] == 1){ // binary
          real bound = Phi(-(mu_glm[n,k] + prev) / L[k,k]);
          if(y[n,k] == 1){
            real t = bound + (1 - bound)*u[n,k];
            z[k] = inv_Phi(t);
            rtn[1][n,k] = z[k];
            rtn[2][n,k] = log1m(bound);
          } else {
            real t = bound*u[n,k];
            z[k] = inv_Phi(t);
            rtn[1][n,k] = z[k];
            rtn[2][n,k] = log(bound);
          }
        } else { // polychotomous
          real t;
          if(y[n,k] == 1){
            real ub = Phi((tau[1+k-2,1] - (mu_glm[n,k] + prev)) / L[k,k]);
            t = ub*u[n,k];
            z[k] = inv_Phi(t);
            rtn[1][n,k] = z[k];
            rtn[2][n,k] = log(ub);
          } else if(y[n,k] == n_thr[k] + 1){
            real lb = Phi((tau[1+k-2,n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            t = lb + (1 - lb)*u[n,k];
            z[k] = inv_Phi(t);
            rtn[1][n,k] = z[k];
            rtn[2][n,k] = log1m(lb);
          } else {
            real lb = Phi((tau[1+k-2,y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau[1+k-2,y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            t = lb + (ub-lb)*u[n,k];
            z[k] = inv_Phi(t);
            rtn[1][n,k] = z[k];
            rtn[2][n,k] = log(ub-lb);
          }
        }
        if(k < K){
          prev = L[k+1, 1:k]*head(z,k);
        }
      }
    }
    
    return rtn;
  }
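
For context, a hypothetical usage sketch, mirroring the copula wrapper that appears in the full model later in this thread (names as there):

  // assuming centered_gaussian_copula_cholesky_ and continuous_marginal are defined as below
  target += centered_gaussian_copula_cholesky_(
    {ordinal_marginal(y_ord, mu_K, u, n_thr, tau, L), continuous_marginal(y_cont_full, mu_J, sd_J)},
    L, order);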

Thanks for posting this!

I am a bit confused about the tau variable. I guess this variable holds the latent thresholds, but I am not sure how the indices work - the second index represents the threshold, but what does the first index represent?

Also, for k = 1, since 1+k-2 = 0, won’t it index at 0 (i.e., tau[0, ...])?

Hello - thank you!

The indices relate to my comment above about “hacking” to deal with the different variable types I have. It’s quite likely there is a more efficient way to code it up.

In the above code, K = 3, where k = 1 is binary and k = 2-3 are polytomous. tau is a 2D array of vectors for the two poly variables. I’ve ordered the input of my response variables so the binary one comes first. So in the above code, k = 1 never reaches that step due to the if…else statement. For k = 2 it would be 1+2-2, so the index would be tau[1,1]; for k = 3 it would be tau[2,1], i.e., the second vector of the 2D array and the first threshold. Does this make sense?

In your gold-standard model, you use s to index which tau vector to use. When I tried that approach just using k, I’d get errors related to a mismatch in size. So I’ve hacked it so that the index always starts at 1 (the first threshold vector).

Broadly, I have done this because I have a high-dimensional data vector with different numbers of thresholds across it: binary variables, 5 levels, 7 levels, 12 levels, etc. I include the code of the full model (continuous and ordinal) below. Maybe it’s a totally inefficient way to do it? Essentially I am just trying to capture which tau vector to use - see the sketch of one cleaner alternative just before the full model.
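
One way to avoid the 1+k-2-style arithmetic would be an explicit map from each response to its tau row; a minimal sketch, where the function name upper_cutpoint and the data variable tau_idx are hypothetical:

  // Hypothetical sketch: tau_idx[k] names the row of tau used by response k
  // (0 for binary responses), so the k -> tau translation is data, not arithmetic.
  real upper_cutpoint(int k, int cat, array[] int tau_idx, array[] vector tau) {
    return tau[tau_idx[k], cat]; // cutpoint above category cat for response k
  }
  // e.g. real ub = Phi((upper_cutpoint(k, y[n,k], tau_idx, tau) - (mu_glm[n,k] + prev)) / L[k,k]);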

functions{
  
  // Gaussian Copula Functions coded by Sean Pinkney 
  // https://spinkney.github.io/helpful_stan_functions/group__gaussian__copula.html
  // Method derived from Smith and Khaled (2011) DOI: 10.2139/ssrn.1937983.
  
  // U is a matrix of random variates for each marginal returned from the function 
  // centered_gaussian_copula_cholesky and L is the Cholesky factor of the corr mat
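  // Derivation note: with G = inverse(L) * U', the Gaussian copula log-density is
  //   log c(U) = -N * sum(log(diagonal(L))) - 0.5 * (sum(square(G)) - sum(square(U))),
  // i.e. the joint MVN log-density minus the sum of standard normal marginal
  // log-densities evaluated at the entries of U (the log(2*pi) terms cancel).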
  real multi_normal_cholesky_copula_lpdf(matrix U, matrix L) {
    int N = rows(U); // total number of observations
    int J = cols(U); // total number of responses (J+K)
    matrix[J, N] G = mdivide_left_tri_low(L, U'); // G = inverse(L) * U'
    return -N * sum(log(diagonal(L))) - 0.5 * (sum(square(G)) - sum(square(U))); // log probability
  }
  
  // marginals is a nested array of marginal calculations from each marginal type
  // L is the cholesky factor of the correlation matrix and order is the variable order
  // In combination with the above function, returns the log probability
  real centered_gaussian_copula_cholesky_(array[,] matrix marginals, matrix L, array[] int order) {
    // Extract dimensions of final outcome matrix
    int N = rows(marginals[1][1]); // total number of observations
    int J = rows(L); // number of responses (J+K)
    matrix[N, J] U; // empty matrix for marginal calculations

    // Iterate through marginal arrays, concatenating the outcome matrices by column
    // and aggregating the log-likelihoods (from continuous marginals) and jacobian
    // adjustments (from discrete marginals)
    real adj = 0;
    int pos = 1;
    for (m in 1 : size(marginals)) {
      int curr_cols = cols(marginals[m][1]);

      U[ : , pos : (pos + curr_cols - 1)] = marginals[m][1];

      adj += sum(marginals[m][2]); //jacobian adjustments
      pos += curr_cols;
    }

    // Return the sum of the log-probability for copula outcomes and jacobian adjustments
    return multi_normal_cholesky_copula_lpdf(U[ : , order] | L) + adj;
  }
  
  // Returns 2D array of matrices containing the centered random variables rtn[1] and
  // jacobian adjustments rtn[2]. Assumes the marginal is distributed std_normal() 
  array[] matrix continuous_marginal(matrix y, matrix mu_glm, matrix sigma) {
    int N = rows(mu_glm);
    int J = cols(mu_glm);
    array[2] matrix[N, J] rtn;
    // Initialise the jacobian adjustments to zero, as vectorised lpdf will be used
    rtn[2] = rep_matrix(0, N, J);

    for (j in 1 : J) {
      rtn[1][ : , j] = (y[ : , j] - mu_glm[ : , j]) ./ sigma[ : , j]; // centered random variate
      rtn[2][1, j] = normal_lpdf(y[ : , j] | mu_glm[ : , j], sigma[ : , j]); // marginal log-likelihood
    }

    return rtn;
  }
  
  // Adapted from Ben Goodrich's Stan Implementation of the GHK Algorithm
  // https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi-good.stan#L11
  // We know the latent data model (Albert and Chib) can be rewritten using the Cholesky
  // factor: y = mu + L*z, where L is the Cholesky factor and z ~ N(0,1). One can simulate
  // draws from a truncated MVN using draws from a univariate normal. Given the constraints
  // (upper and lower bounds), we introduce a utility u that is standard uniform (0,1).
  // In instances where data are missing, there is no constraint on y and u is modelled as
  // any other unknown parameter. The approach of treating missing data as additional
  // parameters in the model can be thought of as predictions of the marginal model and is
  // the recommendation put forth in BDA3, as well as by several Stan devs, given the
  // analytical issues of integrating out the missing data.
  
  // Returns 2D array of matrices of random latent variates (z~N(0,1)) and jacobian adjustments
  // u is the utility parameter assumed standard uniform, n_thr is an array giving the
  // number of cutpoints for each response, the tau_* arguments are the cutpoint parameters, and L is the Cholesky factor
  array[] matrix probit_marginal(array[,] int y, matrix mu_glm, array[,] real u, array[] int n_thr, array[] vector tau_teeth, array[] vector tau_seven_stage, array[] vector tau_pelvis, vector tau_carpals, vector tau_tarsals, matrix L){
    
    int N = rows(mu_glm);
    int K = cols(mu_glm);
    array[2] matrix[N,K] rtn;
    
    for(n in 1:N){
      array[N] vector[K] z;
      real prev = 0;
      for(k in 1:K){
        if(n_thr[k] == 1){ //binary
          if(y[n,k] == 99){ // 99 is missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0; // no adjustment because no transformation
          } else if(y[n,k] == 1){
            real bound = Phi(-(mu_glm[n,k] + prev) / L[k,k]);
            real t = bound + (1 - bound)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(bound); // adjustment
          } else{
            real bound = Phi(-(mu_glm[n,k] + prev) / L[k,k]);
            real t = bound*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(bound); // adjustment
          }
        } else if(n_thr[k] == 12){ // dentition
          if(y[n,k] == 99){ // missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0;
          } else if(y[n,k] == 1){ // bound
            real ub = Phi((tau_teeth[1+k-8,1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = ub*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub); // adjustment
          } else if (y[n,k] == n_thr[k] + 1){ // bound
            real lb = Phi((tau_teeth[1+k-8,n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (1 - lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(lb); // adjustment
          } else{ // between bounds
            real lb = Phi((tau_teeth[1+k-8,y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau_teeth[1+k-8,y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (ub-lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub-lb); // adjustment
          }
        } else if(n_thr[k] == 6){ // 7-stage variables
          if(y[n,k] == 99){ // missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0;
          } else if(y[n,k] == 1){ // bound
            real ub = Phi((tau_seven_stage[1+k-24,1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = ub*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub); // adjustment
          } else if (y[n,k] == n_thr[k] + 1){ // bound
            real lb = Phi((tau_seven_stage[1+k-24,n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (1 - lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(lb); // adjustment
          } else{ // between bounds
            real lb = Phi((tau_seven_stage[1+k-24,y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau_seven_stage[1+k-24,y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (ub-lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub-lb); // adjustment
          }
        } else if(n_thr[k] == 8){ // carpals
          if(y[n,k] == 99){ // missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0;
          } else if(y[n,k] == 1){ // bound
            real ub = Phi((tau_carpals[1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = ub*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub); // adjustment
          } else if (y[n,k] == n_thr[k] + 1){ // bound
            real lb = Phi((tau_carpals[n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (1 - lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(lb); // adjustment
          } else{ // between bounds
            real lb = Phi((tau_carpals[y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau_carpals[y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (ub-lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub-lb); // adjustment
          }
        } else if(n_thr[k] == 7){ // tarsals
          if(y[n,k] == 99){ // missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0;
          } else if(y[n,k] == 1){ // bound
            real ub = Phi((tau_tarsals[1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = ub*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub); // adjustment
          } else if (y[n,k] == n_thr[k] + 1){ // bound
            real lb = Phi((tau_tarsals[n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (1 - lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(lb); // adjustment
          } else{ // between bounds
            real lb = Phi((tau_tarsals[y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau_tarsals[y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (ub-lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub-lb); // adjustment
          }
        } else{ // pelvis
          if(y[n,k] == 99){ // missing
            z[n,k] = L[k,k]*inv_Phi(u[n,k]);
            rtn[1][n,k] = z[n,k]; // here z is unconstrained
            rtn[2][n,k] = 0;
          } else if(y[n,k] == 1){ // bound
            real ub = Phi((tau_pelvis[1+k-42,1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = ub*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub); // adjustment
          } else if (y[n,k] == n_thr[k] + 1){ // bound
            real lb = Phi((tau_pelvis[1+k-42,n_thr[k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (1 - lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log1m(lb); // adjustment
          } else{ // between bounds
            real lb = Phi((tau_pelvis[1+k-42,y[n,k] - 1] - (mu_glm[n,k] + prev)) / L[k,k]);
            real ub = Phi((tau_pelvis[1+k-42,y[n,k]] - (mu_glm[n,k] + prev)) / L[k,k]);
            real t = lb + (ub-lb)*u[n,k];
            z[n,k] = inv_Phi(t);
            rtn[1][n,k] =  z[n,k]; // latent variable
            rtn[2][n,k] = log(ub-lb); // adjustment
          }
        }
        if(k < K){
          prev = L[k+1, 1:k]*head(z[n],k);
        }
      // Jacobian adjustments imply z is truncated standard normal
      // thus utility --- mu + L_Omega * z --- is truncated multivariate normal
      }
    }
    
    return rtn;
  }
  
}
data{
  
// The ordinal and continuous data are included separately. This facilitates
// the individual marginal functions and handles missing data in both
// cases
  
  int<lower=1> N; // total sample size
  int<lower=1> J; // total number of continuous responses
  int<lower=1> K; // total number of ordinal responses
  array[N] real<lower=0> X; // chronological age (independent variable)
  
  //ordinal
  
  array[N, K] int y_ord; // 2D array of integers: N obs x K ordinal responses
  array[K] int thresholds; // number of cutpoints per ordinal response
  
  // continuous
  
  int<lower=0> complete_J; // number of non-missing values in continuous data
  int<lower=0> miss_J; // number of missing values in continuous data
  array[complete_J] real y_cont;   // non-missing continuous values
  array[complete_J, 2] int ind_pres;     // (row, col) indices of non-missing values
  array[miss_J, 2] int ind_miss; // (row, col) indices of missing values
}
transformed data{
  // The purpose of this variable is simply to keep the responses organized for the
  // model. It is not necessary for model construction, but keeps things in order
  
  array[K+J] int order; // order of responses with ordinal first (binary, poly, continuous)
  for(i in 1:K+J){
    order[i] = i;
  }
}
parameters{
  // Like the data block, I separate the parameters here by data type for ease of
  // interpretation
  
  cholesky_factor_corr[K+J] L; // cholesky factor of the correlation matrix
  
  // ordinal
  
  vector<lower=0>[K] o1; // parameter of ordinal mean function per response
  array[16] ordered[12] tau_teeth; // cutpoints for dentition
  array[16] ordered[6] tau_seven_stage; // cutpoints for 7-stage responses (includes tarsals)
  ordered[8] tau_carpals; // cutpoints for carpals
  ordered[7] tau_tarsals; // cutpoints for tarsals
  array[2] ordered[2] tau_pelvis; // cutpoints for pelvic fusion (excluding IC_EF)
  array[N, K] real<lower=0, upper=1> u; // nuisance that absorbs inequality constraints in GHK and tMVN
  
  // continuous
  array[miss_J] real ymiss_J; // missing-data parameters for the continuous responses
  vector<lower=0>[J] c1; // first parameter of continuous mean function
  vector<lower=0>[J] c2; // second parameter of continuous mean function
  vector[J] c3; // third parameter of continuous mean function
  vector<lower=0>[J] k1; // first parameter of continuous noise function
  vector<lower=0>[J] k2; // second parameter of continuous noise function
}
model{
  // This is a rather involved model block where I construct the mean and SD of
  // both data types and put together the matrix for continuous responses using the
  // known data y_cont and marginally predicted parameter ymiss_J. This is what goes
  // into the marginal function. Note, technically 'u' is the parameter also used
  // for missing data in the ordinal model, but is left unconstrained.
  // I make the calculations here instead of the transformed parameters block to save memory
  
  matrix[N,J] mu_J; // continuous mean
  matrix[N,J] sd_J; // continuous sd
  matrix[N,J] y_cont_full;   // The "data" with predicted missing values
  matrix[N,K] mu_K; // ordinal mean
  
  for(i in 1:N){
    
    for(j in 1:J){
      mu_J[i,j] = c2[j]*X[i]^c1[j] + c3[j];
      sd_J[i,j] = k1[j]*(1+k2[j]*X[i]);
    }
    for(k in 1:K){
      mu_K[i,k] = X[i]^o1[k];
    }
    
  }
  
  for(n in 1:complete_J) {
    y_cont_full[ind_pres[n,1]][ind_pres[n,2]] = y_cont[n];
  }
  
  // Fill the rest of y_cont_full with missing value "parameters"
  for(i in 1:miss_J){
    y_cont_full[ind_miss[i,1]][ind_miss[i,2]] = ymiss_J[i];
  }
  
  // PRIORS //
  
  L ~ lkj_corr_cholesky(4); // prior on the Cholesky factor of the correlation matrix
  
  // Continuous parameters
  c1 ~ normal(0,1);
  c2 ~ normal(0,1);
  c3 ~ normal(0,1);
  k1 ~ cauchy(0,1);
  k2 ~ normal(0,1);
  
  // Ordinal parameters
  o1 ~ normal(0,5);
  for(i in 1:16){
    for(t in 1:12){
      tau_teeth[i,t] ~ normal(t+0.5,1); // dentition cutpoint prior
    }
  }
  for(i in 1:16){
    for(t in 1:6){
      tau_seven_stage[i,t] ~ normal(t+0.5,1); // 7-stage cutpoint prior
    }
  }
  for(i in 1:2){
    for(t in 1:2){
      tau_pelvis[i,t] ~ normal(t+0.5,1); // pelvis cutpoint prior
    }
  }
  for(t in 1:8){
    tau_carpals[t] ~ normal(t+0.5,1); // carpal cutpoint prior
  }
  for(t in 1:7){
    tau_tarsals[t] ~ normal(t+0.5,1); // tarsal cutpoint prior
  }
  
  // LIKELIHOOD //
  
  target += centered_gaussian_copula_cholesky_({probit_marginal(y_ord, mu_K, u, thresholds, tau_teeth, tau_seven_stage, tau_pelvis, tau_carpals, tau_tarsals,L), continuous_marginal(y_cont_full, mu_J, sd_J)}, L, order);
}
generated quantities{
  // Generally, the parameter of interest from a Gaussian copula is the correlation structure Rho.
  // I reconstruct it here to be used downstream
  
  corr_matrix[K+J] cor_mat = multiply_lower_tri_self_transpose(L);

}

Do you think there should be separate nuisance parameters (i.e., indexed like u[n,k,threshold] rather than u[n,k])? In my gold-standard model I didn’t, but maybe there should be…

Hi cwolfe,

for missing data, is z equal to inv_Phi(u) or is it L*inv_Phi(u)? I don’t want to *impute* missing data, just account for it.

Hello @CerulloE,

Fun fact: the initial construction of my model had z = inv_Phi(u). I have since modified my code slightly for efficiency. What we really are estimating is z, which is from a standard normal distribution, and then the latent variable y star, which is the latent (truncated) multivariate normal variable. I have collapsed those steps. Let me share a few lines from the email correspondence I had with Ben Goodrich:

  1. In relation to your comment:
    “I am not seeing why the missing responses would be equal to inv_Phi(u). It seems like they should be elements of the vector formed by post-multiplying the Cholesky factor by a vector of possibly truncated standard normals obtained by pushing uniforms through the quantile function of the standard normal. Or if the missing responses are discrete, discretizing the result with the cutpoints.”

  2. On how to set it up:
    If you have in Stan
      vector[K] u; // fill this in with GHK logic
    then
      vector[K] z = inv_Phi(u); // standard normal but possibly truncated
      vector[K] y_star = L * z; // multivariate normal but possibly truncated
    Suppose the first element of y was missing; then it would be
      y_star[1] if it is continuous,
      y_star[1] > 0 if it is binary,
      k if it is ordinal and y_star[1] is greater than the k-th cutpoint but not greater than the (k+1)-th cutpoint.

  3. Because the above assumes you want to discretize the latent missing variable (i.e., impute), here is the justification that the bottom step is unnecessary:

  • If the u values that correspond to missing y values are included in the parameters block, then they just have a standard uniform prior. There is no truncation and no Jacobian for them. I don’t think you can integrate them out analytically but you don’t have to do much of anything with them if you don’t want to. They just affect the joint posterior distribution of all the unknown parameters.

  • It’s z = L * inv_Phi(u), but the idea of missing values being parameters or functions of parameters is sound.

I hope all of this makes sense. Essentially, Ben is suggesting that even in the missing-data case we account for the multivariate correlation structure by multiplying by the Cholesky factor. That is the latent missing variable. Unfortunately, I have yet to find a tractable way to integrate out this variable, but as Ben says, this type of approach is really no different from Stan’s basic idea of treating missing variables as other RVs.

As an aside, while I have got this code to work in a “traditional” multivariate probit case, I have adjusted the “ordinal marginal” function from above. I will share that once I run through some more tests, but essentially I have taken the code underlying the univariate ordered_probit lpmf in Stan math and adapted it to the function - which in my case makes better sense because in a copula I am modelling each univariate marginal, so the GHK construction is unnecessary. A rough sketch of that logic follows.
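
A minimal sketch of that univariate ordered-probit logic (the function name ordered_probit_interval is hypothetical, and this omits the copula variate bookkeeping): the marginal mass of category y is the difference of the CDFs at the two adjacent cutpoints.

  // Hypothetical sketch: univariate ordered-probit interval, no GHK recursion.
  // tau holds n_thr ordered cutpoints; categories run from 1 to n_thr + 1.
  real ordered_probit_interval(real mu, vector tau, int y, int n_thr) {
    real lb = (y == 1) ? 0 : Phi(tau[y - 1] - mu); // lower CDF bound
    real ub = (y == n_thr + 1) ? 1 : Phi(tau[y] - mu); // upper CDF bound
    return ub - lb; // probability mass of category y
  }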

I’m confused… you say z = inv_Phi(u) in point (2), but then you say z = L * inv_Phi(u) in your last bullet point… What is y_star? The Jacobian adjustment? Shouldn’t this be 0?

In my posted code, I have combined the two expressions for efficiency - hence the L * inv_Phi(u).

In the messages between Ben Goodrich and me that I shared above, he completes this in two separate steps.

The GHK algorithm simulates draws from a truncated MVN using draws from a univariate normal. If we go back to his unpublished discussion of the tMVN, it may make better sense: tMVN.pdf (2.4 MB). First, z = inv_Phi(u). This takes the inverse CDF of a uniform variate u, yielding a standard normal variate.

Relatedly, we can write the MVN representation as y_star = mu + L*z(u). So the standard normal variate z is used with the Cholesky factor L to get a truncated MVN value. Now, because the data are missing in your case, we just need to account for the fact that the variable is (a) possibly truncated, and (b) still arising from a tMVN. Hence the L*z. In my code, technically z == y_star because I’ve combined the steps; I did not feel the need to write out z and y_star on separate lines of code.
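
Concretely, a minimal sketch of the two steps and their collapsed, row-by-row form (assuming u, mu, and L are in scope):

  vector[K] z = inv_Phi(u);      // step 1: standard normal (possibly truncated) variates
  vector[K] y_star = mu + L * z; // step 2: truncated multivariate normal draw
  // Row by row, element k of L * z is L[k,k]*z[k] + L[k, 1:k-1]*head(z, k-1);
  // the second term is the 'prev' in the marginal functions above, which is why
  // the bounds there shift by prev and divide by L[k,k].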

Is it imputation per se? Sure. But they are better thought of as predictions of the marginalized model. If you wanted the discretized stages, you’d have to pull out z (the inv_Phi() expression) and use the logic highlighted in point 2 in the generated quantities block - a sketch follows below. As stated above, I’m unsure whether it is tractable to integrate/marginalize out the missing data, much like Section 3.5 in the manual. Instead, this approach is more akin to Section 3.3, where the missing data are treated as a parameter/RV.
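
A hypothetical generated quantities sketch of that discretization for one missing ordinal response (y_star, tau, and n_thr stand in for the model’s quantities):

  int stage = 1; // category 1 unless y_star clears a cutpoint
  for(t in 1:n_thr){
    if(y_star > tau[t]) stage = t + 1; // category t+1 once y_star exceeds tau[t]
  }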

Oh, and you are correct: there is no Jacobian adjustment because there is no transformation occurring here with Phi. Hence why I have set rtn[2] to 0. In the pdf I shared, you’ll notice Ben sets this to 1 and then takes the log - which is still 0.


I see, thanks

BTW, I might look at running these models in Mplus. Have you used it? It’s supposed to be much faster than Stan and can do MV probit.

Don’t think it can do copulas, though.

Hi @harrelfe. I found my way to this thread thanks to the recent copula tutorial posted by @andrjohns.

I was wondering if you could elaborate on this point. How exactly do copulas allow you to improve frequentist (or Bayesian) power?

Good question. In frequentist inference, an example of power gain is when you are considering a multiplicity adjustment for a drug’s potential benefit on two outcomes. A multiplicity adjustment may be conservative (and hence power-losing) if the correlation of the two outcomes is not modeled correctly.
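
As a worked example: if each of two outcomes is tested one-sided at level alpha, the familywise error under a bivariate normal with correlation rho is 1 - Phi_2(z_{1-alpha}, z_{1-alpha}; rho). At rho = 0 this is 2*alpha - alpha^2, and it shrinks toward alpha as rho -> 1, so modeling rho correctly permits a less stringent per-test threshold than Bonferroni’s alpha/2.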

In the Bayesian setting, we may want posterior probabilities of unions or intersections of effects, e.g., Pr(the drug improves both outcomes) or Pr(the drug improves either outcome). If the Bayesian procedure acts as if the outcomes are independent, the probability of improvement on both outcomes will be too low, hurting Bayesian power.
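
As a minimal sketch of the Bayesian version (beta1 and beta2 are hypothetical stand-ins for the two treatment effects in a joint model), the union and intersection probabilities are just posterior means of indicator draws:

  generated quantities {
    int both_improve = (beta1 > 0) && (beta2 > 0);   // mean over draws = Pr(improves both)
    int either_improve = (beta1 > 0) || (beta2 > 0); // mean over draws = Pr(improves either)
  }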
