Copula fit (finally!) and questions about downstream prediction

Summary
In this post I am doing 2 things:

  1. Providing a final update on a copula model that I have been working on for a year and that has been greatly improved thanks to this community.

  2. Asking a follow-up question: how can I use the samples from the fit of this model for downstream prediction via cmdstanr's generate_quantities() method? (See below.)

A Big Thank You
Before I get to my question, I want to send a huge shoutout to the community here. I have lurked on this forum for well over a year while trying to fit a Mixed Discrete and Continuous Gaussian Copula (MDCG), and I can finally say that I have been successful. My full model is here: MDCG_V3.stan (49.0 KB). A smaller, condensed version is pasted below, where I ask my prediction question. Essentially, I have a 54-dimensional dataset with some continuous, normally distributed observed values y_c and some ordered-probit observed values (latent continuous) y_o. Both are conditional on a positive, real-valued independent variable x and are linked through the correlation matrix \Sigma of a Gaussian copula. There is missing data in the multivariate response vector, which is accounted for in the model linked above.

I discovered that, given the complexity of the model, I needed tight priors to avoid inefficiency and BFMI issues. This was achieved by first fitting each marginal univariately and then taking the posterior expectation of each parameter from that fit as the center of its prior in the larger multivariate model (for example, the posterior mean of z1_exponent from its univariate fit centers the prior on z1_exponent in the joint model, rather than the weak normal(0, 1) shown in the abbreviated code below). I then found the Cholesky correlation parameter finicky to tune, and did so until all R-hat values were 1.01 or below, with appropriate ESS and no BFMI warnings. I did have to set max_treedepth to 15, though I did not change adapt_delta. I think the big reason these tight priors are needed is the missing data: without information to regularize those parameters, the sampler wanders off in arbitrary directions.

I have both recovered simulated parameters from this model (see the smaller code chunk below) and done a posterior predictive check on every marginal of the large model. The fit took ~30 hours on an AMD Threadripper with a 1316 x 54 dataset, 2000 warmup iterations, and 4000 sampling iterations across 4 chains. Given the amount of missing data, this is certainly a p >> n problem. I would like to make it faster, although I am unsure whether any real speedup would come from OpenCL or multi-threading (or how to prepare the model for multi-threading…). I hope these steps serve others in the future. I am indebted to all of the assistance I have received on this forum.

Question
All of this said, I do have a question for the community. As I prepare to write this model up, I am looking at possible downstream applications. One of these is certainly the prediction of an unobserved x (age) given observed y (continuous and ordinal outcomes). Is there a way to do this in the generated quantities block, using cmdstanr's generate_quantities() method? If so, what is the best way to start? I have a held-out test sample with observed y values but unobserved x, and I want to use what the model learned during fitting to predict x. I post an abbreviated model below. I think this involves taking the CDF of the new observed y and then doing something with the quantile function, but I am admittedly lost as to the best way to start, particularly because of the multivariate nature of the response. I am also unsure what to do if I want to (or should?) include a prior on the unobserved x when making this prediction.

Point of clarity: currently I have p(y \mid \theta, x), where x is observed. Essentially, I am looking for p(x \mid \boldsymbol{\theta}, y), where \theta parameterizes both x and y, using the parameters from the initial fit as the starting point for estimating x. (A toy sketch of the only idea I have follows the abbreviated model below.)

functions{
  
  // Gaussian Copula Log Probability Density
  
  real multi_normal_cholesky_copula_lpdf(matrix U, matrix L) {
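    // For latent standard-normal scores U (rows = observations, cols = margins)
    // and Cholesky factor L of the correlation matrix Gamma = L * L', this returns
    //   sum over n of [ -0.5 * log|Gamma| - 0.5 * U[n]' * (inverse(Gamma) - I) * U[n] ],
    // i.e. the Gaussian copula log-density; the marginal log-densities are added
    // separately by the marginal functions below.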
    int N = rows(U);
    int J = cols(U);
    matrix[J, J] Gammainv = chol2inv(L);
    return -N * sum(log(diagonal(L))) - 0.5 * sum(add_diag(Gammainv, -1.0) .* crossprod(U));
  }
  
  // Prepare data for LPDF
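  // Each element of `marginals` is a 2-element array of matrices: [1] the latent
  // standard-normal scores contributed by that marginal's columns, and [2] its
  // per-observation log-density / jacobian contributions. The scores are
  // concatenated column-wise into U and the adjustments are summed.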
  
  real centered_gaussian_copula_cholesky_(array[,] matrix marginals, matrix L) {
    // Extract dimensions of final outcome matrix
    int N = rows(marginals[1][1]);
    int J = rows(L);
    matrix[N, J] U;
  
    // Iterate through marginal arrays, concatenating the outcome matrices by column
    // and aggregating the log-likelihoods (from continuous marginals) and jacobian
    // adjustments (from discrete marginals)
    real adj = 0;
    int pos = 1;
    for (m in 1 : size(marginals)) {
      int curr_cols = cols(marginals[m][1]);
    
      U[ : , pos : (pos + curr_cols - 1)] = marginals[m][1];
    
      adj += sum(marginals[m][2]);
      pos += curr_cols;
    }
  
    // Return the sum of the log-probability for copula outcomes and jacobian adjustments
    return multi_normal_cholesky_copula_lpdf(U | L) + adj;
  }
  

  // Continuous Marginal Distribution (Normal)
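  // rtn[1] holds the standardized residuals (y - mu) / sigma, i.e. the latent
  // standard-normal scores the copula correlates; rtn[2] holds the marginal
  // normal log-density, which is added to the target alongside the copula term.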
  
  array[] matrix normal_marginal(array[] real y, array[] real mu, array[] real sigma, int N) {
    array[2] matrix[N, 1] rtn; // empty 2D array to return
    // Initialise the jacobian adjustments to zero, as vectorised lpdf will be used
    rtn[2] = rep_matrix(0, N, 1);

    for (n in 1 : N) {
      rtn[1][n, 1] = (y[n] - mu[n]) / sigma[n]; // center RV
      rtn[2][n, 1] = normal_lpdf(y[n] | mu[n], sigma[n]); // "jacobian"
    }
    return rtn;
  }
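
  // Ordinal (probit) marginal with uniform data augmentation: each observed
  // category is mapped to a latent standard-normal score, drawn via u_raw from
  // the interval implied by its cutpoints, and the log interval probability is
  // the corresponding "jacobian" adjustment. Missing entries (coded 99) get a
  // latent score via inv_Phi(u_raw[n]) with no adjustment.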

  array[] matrix probit_marginal(array[] int y, array[] real mu_glm, array[] real u_raw, vector cutpoints) {
    int N = num_elements(mu_glm); // number of observations
    array[2] matrix[N, 1] rtn; // empty 2D array to return
    
    for(n in 1:N){
      int C = num_elements(cutpoints) + 1; // total number of ord categories
      if(y[n] == 99){ // missing data
        rtn[1][n,1] = inv_Phi(u_raw[n]); // missing RV
        rtn[2][n,1] = 0;
        } else if(y[n] == 1){ // lowest bound
        real bound = Phi((cutpoints[1] - mu_glm[n])); // data augmentation
        rtn[1][n,1] = inv_Phi((bound*u_raw[n])); // latent RV
        rtn[2][n,1] = log(bound); // jacobian
      } else if (y[n] == C){ // highest bound
        real bound = Phi((cutpoints[C - 1] - mu_glm[n])); // data augmentation
        rtn[1][n,1] = inv_Phi(bound + (1-bound)*u_raw[n]); // latent RV
        rtn[2][n,1] = log1m(bound); // jacobian
      } else { // in between 
        real ub = Phi((cutpoints[y[n]] - mu_glm[n])); // data augmentation
        real lb = Phi((cutpoints[y[n] - 1] - mu_glm[n])); // data augmentation
        rtn[1][n,1] = inv_Phi((lb + (ub-lb)*u_raw[n])); // latent RV
        rtn[2][n,1] = log(ub-lb); // jacobian
      }
    }
    return rtn;
  }

}
data{

  int N; // number of observations
  int M; // number of outcomes in the copula (3 in this abbreviated model)
  array[N] real x; // observed independent variable (age)

  array[N] real z1; // continuous outcome
  array[N] int z2; // ordinal outcome (99 codes missing)
  array[N] int z3; // ordinal outcome (99 codes missing)

  int C_z2; // number of ordinal categories for z2
  int C_z3; // number of ordinal categories for z3

}
transformed data{

  int thresh_z2 = C_z2 - 1;
  int thresh_z3 = C_z3 - 1;

}
parameters{

  cholesky_factor_corr[M] L; // Cholesky factor of the copula correlation matrix

  real<lower=0> z1_constant;
  real<lower=0> z1_exponent;
  real<lower=0> z1_offset;
  real<lower=0> z1_noise1;
  real<lower=0> z1_noise2;

  real<lower=0> z2_constant;
  array[N] real<lower=0, upper=1> z2_u; // uniform augmentation variables for z2
  ordered[thresh_z2] z2_t_pars; // cutpoints for z2

  real<lower=0> z3_constant;
  array[N] real<lower=0, upper=1> z3_u; // uniform augmentation variables for z3
  ordered[thresh_z3] z3_t_pars; // cutpoints for z3

}
transformed parameters{

  array[N] real z1_mean;
  array[N] real z1_sd;
  array[N] real z2_mean;
  array[N] real z3_mean;

  for(n in 1:N){

    z1_mean[n] = z1_constant*x[n]^z1_exponent + z1_offset; // power-law mean for the continuous outcome
    z1_sd[n] = z1_noise1*(1+x[n]*z1_noise2); // noise scale increases with x
    z2_mean[n] = z2_constant*x[n]; // linear latent mean for ordinal z2
    z3_mean[n] = z3_constant*x[n]; // linear latent mean for ordinal z3

  }

}
model{

  L ~ lkj_corr_cholesky(20.0); // concentrated LKJ prior (eta = 20), favoring weak correlations
  
  z1_constant ~ normal(0,50);
  z1_exponent ~ normal(0,1);
  z1_offset ~ normal(0,50);
  z1_noise1 ~ normal(0,10);
  z1_noise2 ~ normal(0,1);

  z2_constant ~ normal(0,1);
  for(i in 1:size(z2_t_pars)){
    z2_t_pars[i] ~ normal(i+1,1); // prior centers the i-th cutpoint near i+1
  }

  z3_constant ~ normal(0,1);
  for(i in 1:size(z3_t_pars)){
    z3_t_pars[i] ~ normal(i+1,1);
  }


  target += centered_gaussian_copula_cholesky_(
    {normal_marginal(z1, z1_mean, z1_sd, N),     
     probit_marginal(z2, z2_mean, z2_u, z2_t_pars),
     probit_marginal(z3, z3_mean, z3_u, z3_t_pars)}, L);

}
generated quantities{

  corr_matrix[M] corr_mat = multiply_lower_tri_self_transpose(L); // recover the full correlation matrix from its Cholesky factor

}
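
To make the target concrete, the only starting point I can think of is something like the toy model below: treat the unobserved x for a new case as a parameter, pass summaries (or single draws) of the fitted marginal parameters in as data, and sample x_new in a second, small model. This sketch uses only the z1 marginal and ignores the copula entirely, and the names (y1_new, x_new) and the prior on age are placeholders, so it is an illustration of the idea rather than something I trust. How to do this properly for the full multivariate response, or inside generated quantities, is exactly what I am asking.

data{

  real y1_new; // observed continuous outcome for the new case

  // plug-in values taken from the fit above (posterior means or a single draw)
  real<lower=0> z1_constant;
  real<lower=0> z1_exponent;
  real<lower=0> z1_offset;
  real<lower=0> z1_noise1;
  real<lower=0> z1_noise2;

}
parameters{

  real<lower=0> x_new; // the unobserved age

}
model{

  x_new ~ normal(30, 20); // placeholder prior on age

  y1_new ~ normal(z1_constant*x_new^z1_exponent + z1_offset,
                  z1_noise1*(1 + x_new*z1_noise2));

}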


Side note: I think that the two-phase process you outlined, where marginal results are used to form priors for the multivariate model, will result in posterior distributions that are falsely narrow.


This could be improved by assigning to a missing-data parameter. The approach in the Stan manual, 3.3 Sliced missing data | Stan User's Guide, is a good one.

        rtn[1][n,1] = missing_parameter[?];
        rtn[2][n,1] = std_normal_lpdf(missing_parameter[?]);

It should reduce the required max_treedepth and improve sampling performance.
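
For what it is worth, a minimal sketch of that idea applied to probit_marginal might look like the following (illustrative only: y_mis would be declared as a parameter vector with one entry per missing observation, and mis_pos[n] is a data array giving observation n's slot in y_mis, with any placeholder value for observed entries):

  array[] matrix probit_marginal_mis(array[] int y, array[] real mu_glm, array[] real u_raw,
                                     vector cutpoints, vector y_mis, array[] int mis_pos) {
    int N = num_elements(mu_glm); // number of observations
    array[2] matrix[N, 1] rtn; // empty 2D array to return

    for(n in 1:N){
      int C = num_elements(cutpoints) + 1; // total number of ord categories
      if(y[n] == 99){ // missing data: latent score is an explicit parameter
        rtn[1][n,1] = y_mis[mis_pos[n]];
        rtn[2][n,1] = std_normal_lpdf(y_mis[mis_pos[n]]); // its std normal prior
      } else if(y[n] == 1){ // lowest bound
        real bound = Phi((cutpoints[1] - mu_glm[n]));
        rtn[1][n,1] = inv_Phi((bound*u_raw[n]));
        rtn[2][n,1] = log(bound);
      } else if (y[n] == C){ // highest bound
        real bound = Phi((cutpoints[C - 1] - mu_glm[n]));
        rtn[1][n,1] = inv_Phi(bound + (1-bound)*u_raw[n]);
        rtn[2][n,1] = log1m(bound);
      } else { // in between
        real ub = Phi((cutpoints[y[n]] - mu_glm[n]));
        real lb = Phi((cutpoints[y[n] - 1] - mu_glm[n]));
        rtn[1][n,1] = inv_Phi((lb + (ub-lb)*u_raw[n]));
        rtn[2][n,1] = log(ub-lb);
      }
    }
    return rtn;
  }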