Multivariate, hierarchical ordered probit mixture model - sharing nuisance parameters

I have developed some multivariate probit models - based on bgoodri’s parameterization of the MVP - to meta-analyse diagnostic test accuracy data without a gold standard, where studies report data at different thresholds.

For my datasets without patient-level covariates, individuals with the same test response pattern contribute identically to the likelihood. This means I can assign the same latent vector to all of these individuals, so they can also share the same nuisance parameters and increase efficiency (for example, with binary tests only there are at most 2^nt distinct response patterns, however many individuals there are). These models are really slow in Stan, so this is important, and I don’t yet have the expertise to code my own sampler for a model this complex (such as calibrated data augmentation Gibbs sampling, which would be much faster).

Sharing nuisance parameters works fine when there is no correlation, and gives the same answers as the version that estimates a separate nuisance parameter for every subject. The problem is that sharing nuisance parameters stops working once I model the correlation.

I’ll show the code for the models that assume correlation = 0 first. The difference is in the parameters “u_d” and “u_nd”: the first model below (model #1) generates a nuisance parameter for every observation, whereas model #2 only generates num_patterns*nt nuisance parameters (where nt = number of tests = dimension, and num_patterns = total number of possible test response patterns).
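To make the contrast concrete, these are the relevant declarations from the two parameters blocks (in the likelihood, model #1 then indexes them as u_d[n,i,s], while model #2 uses u_d[pa[n,s],i]):

// model #1: one nuisance parameter per individual, per test, per study
real<lower=0,upper=1> u_d[max(ns), nt, NS];
real<lower=0,upper=1> u_nd[max(ns), nt, NS];

// model #2: one nuisance parameter per response pattern, per test
real<lower=0,upper=1> u_d[num_patterns, nt];
real<lower=0,upper=1> u_nd[num_patterns, nt];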

Model 1 (the functions block defining phi_logit_approx, inv_phi_logit_approx, and induced_dirichlet_lpdf is omitted here and in model #2; it is shown with model #3 below):


data {
          int num_binary_tests;
          int<lower=1> nt; // number of tests
          int<lower=1> NS;	// total # of studies
          int<lower=1> ns[NS]; // number of individuals in each study 
          int<lower=0> y[max(ns), nt, NS]; // N individuals and nt tests, NS studies 
          int num_patterns;
          int<lower=0> pa[max(ns), NS]; 
          int numg;
          int num_thresh_params[NS];
          int max_cutpoint;
          int cutpoints[NS, max_cutpoint];
          int twoway_thr[NS];
          int<lower=0, upper=1> primary_care_ind[NS]; //indicator for whether the study comes from primary care or not
          int r[NS, 4, choose(nt,2)];
          int num_refs; // number of reference tests
          int ref[NS]; // ref. test used in each of the NS studies
          int re[NS];
}

transformed data {
              int r2[choose(nt,2), NS, 4];
  
       for (a in 1:4)
          for (s in 1:NS)
            r2[1,s,a] = r[s,a,1];
}

parameters {
              ordered[2] a1_m_raw[num_refs]; // Se > Fp for each reference test
              ordered[2] a2_m_raw;   // don't order this due to potential thr. interaction
              ordered[max_cutpoint] C_dm;
              ordered[max_cutpoint] C_ndm;
              vector<lower=0>[2] sd1;  
              real<lower=0,upper=1> u_d[max(ns),nt, NS]; // nuisance that absorbs inequality constraints
              real<lower=0,upper=1> u_nd[max(ns),nt, NS]; // nuisance that absorbs inequality constraints
              cholesky_factor_corr[2] L_Omega_bs1;
              vector[2] z2[NS];
              vector[NS] z3;
              vector[NS] z4;
              real<lower=0> sd3;
              real<lower=0, upper=1> p[NS]; 
              vector[2] b_primary_care; // coefficients for primary care (1 for MMSE Se and 1 for MMSE Sp)
}

transformed parameters {
              vector[2] mu[num_refs + 1];  
              matrix[2, nt] nu[NS];
              matrix[max(ns), NS] log_lik_i;  // per-individual log-likelihood contributions
              vector[max_cutpoint] C_d[NS];
              vector[max_cutpoint] C_nd[NS];
              vector[num_patterns] lp1;
              vector[num_patterns] lp0;
              vector[nt] z_d[num_patterns];
              vector[nt] z_nd[num_patterns];
              vector[nt] y1d[num_patterns];
              vector[nt] y1nd[num_patterns];

                          // index test (MMSE)
                          mu[num_refs + 1, 1] =  a2_m_raw[1];
                          mu[num_refs + 1, 2] =  a2_m_raw[2];
                          
              for (s in 1:NS) {
                          // ref test (Se > Fp)
                          mu[ref[s], 1] =  a1_m_raw[ref[s], 2];   
                          mu[ref[s], 2] =  a1_m_raw[ref[s], 1];

                 // between-study model
                     // fixed ref. tests
                  nu[s, 1:2 , 1] =  mu[ref[s], 1:2]  ;  


                      // index test random effect - w/ covariate for primary care
                   nu[s, 1:2 , 2] =  mu[num_refs + 1 , 1:2 ]   + primary_care_ind[s]*b_primary_care[1:2] 
                                    + diag_pre_multiply(sd1, L_Omega_bs1) * z2[s,1:2] ; 

                for (i in 1:max_cutpoint) { 
                   C_d[s,i] = C_dm[i]  ; 
                   C_nd[s,i] = C_ndm[i]; 
                }
              }

 // Parameters for likelihood function. Based on code uploaded by Ben Goodrich which uses the 
                      // GHK algorithm for generating TruncMVN. See:
                      // https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi-good.stan#L11
                      {
                 for (s in 1:NS) {
                    for (n in 1:ns[s]) {
                      for (i in 1:nt) { // loop over each test
                           real u_d1 = u_d[n,i,s];
                           real u_nd1 = u_nd[n,i,s];
                           real bound_d_bin     = phi_logit_approx(    - (nu[s,1,i])   ); // se
                           real bound_nd_bin    = phi_logit_approx(    - (nu[s,2,i])   );  // fp
                           vector[ num_thresh_params[s] + 1 ] bound_d;
                           vector[ num_thresh_params[s] + 1 ] bound_nd;
                           bound_d[ num_thresh_params[s] + 1 ] = 1;
                           bound_nd[ num_thresh_params[s] + 1 ] = 1;
                           bound_d[1] =  phi_logit_approx(  -( nu[s,1,i] -  C_d[s,cutpoints[s,1]]));
                           bound_nd[1] = phi_logit_approx(  -( nu[s,2,i] -  C_nd[s,cutpoints[s,1]]));
                         
                           for (j in 2:(num_thresh_params[s])) {
                             bound_d[j] =  phi_logit_approx(  -( nu[s,1,i] -  C_d[s,cutpoints[s,j]] ) );
                             bound_nd[j] = phi_logit_approx(  -( nu[s,2,i] -  C_nd[s,cutpoints[s,j]]  )  );
                           }
                           for (j in 2:(num_thresh_params[s] + 1 )) {
                                  if (y[n,i,s] == j) {
                                    z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[j-1]  + (bound_d[j] - bound_d[j-1])* u_d1);      
                                    z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[j-1] + (bound_nd[j] - bound_nd[j-1])* u_nd1);    
                                    y1d[pa[n,s],i]   = log(bound_d[j] - bound_d[j-1]);  // Jacobian adjustment
                                    y1nd[pa[n,s],i]  = log(bound_nd[j] - bound_nd[j-1]); // Jacobian adjustment
                                  }
                                }
                                if (y[n,i,s] == 1 && i > num_binary_tests ) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[1]* u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[1]* u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d[1]); // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd[1]); // Jacobian adjustment
                                }
                                if (y[n,i,s] == 1 && i <= num_binary_tests ) {
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin + (1 - bound_d_bin)*u_d1);      
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin + (1 - bound_nd_bin)*u_nd1);    
                                  y1d[pa[n,s],i]   = log1m(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log1m(bound_nd_bin); // Jacobian adjustment
                                }
                               if (y[n,i,s] == 0) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin*u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin*u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd_bin); // Jacobian adjustment
                                }
                         }
                            lp1[pa[n,s]]  = sum(y1d[pa[n,s],])  +   bernoulli_lpmf(1 | p[s]); 
                            lp0[pa[n,s]]  = sum(y1nd[pa[n,s],]) +   bernoulli_lpmf(0 | p[s]);
                            log_lik_i[n,s] =  log_sum_exp(lp1[pa[n,s]] , lp0[pa[n,s]] ); 
                        }
                       }
                     }
}

model {
      p ~ beta(1,5); 

      b_primary_care ~ std_normal();

      C_dm  ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);
      C_ndm ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);

      z3 ~ std_normal();
      z4 ~ std_normal();
      sd3 ~ std_normal();

            sd1  ~   std_normal();
            L_Omega_bs1 ~ lkj_corr_cholesky(2);


       for (j in 1:2) 
           a1_m_raw[,j]  ~ std_normal(); 

           a2_m_raw  ~ std_normal(); 

        for (s in 1:NS) 
            z2[s, ] ~ std_normal();  
          

          for (s in 1:NS) { // likelihood
            for (n in 1:ns[s]) 
              target +=  log_lik_i[n,s]; 
          }
   
}

This is equivalent to the model below (model #2), which only generates u_d and u_nd for each response pattern rather than for every individual:

Model 2:



data {
          int num_binary_tests;
          int<lower=1> nt; // number of tests
          int<lower=1> NS;	// total # of studies
          int<lower=1> ns[NS]; // number of individuals in each study 
          int<lower=0> y[max(ns), nt, NS]; // N individuals and nt tests, NS studies 
          int num_patterns;
          int<lower=0> pa[max(ns), NS]; 
          int numg;
          int num_thresh_params[NS];
          int max_cutpoint;
          int cutpoints[NS, max_cutpoint];
          int twoway_thr[NS];
          int<lower=0, upper=1> primary_care_ind[NS]; //indicator for whether the study comes from primary care or not
          int r[NS, 4, choose(nt,2)];
          int num_refs; // number of reference tests
          int ref[NS]; // ref. test used in each of the NS studies
          int re[NS];
}

transformed data {
              int r2[choose(nt,2), NS, 4];
  
       for (a in 1:4)
          for (s in 1:NS)
            r2[1,s,a] = r[s,a,1];
}

parameters {
              ordered[2] a1_m_raw[num_refs]; // Se > Fp for each reference test
              ordered[2] a2_m_raw;   // don't order this due to potential thr. interaction
              ordered[max_cutpoint] C_dm;
              ordered[max_cutpoint] C_ndm;
              vector<lower=0>[2] sd1;  
              real<lower=0,upper=1> u_d[num_patterns,nt]; // nuisance that absorbs inequality constraints
              real<lower=0,upper=1> u_nd[num_patterns,nt]; // nuisance that absorbs inequality constraints
              cholesky_factor_corr[2] L_Omega_bs1;
              vector[2] z2[NS];
              vector[NS] z3;
              vector[NS] z4;
              real<lower=0> sd3;
              real<lower=0, upper=1> p[NS]; 
              vector[2] b_primary_care; // coefficients for primary care (1 for MMSE Se and 1 for MMSE Sp)
}

transformed parameters {
              vector[2] mu[num_refs + 1];  
              matrix[2, nt] nu[NS];
              matrix[NS, num_patterns] log_lik_i; 
              vector[max_cutpoint] C_d[NS];
              vector[max_cutpoint] C_nd[NS];
              vector[num_patterns] lp1;
              vector[num_patterns] lp0;
              vector[nt] z_d[num_patterns];
              vector[nt] z_nd[num_patterns];
              vector[nt] y1d[num_patterns];
              vector[nt] y1nd[num_patterns];

                          // index test (MMSE)
                          mu[num_refs + 1, 1] =  a2_m_raw[1];
                          mu[num_refs + 1, 2] =  a2_m_raw[2];
                          
              for (s in 1:NS) {
                          // ref test (Se > Fp)
                          mu[ref[s], 1] =  a1_m_raw[ref[s], 2];   
                          mu[ref[s], 2] =  a1_m_raw[ref[s], 1];

                 // between-study model
                     // fixed ref. tests
                  nu[s, 1:2 , 1] =  mu[ref[s], 1:2]  ;  


                      // index test random effect - w/ covariate for primary care
                   nu[s, 1:2 , 2] =  mu[num_refs + 1 , 1:2 ]   + primary_care_ind[s]*b_primary_care[1:2] 
                                    + diag_pre_multiply(sd1, L_Omega_bs1) * z2[s,1:2] ; 

                for (i in 1:max_cutpoint) { 
                   C_d[s,i] = C_dm[i]  ; 
                   C_nd[s,i] = C_ndm[i]; 
                }
              }

 // Parameters for likelihood function. Based on code uploaded by Ben Goodrich which uses the 
                      // GHK algorithm for generating TruncMVN. See:
                      // https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi-good.stan#L11
                      {
                 for (s in 1:NS) {
                    for (n in 1:ns[s]) {
                      for (i in 1:nt) { // loop over each test
                           real u_d1 = u_d[pa[n,s],i];
                           real u_nd1 = u_nd[pa[n,s],i];
                           real bound_d_bin     = phi_logit_approx(    - (nu[s,1,i])   ); // se
                           real bound_nd_bin    = phi_logit_approx(    - (nu[s,2,i])   );  // fp
                           vector[ num_thresh_params[s] + 1 ] bound_d;
                           vector[ num_thresh_params[s] + 1 ] bound_nd;
                           bound_d[ num_thresh_params[s] + 1 ] = 1;
                           bound_nd[ num_thresh_params[s] + 1 ] = 1;
                           bound_d[1] =  phi_logit_approx(  -( nu[s,1,i] -  C_d[s,cutpoints[s,1]]));
                           bound_nd[1] = phi_logit_approx(  -( nu[s,2,i] -  C_nd[s,cutpoints[s,1]]));
                         
                           for (j in 2:(num_thresh_params[s])) {
                             bound_d[j] =  phi_logit_approx(  -( nu[s,1,i] -  C_d[s,cutpoints[s,j]] ) );
                             bound_nd[j] = phi_logit_approx(  -( nu[s,2,i] -  C_nd[s,cutpoints[s,j]]  )  );
                           }
                           for (j in 2:(num_thresh_params[s] + 1 )) {
                                  if (y[n,i,s] == j) {
                                    z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[j-1]  + (bound_d[j] - bound_d[j-1])* u_d1);      
                                    z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[j-1] + (bound_nd[j] - bound_nd[j-1])* u_nd1);    
                                    y1d[pa[n,s],i]   = log(bound_d[j] - bound_d[j-1]);  // Jacobian adjustment
                                    y1nd[pa[n,s],i]  = log(bound_nd[j] - bound_nd[j-1]); // Jacobian adjustment
                                  }
                                }
                                if (y[n,i,s] == 1 && i > num_binary_tests ) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[1]* u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[1]* u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d[1]); // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd[1]); // Jacobian adjustment
                                }
                                if (y[n,i,s] == 1 && i <= num_binary_tests ) {
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin + (1 - bound_d_bin)*u_d1);      
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin + (1 - bound_nd_bin)*u_nd1);    
                                  y1d[pa[n,s],i]   = log1m(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log1m(bound_nd_bin); // Jacobian adjustment
                                }
                               if (y[n,i,s] == 0) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin*u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin*u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd_bin); // Jacobian adjustment
                                }
                         }
                            lp1[pa[n,s]]  = sum(y1d[pa[n,s],])  +   bernoulli_lpmf(1 | p[s]); 
                            lp0[pa[n,s]]  = sum(y1nd[pa[n,s],]) +   bernoulli_lpmf(0 | p[s]);
                            log_lik_i[s,pa[n,s]] =  log_sum_exp(lp1[pa[n,s]] , lp0[pa[n,s]] ); 
                        }
                       }
                     }
}

model {
      p ~ beta(1,5); 

      b_primary_care ~ std_normal();

      C_dm  ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);
      C_ndm ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);

      z3 ~ std_normal();
      z4 ~ std_normal();
      sd3 ~ std_normal();

            sd1  ~   std_normal();
            L_Omega_bs1 ~ lkj_corr_cholesky(2);


       for (j in 1:2) 
           a1_m_raw[,j]  ~ std_normal(); 

           a2_m_raw  ~ std_normal(); 

        for (s in 1:NS) 
            z2[s, ] ~ std_normal();  
          

          for (s in 1:NS) { // likelihood
            for (n in 1:ns[s]) 
              target +=  log_lik_i[s,pa[n,s]]; 
          }
}

Both of these work and give the same answers.

The problem is that, once I allow for correlation, only the first version (which is less efficient) works. If I try to share nuisance parameters, the sampler fails: it runs quickly but with 100% divergent transitions, and the trace plots show the parameters exploring only a very narrow region, with some looking almost like straight lines. So the model below (model #3) does not work:

functions {
          // Function to do partial pooling on correlation matrices
          // See https://discourse.mc-stan.org/t/hierarchical-prior-for-partial-pooling-on-correlation-matrices/4852/27
          matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
            int dim = rows(local_chol_cor);
            int global_dim = rows(global_chol_cor);
            matrix[global_dim,global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
            matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
            matrix[dim,dim] global_cor_use;
            matrix[dim,dim] combined_chol_cor;
            
            if(dim < rows(global_cor)){
              global_cor_use = global_cor[1:dim,1:dim];
            } else {
              global_cor_use = global_cor;
            }
            combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor_use + (alpha)*local_cor);
            return(combined_chol_cor);
  }
          real corr(int num, int[] vec1, int[] vec2) {
            real pearson_corr =      (num*sum(to_vector(vec1) .* to_vector(vec2)) -  sum(to_vector(vec1))*sum(to_vector(vec2)) )  /  
                              sqrt( (num*sum(to_vector(vec1) .* to_vector(vec1))   -  sum(to_vector(vec1))^2) * (num*sum(to_vector(vec2) .*  to_vector(vec2))   - sum(to_vector(vec2))^2 ) ); 
            return(pearson_corr);
          }
  // logistic approximation to the standard normal CDF: Phi(x) is approximated by inv_logit(1.7 * x)
  real phi_logit_approx(real b) {
    real a;
    a = inv_logit(1.7 * b);
    return(a);
  }
  // approximate inverse of the standard normal CDF (inverse of phi_logit_approx)
  real inv_phi_logit_approx(real b) {
    real a;
    a = (1 / 1.7) * logit(b);
    return(a);
  }

 real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
    int K = num_elements(c) + 1;
    vector[K - 1] sigma = inv_logit(phi - c);
    vector[K] p;
    matrix[K, K] J = rep_matrix(0, K, K);
    
    // Induced ordinal probabilities
    p[1] = 1 - sigma[1];
    for (k in 2:(K - 1))
      p[k] = sigma[k - 1] - sigma[k];
    p[K] = sigma[K - 1];
    
    // Baseline column of Jacobian
    for (k in 1:K) J[k, 1] = 1;
    
    // Diagonal entries of Jacobian
    for (k in 2:K) {
      real rho = sigma[k - 1] * (1 - sigma[k - 1]);
      J[k, k] = - rho;
      J[k - 1, k] = rho;
    }
return   dirichlet_lpdf(p | alpha)
           + log_determinant(J);
  }
}

data {
          int num_binary_tests;
          int<lower=1> nt; // number of tests
          int<lower=1> NS;	// total # of studies
          int<lower=1> ns[NS]; // number of individuals in each study 
          int<lower=0> y[max(ns), nt, NS]; // N individuals and nt tests, NS studies 
          int num_patterns;
          int<lower=0> pa[max(ns), NS]; 
          int numg;
          int num_thresh_params[NS];
          int max_cutpoint;
          int cutpoints[NS, max_cutpoint];
          int twoway_thr[NS];
          int<lower=0, upper=1> primary_care_ind[NS]; //indicator for whether the study comes from primary care or not
          int r[NS, 4, choose(nt,2)];
          int num_refs; // number of reference tests
          int ref[NS]; // ref. test used in each of the NS studies
          int re[NS];
          int total_n;
          int ind[NS];
          int ns_cumsum[NS];
}

transformed data {
              int r2[choose(nt,2), NS, 4];
  
       for (a in 1:4)
          for (s in 1:NS)
            r2[1,s,a] = r[s,a,1];
}

parameters {
              ordered[2] a1_m_raw[num_refs]; // Se > Fp for each reference test
              ordered[2] a2_m_raw;   
              ordered[max_cutpoint] C_dm;
              ordered[max_cutpoint] C_ndm;
              vector<lower=0>[2] sd1;  
              real<lower=0,upper=1> u_d[num_patterns,nt]; // nuisance that absorbs inequality constraints
              real<lower=0,upper=1> u_nd[num_patterns,nt]; // nuisance that absorbs inequality constraints
              cholesky_factor_corr[2] L_Omega_bs1;
              vector[2] z2[NS];
              real<lower=0, upper=1> p[NS]; 
              cholesky_factor_corr[nt] L_Omega_d[NS];
              cholesky_factor_corr[nt] L_Omega_nd[NS];
              vector[2] b_primary_care; // coefficients for primary care (1 for MMSE Se and 1 for MMSE Sp)    
}

transformed parameters {
              vector[2] mu[num_refs + 1];  
              matrix[2, nt] nu[NS]; 
              matrix[NS, num_patterns] log_lik_i; 
              vector[max_cutpoint] C_d[NS];
              vector[max_cutpoint] C_nd[NS];
              vector[num_patterns] lp1;
              vector[num_patterns] lp0;
              vector[nt] z_d[num_patterns];
              vector[nt] z_nd[num_patterns];
              vector[nt] y1d[num_patterns];
              vector[nt] y1nd[num_patterns];

                          // index test (MMSE)
                          mu[num_refs + 1, 1] =  a2_m_raw[1];
                          mu[num_refs + 1, 2] =  a2_m_raw[2];
                          
              for (s in 1:NS) {
                          // ref test (Se > Fp)
                          mu[ref[s], 1] =  a1_m_raw[ref[s], 2];   
                          mu[ref[s], 2] =  a1_m_raw[ref[s], 1];

                 // ref. tests
                   nu[s, 1:2 , 1] =  mu[ref[s], 1:2] ; 

                      // index test random effect - w/ covariate for primary care
                   nu[s, 1:2 , 2] =  mu[num_refs + 1 , 1:2 ]   + primary_care_ind[s]*b_primary_care[1:2] 
                                      + diag_pre_multiply(sd1, L_Omega_bs1) * z2[s,1:2] ; 

                for (i in 1:max_cutpoint) { 
                   C_d[s,i] = C_dm[i];
                   C_nd[s,i] = C_ndm[i] ;
                }
              }

 // Parameters for likelihood function. Based on code uploaded by Ben Goodrich which uses the 
                      // GHK algorithm for generating TruncMVN. See:
                      // https://github.com/stan-dev/example-models/blob/master/misc/multivariate-probit/probit-multi-good.stan#L11
                      {
                 for (s in 1:NS) {
                    for (n in 1:ns[s]) {
                      for (i in 1:nt) { // loop over each test
                           real u_d1 = u_d[pa[n,s],i];
                           real u_nd1 = u_nd[pa[n,s],i];
                           real prev_d = 0;
                           real prev_nd = 0;
                           real bound_d_bin     = phi_logit_approx(    - (nu[s,1,i] + prev_d)   / L_Omega_d[s,i,i] ); // se
                           real bound_nd_bin    = phi_logit_approx(    - (nu[s,2,i] + prev_nd ) / L_Omega_nd[s,i,i] );  // fp
                           vector[ num_thresh_params[s] + 1 ] bound_d;
                           vector[ num_thresh_params[s] + 1 ] bound_nd;
                           bound_d[ num_thresh_params[s] + 1 ] = 1;
                           bound_nd[ num_thresh_params[s] + 1 ] = 1;

                           for (j in 1:(num_thresh_params[s])) {
                             bound_d[j] =  phi_logit_approx(  -( nu[s,1,i] -  C_d[s,cutpoints[s,j]] + prev_d ) / L_Omega_d[s,i,i] );
                             bound_nd[j] = phi_logit_approx(  -( nu[s,2,i] -  C_nd[s,cutpoints[s,j]] + prev_nd) /L_Omega_nd[s,i,i] );
                           }

                           for (j in 2:(num_thresh_params[s] + 1 )) {
                                  if (y[n,i,s] == j) {
                                    z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[j-1]  + (bound_d[j] - bound_d[j-1])* u_d1);      
                                    z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[j-1] + (bound_nd[j] - bound_nd[j-1])* u_nd1);    
                                    y1d[pa[n,s],i]   = log(bound_d[j] - bound_d[j-1]);  // Jacobian adjustment
                                    y1nd[pa[n,s],i]  = log(bound_nd[j] - bound_nd[j-1]); // Jacobian adjustment
                                  }
                                }
                                if (y[n,i,s] == 1 && i > num_binary_tests ) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d[1]* u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd[1]* u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d[1]); // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd[1]); // Jacobian adjustment
                                }
                                if (y[n,i,s] == 1 && i <= num_binary_tests ) {
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin + (1 - bound_d_bin)*u_d1);      
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin + (1 - bound_nd_bin)*u_nd1);    
                                  y1d[pa[n,s],i]   = log1m(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log1m(bound_nd_bin); // Jacobian adjustment
                                }
                               if (y[n,i,s] == 0) { 
                                  z_d[pa[n,s],i]   = inv_phi_logit_approx(bound_d_bin*u_d1);
                                  z_nd[pa[n,s],i]  = inv_phi_logit_approx(bound_nd_bin*u_nd1);
                                  y1d[pa[n,s],i]   = log(bound_d_bin);  // Jacobian adjustment
                                  y1nd[pa[n,s],i]  = log(bound_nd_bin); // Jacobian adjustment
                                }
                              if (i < nt) prev_d  = L_Omega_d[s,i +1,1:i ] * head(z_d[pa[n,s],] ,i);
                              if (i < nt) prev_nd = L_Omega_nd[s,i +1,1:i ] * head(z_nd[pa[n,s],] ,i);
                         }
                            lp1[pa[n,s]]  = sum(y1d[pa[n,s],])  +   bernoulli_lpmf(1 | p[s]); 
                            lp0[pa[n,s]]  = sum(y1nd[pa[n,s],]) +   bernoulli_lpmf(0 | p[s]);
                            log_lik_i[s,pa[n,s]] =  log_sum_exp(lp1[pa[n,s]] , lp0[pa[n,s]] ); 
                        }
                       }
                     }
}

model {

      p ~ beta(1, 5); 

      b_primary_care ~ std_normal();

      C_dm  ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);
      C_ndm ~ induced_dirichlet(rep_vector(1, max_cutpoint + 1), 0);

            sd1  ~   std_normal();
            L_Omega_bs1 ~ lkj_corr_cholesky(2);


       for (j in 1:2) 
           a1_m_raw[,j]  ~ std_normal(); 

           a2_m_raw  ~ std_normal(); 


         for (s in 1:NS) {
          L_Omega_d[s,] ~ lkj_corr_cholesky(2);
          L_Omega_nd[s,] ~ lkj_corr_cholesky(2);
         }

        for (s in 1:NS) 
            z2[s, ] ~ std_normal();  
          
           for (s in 1:NS) { // likelihood
            for (n in 1:ns[s]) 
              target += log_lik_i[s, pa[n,s]]; 
          } 
}

I was wondering whether anybody has any ideas as to why the model with no correlation and shared nuisance parameters (model #2) works, but the same model with correlation (model #3) does not?


Hi, your question is relevant, but it is honestly very hard to answer: the models you posted are huge, and I am unable to understand what’s going on in them without significant time investment. I think you are more likely to get an answer if you find a way to strip the models down to the smallest case that shows the issue - often the process of preparing a question like this helps me find the answer myself :-)

Best of luck with the model!


Thanks, I’ll work on a simpler version and update

Hi

I will discuss this using bgoodri’s parameterization code here, so it will be easier to understand.

Code:

data {
  int<lower=1> K;
  int<lower=1> D;
  int<lower=0> N;
  int<lower=0,upper=1> y[N,D];
  vector[K] x[N];
}
parameters {
  matrix[D,K] beta;
  cholesky_factor_corr[D] L_Omega;
  real<lower=0,upper=1> u[N,D]; // nuisance that absorbs inequality constraints
}
model {
  L_Omega ~ lkj_corr_cholesky(4);
  to_vector(beta) ~ normal(0, 5);
  // implicit: u is iid standard uniform a priori
  { // likelihood
    for (n in 1:N) {
      vector[D] mu;
      vector[D] z;
      real prev;
      mu = beta * x[n];
      prev = 0;
      for (d in 1:D) { // Phi and inv_Phi may overflow and / or be numerically inaccurate
        real bound; // threshold at which utility = 0
        bound = Phi( -(mu[d] + prev) / L_Omega[d,d]  );
        if (y[n,d] == 1) {
          real t;
          t = bound + (1 - bound) * u[n,d];
          z[d] = inv_Phi(t);       // implies utility is positive
          target += log1m(bound);  // Jacobian adjustment
        }
        else {
          real t;
          t = bound * u[n,d];
          z[d] = inv_Phi(t);     // implies utility is negative
          target += log(bound);  // Jacobian adjustment
        }
        if (d < D) prev = L_Omega[d+1,1:d] * head(z, d);
        // Jacobian adjustments imply z is truncated standard normal
        // thus utility --- mu + L_Omega * z --- is truncated multivariate normal
      }
    }
  }
}

If there are no observation-level covariates, we can construct a vector, which I’ll call “pattern”, giving the response pattern of each observation. Observations with the same pattern contribute identically to the likelihood, so we can use this to increase computational efficiency, and far fewer nuisance parameters need to be written to and read back from CSV. This matters because the large CSVs generated with larger samples make these models practically unusable for large N (it can take days just to load the CSV once the model has finished running), and the u’s cannot be declared as local variables since they have constraints (0 < u < 1).
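Just to illustrate the idea: for purely binary tests the pattern index could even be computed inside Stan rather than passed in as data. The sketch below is only one possible encoding (the name pattern_computed is not used in the models in this post); it assumes y is coded 0/1 as in the model that follows:

transformed data {
  int pattern_computed[N];            // pattern index for each observation
  for (n in 1:N) {
    int code = 1;
    int base = 1;
    for (d in 1:D) {
      code += y[n, d] * base;         // binary encoding of the 0/1 response vector
      base *= 2;
    }
    pattern_computed[n] = code;       // observations with identical rows of y share an index
  }
}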

A model with shared nuisance parameters across observations with the same response pattern would look like this:

data {
  int<lower=1> D;
  int<lower=0> N;
  int<lower=0,upper=1> y[N,D];
  int num_patterns; // total number of response patterns
  int<lower=1, upper=num_patterns> pattern[N]; // response pattern index for each individual
}
parameters {
  vector[D] beta;
  cholesky_factor_corr[D] L_Omega;
  real<lower=0,upper=1> u[num_patterns,D]; // nuisance that absorbs inequality constraints
}
model {
  L_Omega ~ lkj_corr_cholesky(4);
  beta ~ normal(0, 5);
  // implicit: u is iid standard uniform a priori
  { // likelihood
    for (n in 1:N) {
      vector[D] z;
      real prev;
      prev = 0;
      for (d in 1:D) { // Phi and inv_Phi may overflow and / or be numerically inaccurate
        real bound; // threshold at which utility = 0
        bound = Phi( -(beta[d] + prev) / L_Omega[d,d]  );
        if (y[n,d] == 1) {
          real t;
          t = bound + (1 - bound) * u[pattern[n],d];
          z[d] = inv_Phi(t);       // implies utility is positive
          target += log1m(bound);  // Jacobian adjustment
        }
        else {
          real t;
          t = bound *  u[pattern[n],d];
          z[d] = inv_Phi(t);     // implies utility is negative
          target += log(bound);  // Jacobian adjustment
        }
        if (d < D) prev = L_Omega[d+1,1:d] * head(z, d);
        // Jacobian adjustments imply z is truncated standard normal
        // thus utility --- mu + L_Omega * z --- is truncated multivariate normal
      }
    }
  }
}

The problem is that when I do this (i.e. share the nuisance parameters), it works fine for the univariate probit, i.e. when I assume no correlation (so L_Omega[d,d] = 1 in the code above): I get the same answers and it is much more efficient. However, when I do the same for the multivariate probit (i.e. estimate L_Omega in the above), it does not work. I was wondering if anybody had any ideas why - maybe this approach just doesn’t work for the multivariate case.