~ multi_normal()T[lower,upper] impossible in stan

There are (econometric) models that call for MVN heterogeneity but truncation on at least 1 dimension is necessary. For instance, estimated income must not be lower than observed expenditures.

Having read through most threads on MVN, truncated MVN, truncated multivariate distributions, … on stan’s discourse, it seems that this will remain impossible to implement in stan.

Old-school approaches would either (1) use rejection sampling or (2) sequentially cycle through the dimension of the MVN by computing conditional mean,sd and then sample from univariate truncated Normals. Neither of these approaches is possible in stan where parameters are drawn simultaneously.

Please correct me if I am wrong, but I see no way truncations on MVN will ever be possible in stan.

Welcome to discourse! You might find this post useful:

1 Like

Does make_stuff work in transformed parameters the same way it does in generated quantities? I’ll try.
EDIT: Yes, it does work the same way. Nice.
I created a minimal example based on that post. A simple MVN with 3 dimensions and a constraint on the last. Ocular inspection of marginal histogram looks good. Now I am wondering if make_stuff is actually run twice in run time, once to get the Jacobian and once to do the actual transformation?

functions {
  vector[] make_stuff(vector mu, matrix L, vector b, vector s, vector u) {
    int K = rows(mu); vector[K] d; vector[K] z; vector[K] out[2];
    for (k in 1:K) {
      int km1 = k - 1;
      if (s[k] != 0) {
        real z_star = (b[k] - 
                      (mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0))) / 
                      L[k,k];
        real v; real u_star = Phi(z_star);
        if (s[k] == -1) {
          v = u_star * u[k];
          d[k] = u_star;
        }
        else {
          d[k] = 1 - u_star;
          v = u_star + d[k] * u[k];
        }
        z[k] = inv_Phi(v);
      }
      else {
        z[k] = inv_Phi(u[k]);
        d[k] = 1;
      }
    }
    out[1] = z;
    out[2] = d;
    return out;
  }
}

data{
  vector[3] Mu;
  cov_matrix[3] Sigma;
  vector[3] b; 
  vector<lower=-1,upper=1>[3] s;
}

transformed data{
  cholesky_factor_cov[3,3] L=cholesky_decompose(Sigma);
}

parameters {
    matrix<lower=0,upper=1>[3,100] u;
}

transformed parameters {
matrix[3,100] Beta;
{
  for (r in 1:100) {
    Beta[,r]= Mu + L * make_stuff(Mu, L, b, s, u[,r])[1];
  }
}
}

model {
//  to_vector(u) ~ uniform(0,1);
  for (r in 1:100) {
    target += log(make_stuff(Mu, L, b, s, u[,r])[2]);
  }
}
fit = sampling(stantest, 
               list(Mu=c(0,0,1),Sigma=cov2cor(diag(3)+1),
                    b=c(0,0,1),s=c(0,0,1)), chains=1, iter=300)

rstan::extract(fit)$Beta[99,3,] %>% hist

OK, follow-up question here …

How do I parallelize this?

map_rect evaluates to a vector.
Therefore, I need to split Jacobian and draw in make_stuff. This way each run of make_stuff_draw and make_stuff_Jacobian will return a vector. These would be put together into a single vector, and I have to convert it into a matrix afterwards.

Seems doable, but is it?

You only need to call it once. No need to actually construct Beta as the only thing that matters for the sampler is the log_prob calculation. Though I haven’t actually tested this, so please report back if you find issues. If I wanted Beta I’d construct it in the generated quantities block, which should be faster as these quantities don’t get autodiffed through.

If you need both upper and lower bound on the same “draw”, it’s in my helpful functions repo.

Reproduced here. The difference is that you supply both a lb and ub vector as well as two indicator vectors. Indicating whether the variate is 1 if bounded or 0 unbounded. If a variate is unbounded the the lb/ub vectors aren’t referenced.

/**
   * Truncated Multivariate Log Probability Density
   *
   * @copyright Ben Goodrich, Ethan Alt, Sean Pinkney 2017, 2021, 2021 \n
   * https://groups.google.com/g/stan-users/c/GuWUJogum1o/m/LvxjlUBnBwAJ \n
   * Accessed and modified Apr. 18, 2021 
   *
   * @param u Vector number on [0,1], not checked but function will return NaN
   * @param mu Vector
   * @param L Cholesky Factor Corr
   * @param lb Vector of lower bounds
   * @param ub Vector of upper bounds
   * @param lb_ind Vector indicator if there is an lower bound
   * @param ub_ind Vector indicator if there is an upper bound
   * @return log density
   */

  real multi_normal_cholesky_truncated_lpdf(vector u, 
                                            vector mu, 
                                            matrix L, 
                                            vector lb, 
                                            vector ub, 
                                            vector lb_ind, 
                                            vector ub_ind) {
      int K = rows(u); 
      vector[K] z;
      real lp = 0;
      
      for ( k in 1:K ) {
        // if kth u is unbounded
        // else kth u has at least one bound
        if ( lb_ind[k] == 0 && ub_ind[k] == 0 )  z[k] = inv_Phi(u[k]);
          else { 
            int km1 = k - 1;
            real v; 
            real z_star;
            real logd;
            row_vector [2] log_ustar = [negative_infinity(), 0];           // [-Inf, 0] = log([0,1])
            real constrain = mu[k] + ((k > 1) ? L[k, 1:km1] * head(z, km1) : 0);
            
            // obtain log of upper and lower bound (if applicable)
            if ( lb_ind[k] == 1 ) log_ustar[1] = normal_lcdf( ( lb[k] - constrain ) / L[k, k] | 0.0, 1.0 );
            if ( ub_ind[k] == 1 ) log_ustar[2] = normal_lcdf( ( ub[k] - constrain ) / L[k, k] | 0.0, 1.0 );
          
            // update log gradient and z  
            logd  = log_diff_exp(log_ustar[2], log_ustar[1]);
            v     = exp( log_sum_exp( log_ustar[1], log(u[k]) + logd ) );    // v = ustar[1] + (ustar[2] - ustar[1]) * u[k] ~ U(ustar[1], ustar[2])
            z[k]  = inv_Phi(v);                                              // z ~ TN
            lp   += logd;                                                    // increment by log gradient
          }
        }
      return lp;
     }
2 Likes

Those are some really nice implementations of truncated MVN functions.

However, I am still confused about what transformation/generation to do where. I do need Beta not just as a generated quantity, I need it as a parameter. I should have clarified that. Let me provide some mock code to illustrate the setup:

functions{
  real myLLfun(... matrix Beta ...) //lower level log-likelihood function for reduce_sum
  
  real multi_normal_cholesky_truncated_lpdf() 
  vector multi_normal_cholesky_truncated_rng()
}

data{
  X,  mu_0,  mu_sig, //data, prior specs
  S, // number of subjects
  Q, // number of parameters
  lb[Q,S],ub[Q,S],lb_ind[Q,S],ub_ind[Q,S]
}

parameters {
  vector[Q] mu; // upper MVN: mean vector of ind betas
  vector<lower=0>[Q] tau;
  cholesky_factor_corr[Q] L_Omega;
  matrix<lower=0,upper=1>[Q, S] Z; // individual z, unif(0,1)
}

transformed parameters {
  //chol-of-covmat
  matrix[Q,Q] L = diag_pre_multiply(tau, L_Omega); 
  
  //lower level para
  matrix[Q, S] Beta;
  {
  for (r in 1:S) {
    Beta[,r] = multi_normal_cholesky_truncated_rng(Z[,r],  mu, L,   lb[,r],ub[,r],lb_ind[,r],ub_ind[,r]);
  }
  }
}

model {
  //2nd stage
  tau ~ cauchy(0, 2.5); 
  L_Omega ~ lkj_corr_cholesky(2);
  mu ~ normal(mu_0, mu_sig); 

  //1st stage
  to_vector(Z) ~ uniform(); // implicit

    for (r in 1:S) {
      target+= multi_normal_cholesky_truncated_lpdf(Z[,r], mu, L, lb[,r],ub[,r],lb_ind[,r],ub_ind[,r])
    }
  
  //logll
  target += reduce_sum(myLLfun, ..., Beta, X, ...);
  
} 

The _rng function is really just a transformation going from unconstrained (uniform-distributed) space to constrained space (the MVN-trun distributed space). The _lpdf function is what I need for the logpost. Somehow I think I am missing something here. Or, rather, I am double counting something: I have Z and transformed Z in here, but I think the only thing to add in model{} would be the Jacobian.

_lp functions aren’t used much but this is exactly when they’re useful. See the SUG section 18.8. You can access the log prob accumulator with this function type and return Beta.

I haven’t tested below but hopefully it will work without much tinkering.

/**
   * Truncated Multivariate Normal 
   *
   * @copyright Ben Goodrich, Ethan Alt, Sean Pinkney 2017, 2021, 2021 \n
   * https://groups.google.com/g/stan-users/c/GuWUJogum1o/m/LvxjlUBnBwAJ \n
   * Accessed Apr. 18, 2021
   * Modified June 23, 2021  
   *
   * @param u Vector number on [0,1], not checked but function will return NaN
   * @param mu Vector
   * @param L Cholesky Factor Corr
   * @param lb Vector of lower bounds
   * @param ub Vector of upper bounds
   * @param lb_ind Vector indicator if there is an lower bound
   * @param ub_ind Vector indicator if there is an upper bound
   * @return log density
   */

  vector multi_normal_cholesky_truncated_lp(vector u, 
                                            vector mu, 
                                            matrix L, 
                                            vector lb, 
                                            vector ub, 
                                            vector lb_ind, 
                                            vector ub_ind) {
      int K = rows(u); 
      vector[K] z;
      real lp = 0;
      
      for ( k in 1:K ) {
        // if kth u is unbounded
        // else kth u has at least one bound
        if ( lb_ind[k] == 0 && ub_ind[k] == 0 )  z[k] = inv_Phi(u[k]);
          else { 
            int km1 = k - 1;
            real v; 
            real z_star;
            real logd;
            row_vector [2] log_ustar = [negative_infinity(), 0];           // [-Inf, 0] = log([0,1])
            real constrain = mu[k] + ((k > 1) ? L[k, 1:km1] * head(z, km1) : 0);
            
            // obtain log of upper and lower bound (if applicable)
            if ( lb_ind[k] == 1 ) log_ustar[1] = normal_lcdf( ( lb[k] - constrain ) / L[k, k] | 0.0, 1.0 );
            if ( ub_ind[k] == 1 ) log_ustar[2] = normal_lcdf( ( ub[k] - constrain ) / L[k, k] | 0.0, 1.0 );
          
            // update log gradient and z  
            logd    = log_diff_exp(log_ustar[2], log_ustar[1]);
            v       = exp( log_sum_exp( log_ustar[1], log(u[k]) + logd ) );    // v = ustar[1] + (ustar[2] - ustar[1]) * u[k] ~ U(ustar[1], ustar[2])
            z[k]    = inv_Phi(v);                                              // z ~ TN
            target += logd;                                                    // increment by log gradient
          }
        }
      return mu + L * z;
     }
1 Like

Ah, ok. _lp functions return the draw, update target and can be used in transformed parameters. Nice.

In my mock example, I don’t need _lpdf and _rng, just _lp in the transformed parameters section. This looks much more familiar.

Will test this, then use map_rect to speed things up. multi_normal_cholesky_truncated_lp isn’t super fast, and testing takes a while.

EDIT: Seems to work, more testing necessary, but it’s slow.

Can someone confirm that map_rect works within transformed parameters (it seems like it per Can map_rect return more than log probability?)? And just realizing how much pain map_rect is vs reduce_sum. map_rect is quite inflexible when it comes to passing on data and parameters. If only I could simply add a #pragma omp parallel ahead of the for (r in 1:S) {...}