Rejecting parameter combination

I am wondering how best to implement the rejection of a certain combination of parameters when performing HMC sampling.

For context, I have a combination of 5 material parameters which are then numerically solved to return an expected value of frequency given a certain wavenumber. Certain combinations of these parameters are physically impossible and I have successfully implemented a check for this within the solution. However, I was wondering what is the best practice for rejecting this combination of parameters.

My algorithm uses the reduce_sum to update the log-probability over a set of predicted and measured values, assuming a normal distribution of the predicted values. My ideas for rejection were:

  • Set target += log(0) to force rejection through the acceptance ratio
  • Set target += -1e20 to have a similar effect as above
  • Set the predicted values all equal to zero (the measured values are in the magnitude of ~500-1500) to cause a small likelihood
  • Use the reject statement to force moving onto the next sample iteration

Any advice on best practice would be much appreciated, as I understand each approach might have a different effect on the calculated gradient for the HMC procedure.

This would seem the most straightforward, least easy to go awry and easiest for others reading your code.

You might want to think about whether one of the related/constrained variable types (simplex, etc) might suit your physical constraints on the combinations. Maybe post your code with a description of the combos that you’d like to reject?

It is worth noting the Stan documentation states that “Rejection is not for Constraints”, so it may be worth exploring other options if possible.

Depending on the nature of the physical constraints, probably one of the more desirable options would be to exclude the forbidden regions by setting the probability of that region to 0 in the prior specification. This is likely non-trivial, but may be worth exploring depending on the nature of your problem.

1 Like

If the forbidden combinations don’t occupy a huge part of the otherwise high-density region of parameter space, it would be best to just fit the model without rejection and then reject the offending iterations after the fact (as in, subset the posterior iterations to just those that are allowable according to your constraints).

Your ideas result in discontinuities or near-discontinuities in the likelihood which will cause problems for the symplectic integrator that are likely to manifest as divergences. After the fact, it will be difficult to know whether all of your problems are due to this issue, or whether some of them reflect obstructions to geometric ergodicity within the allowable region of parameter space.

Moreover, it is quite possible that the ability to transit these physically impossible regions of parameter space will better enable the model to transit between, and properly explore, the allowable regions of parameter space.

The main reason not to do this would be if the model, given the opportunity, simply spends most or all of its time in the physically impossible region, forcing you to run HMC for a prohibitive number of iterations in order to get samples from the allowable regions.

2 Likes

Thanks so much for the replies everyone, all your inputs are helpful. For better context here is (most of) my code.

functions {
   real cp_eigs(matrix A) {
       real cp = 0.0;
       int nm = rows(A);
       int nn = cols(A);

       vector[nm] v = rep_vector(1,nm);
       matrix[nm,nm] Ai = inverse(A);
       for (i in 1:10) {
           v = Ai*v;
           v = v./max(v);
       }
       
       real mlest;
       mlest = (v'*Ai*v)/(v'*v);
       if (mlest>0.0) {
           cp=0.0;
       } else if (mlest<0.0) {
           cp = sqrt(-1/mlest);
       }
       return cp;
   }

   real partial_sum(real[,] y_set_slice, int start, int end, real[] k_ast, int[] n_set, real[] C, real rho, real h, real sigma) {

        real t_val = 0.0;

        int M = 4;
        int n_modes = 2*(M+1);
        int n_half = M+1;
        real k, kh;
        matrix[n_modes,n_modes] A;
        real C11 = C[1]*1e9;
        real C13 = C[2]*1e9;
        real C31 = C[2]*1e9;
        real C33 = C[3]*1e9;
        real C55 = C[4]*1e9;

        int nk = size(k_ast[start:end]);
        int ntt = sum(n_set[start:end]);

        real cp_val;
        real w_val;

        int count = 0;

        for (n in start:end) {

            count += 1;

            k = k_ast[n];

            kh = k*h;

            real nt1mj0;
            real nt1mj1;
            real nt1mj2;
            real nt2mj0;
            real nt2mj1;

            for (m in 0:M) {
                for (jv in 0:M) {
                    
                    nt1mj0 = nt1(m, jv, 0, kh);
                    nt1mj1 = nt1(m, jv, 1, kh);
                    nt1mj2 = nt1(m, jv, 2, kh);
                    nt2mj0 = nt2(m, jv, 0, kh);
                    nt2mj1 = nt2(m, jv, 1, kh);

                        A[jv+1,m+1] = -(C11/rho) * nt1mj0 + (C55/rho) * nt1mj2 + (C55/rho) * nt2mj1;
                        A[jv+1,m+1+n_half] = -1 * ((C13+C55)/rho * nt1mj1 + C55/rho * nt2mj0);
                        A[jv+1+n_half,m+1] = (C31+C55)/rho * nt1mj1 + C31/rho * nt2mj0;
                        A[jv+1+n_half,m+1+n_half] = -(C55/rho) * nt1mj0 + (C33/rho) * nt1mj2 + (C33/rho) * nt2mj1;
                }
            }

            vector[n_set[n]] wa0;

            cp_val = cp_eigs(A);
            w_val = cp_val*k;
            wa0 = rep_vector(w_val,n_set[n]);
            t_val += normal_lpdf(y_set_slice[count,1:n_set[n]] | wa0, sigma);

        }
        return t_val;

   }

}
data {
  int<lower=0> N;
  int<lower=0> N_ast;
  real<lower=0> k_ast[N_ast];
  real<lower=0> y[N];
  int<lower=0> n_set[N_ast];
}
transformed data{
    // parallel - split values of y into sets
    real<lower=0> y_set[N_ast,max(n_set)] = rep_array(0,N_ast,max(n_set));
    int aa = 1;
    int bb = 0;
    for (i in 1:N_ast) {
        bb += n_set[i];
        for (j in 1:n_set[i]) {
            y_set[i,j] = y[aa+j-1]*k_ast[i];  // split data of cp into subsets of w
        }
        aa += n_set[i];
    }
}
parameters {
  real log_C[4];
  real<lower=0> log_rho;
  real<lower=0> log_sigma;
}
transformed parameters {
   real<lower=0> C[4];
   real<lower=1> rho;
   real<lower=1> sigma;
   for (i in 1:4) { C[i] = exp(log_C[i]); }  //    C = exp(log_C);
   rho = exp(log_rho);
   sigma = exp(log_sigma);
}
model {

    // Set thickness
    real h = 1e-3;

    // MCMC model

    C[1] ~ gamma(3, 0.03);  // C_11
    C[2] ~ gamma(1.5, 0.025); // C_13
    C[3] ~ gamma(1.5, 0.025); // C_33
    C[4] ~ gamma(1.5, 0.025); // C_55

    rho ~ normal(1600.0, 300.0);
    sigma ~ gamma(2, 2e-5);

    int grainsize = 1;
    target += reduce_sum(partial_sum, y_set, grainsize, k_ast, n_set, C, rho, h, sigma);
}

The part where the impossible parameter combination catch is within the cp_eigs function, where any eigenvalues above 0 are impossible.

In terms of the space of parameters that are impossible, over the space of C{1-4} and rho there will be pockets of impossible combinations but it needs to be able to move over the entire space as the different measured values (different propagation angles) will have different material property spaces.

Yes this is something I explored by scowering the equations to see if I could find these regions but I couldn’t come to a reasonable trivial solution.

I think I will try this method, it’s difficult to know how much of the total space is forbidden combinations, if it were a large portion would this affect the outcome of this procedure?

1 Like

What matters isn’t the proportion of parameter space that violates constraints, but rather the proportion of the posterior probability mass (for the model without constraints) that violates constraints. As this proportion increases, more and more of your iterations will return samples from the forbidden region, and so you’ll need to run your chains for longer and longer to recover sufficient samples from the allowable region. Unless you have some deep intuitions about the model and likelihood, probably the best way to figure this out would be to give it a try.

This procedure (as well as the approaches that you outlined above) corresponds to a prior model that begins with your explicit priors and then just crops away any region that violates constraints. It sounds like this is what you intend, so that’s great. Note, however, that depending on the size and shape of the forbidden region, this might correspond to a prior model with margins that look very different from the margins that you express with your prior statements. I would recommend sampling from your prior model, implementing the post-hoc rejection procedure, and then examining the resulting prior distribution to make sure it hasn’t done anything too surprising (like placing most of the prior mass over an overly narrow range for one or more parameter) before making important inferences from the model posterior.

1 Like