Setting a prior with discontinuity at 0

Hi Stan folks,

This might be less of a Stan question (although maybe there’s a Stan solution) and more of a model-fitting question, but here goes.

In part of my model, I have

\pi_j = 1 - \exp{\left(-\dfrac{r_j^2}{\tau^2}\right)}

Where I want \tau \sim MVN(\mu, \Sigma).

Of course, there is the problem where \tau cannot be 0; in fact it is known that \tau is strictly positive. However, there does not exist a truncated multinormal distribution in Stan. When I go to run the model, Stan gives me the "Initialize between (-2,2) failed after 100 attempts.

I’m wondering how I might go about fixing this. In the univariate version, I set \tau \sim lognormal(1,1), which resolves the issue of \tau being sample as 0. Is there a way to force Stan to try initializing between, say, (1,3)?

The following is my model, for reference:

data {
  int<lower = 1> n_samples;           // total number of sampling events i
  int<lower = 1> n_species;           // total number of specise
  int<lower = 2> max_intervals;       // maximum number of intervals being considered
  int species[n_samples];             // species being considered for each sample
  int abund_per_band[n_samples, max_intervals];// abundance in distance band k for sample i
  vector[n_samples] abund_per_sample; // total abundnace for sample i
  int bands_per_sample[n_samples]; // number of distance bands for sample i
  matrix[n_samples, max_intervals] max_dist; // max distance for distance band k
  corr_matrix[n_species] phylo_corr; // correlation matrix of phylogeny
}

parameters {
  row_vector<lower = 0>[n_species] mu;
  vector<lower = 0>[n_species] sigma;
  vector[n_species] log_tau;
}

model {
  matrix[n_samples, max_intervals] Pi;   // probabilities
  
  sigma ~ cauchy(0, 2.5);
  mu ~ lognormal(1, 1);
  
  log_tau ~ multi_normal(mu, quad_form_diag(phylo_corr, sigma));
  
  Pi = rep_matrix(0, n_samples, max_intervals);
  
  for (i in 1:n_samples)
  {
    for (k in 2:bands_per_sample[i])
    {
      Pi[i,k] = ((1 - exp(-(max_dist[i,k]^2 / exp(log_tau[species[i]])^2))) - 
      (1 - exp(-(max_dist[i,k - 1]^2 / exp(log_tau[species[i]])^2)))) / 
      (1 - exp(-(max_dist[i,bands_per_sample[i]]^2 / exp(log_tau[species[i]])^2)));
    }
    Pi[i,1] = 1 - sum(Pi[i,]);
    
    abund_per_band[i,] ~ multinomial(to_vector(Pi[i,]));
  }
}

Thanks for any assistance!

Hi Brandon,

Your post suggests that you want tau to be distributed as a truncated multivariate normal. But the Stan code in your post has the logarithm of tau distributed as a multivariate normal. I don’t see why a multivariate normal log(tau) should pose an initialization problem, and I wonder if your initialization issues are arising somewhere else?

On the other hand, it’s reasonable to want a truncated multivariate normal on tau rather than a multivariate normal on log(tau) (the former can be chosen to regularize tau towards zero; the latter will always regularize tau towards some positive quantity). To achieve the former without letting tau equal zero, you can just declare tau with a lower bound of zero and then put the multivariate normal prior on tau, yielding a truncated multivariate normal.

Finally, note that you can always change your initialization. The details depend on the interface, but generally you’ll find an argument named init or similar. However, for a strictly positive quantity it probably won’t work to provide positive inits without also declaring the lower bound of zero. Otherwise, the sampler might try to cross zero and then will run into all sorts of trouble.

1 Like

Thank you for your reply. I have now gone ahead and set vector<lower = 0>[n_species] log_tau; in the model, but still get the following error on all chains:

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

Out of curiosity, I then set vector<lower = 3>[n_species] log_tau;, just to see what would happen, and I get a mixture of the above error and this error:

Chain 3: Rejecting initial value:
Chain 3:   Gradient evaluated at the initial value is not finite.
Chain 3:   Stan can't start sampling from this initial value.

Any thoughts?

The below is incorrect, see @jsocolar’s comment below.

Stan’s default initialization attempts to initialize parameters in the interval [-2, 2] – by setting the lower bound on log_tau to be 3 and not changing the initialization points, Stan cannot initialize. It tries multiple values in [-2, 2], getting infinities everywhere because of the lower bound and eventually errors out.

I’d recommend following @jsocolar’s suggestion at trying to change the init values to something more compatible.

Thanks for the reply.

I have tried setting the initial values like this

...
                     init = list(list("log_tau[1]" = 3,
                                      "log_tau[2]" = 3,
                                      "log_tau[3]" = 3,
                                      "log_tau[4]" = 3,
                                      "log_tau[5]" = 3,
                                      "log_tau[6]" = 3,
                                      "log_tau[7]" = 3,
                                      "log_tau[8]" = 3,
                                      "log_tau[9]" = 3,
                                      "log_tau[10]" = 3,
                                      "log_tau[11]" = 3,
                                      "mu[1]" = 3,
                                      "mu[2]" = 3,
                                      "mu[3]" = 3,
                                      "mu[4]" = 3,
                                      "mu[5]" = 3,
                                      "mu[6]" = 3,
                                      "mu[7]" = 3,
                                      "mu[8]" = 3,
                                      "mu[9]" = 3,
                                      "mu[10]" = 3,
                                      "mu[11]" = 3,
                                      "sigma[1]" = 1,
                                      "sigma[2]" = 1,
                                      "sigma[3]" = 1,
                                      "sigma[4]" = 1,
                                      "sigma[5]" = 1,
                                      "sigma[6]" = 1,
                                      "sigma[7]" = 1,
                                      "sigma[8]" = 1,
                                      "sigma[9]" = 1,
                                      "sigma[10]" = 1,
                                      "sigma[11]" = 1),...

So this is the init argument in the sampling() function. I’ve repeated the internal list 4 times with the same inits (because I have 4 chains), but have only put one chain here for brevity. I still get both errors as above.

This isn’t correct. Stan initializes between -2 and 2 on the unconstrained scale, which gets backtransformed to the constrained scale via the inverse unconstraining function.

1 Like

My guess is that there’s a numerical problem somewhere in

Pi[i,k] = ((1 - exp(-(max_dist[i,k]^2 / exp(log_tau[species[i]])^2))) - 
      (1 - exp(-(max_dist[i,k - 1]^2 / exp(log_tau[species[i]])^2)))) / 
      (1 - exp(-(max_dist[i,bands_per_sample[i]]^2 / exp(log_tau[species[i]])^2)));

but I’m not sure. To debug, I would either start simplifying the structure a bit, or I would compute and print intermediate quantities from

Pi[i,k] = ((1 - exp(-(max_dist[i,k]^2 / exp(log_tau[species[i]])^2))) - 
      (1 - exp(-(max_dist[i,k - 1]^2 / exp(log_tau[species[i]])^2)))) / 
      (1 - exp(-(max_dist[i,bands_per_sample[i]]^2 / exp(log_tau[species[i]])^2)));

to see which piece, if any, is blowing up and returning nonsense.

Do you really want to enforce that the logarithm of tau be greater than zero?

Afraid not. You can only get uniform(-a, a) for a > 0. The thing to do is change the declaration to give the positive constrained variable a lower=0 constraint.

I’m a bit confused by the question because your math says \tau \sim \textrm{MVN}(...) but the Stan code says log_tau ~ multi_normal(...). I’m having trouble seeing what can be zero that shouldn’t be.

Also, if \tau is a vector, what is \tau^2 supposed to be? Is there a missing subscript?

+1 to debug by printf.

To be more specific, if you have this:

real<lower=3> a;

then the transform from unconstrained a_unc is a = exp(a_unc) + 3. So if a_unc ~ uniform(-2, 2), then a is initialized in (exp(-2) + 3, exp(2) + 3), which means values will be in (3, 10), but not uniformly distributed after the transform.

1 Like

Not that I’d recommend it as a next step for @BrandonEdwards’ troubleshooting… but one could do this with offset = 2 right?