Getting the normalizing constant by integration: positive infinity doesn't work

bnicenboim · February 21, 2021, 9:30am

Hi,
I have here a model to fit the approximated sum of dependent variables with a gamma distribution with the same coefficient of variation (cv or tau) and thus a different rate. (I got to this monstrosity with the saddle point approximation and using sympy.)

The lpdf works incredibly well! (it could be faster though). The issue is that I know that it’s not a real pdf which integrates to one, and I won’t be able to compare the fit of the sum of gamma to something else (using loo for example). So I 'm trying to get the normalizing constant through integration.

The issue is that it fails when I set the right limit of integration to positive_infinity(). I get the error:
Chain 1 Exception: Error in function tanh_sinh<double>::integrate: The tanh_sinh quadrature evaluated your function at a singular point and got -nan. Please narrow the bounds of integration or check your function for singularities.

My question: Does it make sense then to set the right limit of the integration to a very large number instead of positive_infinity()? E.g., 2999.9,, I’m expecting the values of the actual x to be not larger than 10. (Larger right limits of integration give me the same error as pos infinity, but sporadically not all the time, which annoyingly enough don’t let me get the fit of the model, even if the integration happens in the generated quantities. Slightly smaller right limits give me the same normalizing constant than 2999.9.)

(I’ve played with the density function in R, and while I can set to x to very large and I end up with a lpdf of 0 (due to underflow) with x >10000, I also cannot set x to infinity (I get a nan). )

functions {
  real gammasum_lpdf(real x, real mu_1, real mu_2, real tau) {
    //approximated pdf
    real x0 = log(2);
    real x1 = tau^(-2);
    real x2 = 2*x1;
    real x3 = mu_2*x;
    real x4 = mu_1*x;
    real x5 = mu_1^2;
    real x6 = mu_2^2;
    real x7 = x5*x6;
    real x8 = x^2;
    real x9 = mu_1*mu_2;
    real x10 = 2*x9;
    real x11 = -x10*x8 + x5*x8 + x6*x8;
    real x12 = x11 + 4*x7;
    real x13 = sqrt(x12);
    real x14 = x10 + x13;
    real x15 = 1/mu_1;
    real x16 = x/2;
    real x17 = 1/mu_2;
    return x0*(x2 + 1/2) - x0 + x1*(log(mu_1) + log(mu_2)) + (x2 - 1)*log(x) - log(tau) +
      //after sympy.simplify
      //log((x14 - x3 + x4)^(-x1)*(x14 + x3 - x4)^(-x1)*exp(x1*(x13*x15*x17/2 - x15*x16 - x16*x17 + 1))/sqrt((x10*x13 + x12)/(x11 + 4*x13*x9 + 8*x7)))
      // my own simplification
      + x1*(x13*x15*x17/2 - x15*x16 - x16*x17 + 1)
      + log((x14 - x3 + x4)^(-x1)*(x14 + x3 - x4)^(-x1))
      - log(x10*x13 + x12)/2
      + log_sum_exp([log(x11), log(4) + log(x13) + log(x9), log(8) + log(x7)])/2
      - log(pi())/2;
  }
  real gammasum_pdf(real x,          // Function argument
                    real xc,         // Complement of function argument
                                     //  on the domain (defined later)
                    real[] theta,    // parameters
                    real[] x_r,      // data (real)
                    int[] x_i) {     // data (integer)
    real mu_1 = theta[1];
    real mu_2 = theta[2];
    real tau = theta[3];
    return exp(gammasum_lpdf(x | mu_1, mu_2, tau)) ;
  }
}
data {
  int<lower = 0> N;
  real<lower = 0> y[N];
}
transformed data {
  int x_i[0];
  real x_r[0];
}
parameters {
  positive_ordered[2] mu;
  real<lower=0> cv;
}
model {
  target += normal_lpdf(mu | 0, 1);
  target += normal_lpdf(cv | 0, 1);
  for(j in 1:N){
    target += gammasum_lpdf(y[j]| mu[1], mu[2], cv);
  }
}
generated quantities{
  vector[N] approx_log_lik;
  real K;
  vector[N] log_lik;
  for(j in 1:N){
    approx_log_lik[j] = gammasum_lpdf(y[j]| mu[1], mu[2], cv);

  }
  // print("mu_1 ", mu[1],
  //    ";mu_2 ", mu[2],
  //    ";tau ", cv);
  K = log(integrate_1d(gammasum_pdf,
                       0,
//                       10000.0,
                       2999.9,
//                       100000000.0,
                       //positive_infinity(),
                       //positive_infinity(),
                       { mu[1], mu[2], cv }, x_r, x_i));
  log_lik  = approx_log_lik + K;
}

Code to try it in R

rgammasum <- function(N, mean1=1, mean2=1, cv=1){
 alpha = 1/cv^2 
 beta1 = 1/(mean1 * cv^2) 
 beta2 = 1/(mean2 * cv^2) 
 rgamma(N, alpha, rate= beta1) + rgamma(N, alpha, rate = beta2)
}

N <- 1000
y <- rgammasum(N, .5, .15, .22)

m_ss <- cmdstan_model("gamma_sum_saddle.stan")
fit_ss <- m_ss$sample(
                 data = list(N=length(y),y=y),
                 seed = 123,
                 parallel_chains = 4,
max_treedepth = 15,
adapt_delta = .9)

Output:

> fit_ss$summary(c("mu","cv","K", "approx_log_lik"))
# A tibble: 1,004 x 10
   variable   mean median      sd     mad      q5    q95  rhat
   <chr>     <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
 1 mu[1]     0.136  0.110 8.75e-2 7.95e-2  0.0240  0.306  1.01
 2 mu[2]     0.511  0.536 8.74e-2 7.95e-2  0.340   0.622  1.01
 3 cv        0.210  0.205 2.12e-2 2.35e-2  0.180   0.245  1.02
 4 K        -0.344 -0.344 3.67e-4 3.59e-4 -0.345  -0.344  1.00
 5 approx_…  0.500  0.501 3.44e-2 3.35e-2  0.441   0.557  1.01
 6 approx_…  0.913  0.912 2.22e-2 2.21e-2  0.877   0.949  1.01
 7 approx_…  0.545  0.545 3.28e-2 3.16e-2  0.490   0.599  1.01
 8 approx_…  0.909  0.908 2.22e-2 2.19e-2  0.873   0.945  1.01
 9 approx_…  0.868  0.868 2.31e-2 2.31e-2  0.832   0.906  1.01
10 approx_…  0.462  0.462 2.80e-2 2.70e-2  0.414   0.508  1.01
# … with 994 more rows, and 2 more variables: ess_bulk <dbl>,
#   ess_tail <dbl>

Unrelated to the main issue, maybe @martinmodrak has some idea about this. Isn’t it strange the normalizing constant is so big in comparison with the approximated likelihood? I understood that the saddle point approximation should be quite close to the real pdf including the normalizing constant.

Thanks!

maxbiostat · February 21, 2021, 10:55am

Have you seen this recent thread? You might need to update your set up to get round a bug or two in the quadrature routines. Failing that you can try to use the integration-by-parts trick to stabilise the integrand. I can help with that, but for me it’s hard to translate Stan code into maths, so if you can write down your model in equations, it’d be helpful.

bnicenboim · February 21, 2021, 11:55am

I am using cmdstan 2.26 though. Or is there something else I should update?

I’ll post the equation in math in while then. I do warn you that is huge!

bnicenboim · February 21, 2021, 5:26pm

Hi,
This is density (after exponentiation).

\frac{\left(2 \mu_{1} \mu_{2} - \mu_{1} x + \mu_{2} x + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}\right)^{- \frac{1}{\tau^{2}}} \left(2 \mu_{1} \mu_{2} + \mu_{1} x - \mu_{2} x + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}\right)^{- \frac{1}{\tau^{2}}} e^{\frac{1 - \frac{x}{2 \mu_{2}} - \frac{x}{2 \mu_{1}} + \frac{\sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}}{2 \mu_{1} \mu_{2}}}{\tau^{2}}} e^{\left(-1 + \frac{2}{\tau^{2}}\right) \log{\left(x \right)} + \left(\frac{1}{2} + \frac{2}{\tau^{2}}\right) \log{\left(2 \right)} + \frac{\log{\left(\mu_{1} \right)} + \log{\left(\mu_{2} \right)}}{\tau^{2}}}}{2 \sqrt{\pi} \tau \sqrt{\frac{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + 2 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}} + \mu_{2}^{2} x^{2}}{8 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + 4 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}} + \mu_{2}^{2} x^{2}}}}

I’m working with sympy, so I can collect terms, or expand or whatever. Do you have some link the integration-by-parts trick? Besides xkcd: Integration by Parts ? This is how I feel about integration:
xkcd: Differentiation and Integration :)

Thanks!

hhau · February 22, 2021, 9:29am

This is probably fine. You could look at the distribution of the normalising constant for increasing values of the upper bound to check.

bnicenboim · February 22, 2021, 2:03pm

Ok, I think this is solved. I had an integer division, when I replaced the 2 by 2.0, the K is 0.00228. Which maybe shows that it doesn’t really matter.

But the integration doesn’t work with any limit besides 2999.9! I just was lucky!
With lower right limits, I get sporadic errors (rather than all the time as with > 29999), but I can’t manage to find a single limit where the integration works, even with a limit as low as 100. So there is something wrong in the integration.

Also I exposed the gamma sum with rstan and I tried to do the integration in R. I have weird results:
With a right limit of 2999.9, I get -72! But it actually works with Inf, and I get a result close to the one I get in stan with 2999.9

dens <- function(x)(unlist(lapply(x, function(x) exp(gammasum_lpdf(x, .14,.51,.21)))))
log(integrate(dens, lower =0, upper = 2999.9)[[1]])
# [1] -72.4714
log(integrate(dens, lower =0, upper = Inf)[[1]])
# [1] 0.001990042

maxbiostat · February 22, 2021, 2:20pm

Thanks. We might have to take a step back and look at the original model to see if some clever parametrisation trick is available. The function is quite unwieldy as it is, although you could try telling sympy to just collect the terms that depend on x and then maybe doing a .full_simplify().

I did some of that in the thread I linked to try and help @hhau get his integration to work, but it turned out what was really needed was updating the setup. Your situation seems different as you already have the latest setup by the looks of it.

I have two suggestions: (i) post your full model (maths) and what you’re trying to accomplish and (ii) try plotting your integrand to get a feeling of what might be going wrong.

bnicenboim · February 22, 2021, 8:04pm

I did expand collect and then simplify. (There is no full_simplify in the latest version at least).

This is the log density. In the integration, I first exponentiate this

\frac{\mu_{1} \mu_{2} \tau^{2} \left(- \log{\left(2 \pi \tau^{2} x^{2} \right)} + 2 \log{\left(\frac{\left(2 \mu_{1} \mu_{2} - \mu_{1} x + \mu_{2} x + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}\right)^{- \frac{1}{\tau^{2}}} \left(2 \mu_{1} \mu_{2} + \mu_{1} x - \mu_{2} x + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}\right)^{- \frac{1}{\tau^{2}}} e^{\frac{\sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}}}{2 \mu_{1} \mu_{2} \tau^{2}}}}{\sqrt{\frac{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + 2 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}} + \mu_{2}^{2} x^{2}}{8 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + 4 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}} + \mu_{2}^{2} x^{2}}}} \right)}\right) + 2 \mu_{1} \mu_{2} \left(\log{\left(4 \mu_{1} \mu_{2} x^{2} \right)} + 1\right) - x \left(\mu_{1} + \mu_{2}\right)}{2 \mu_{1} \mu_{2} \tau^{2}}

This is the density (in the raw scale), that is, what needs to be integrated:

\frac{\sqrt{2} \left(2 \mu_{1} \mu_{2} - x \left(\mu_{1} - \mu_{2}\right) + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)}\right)^{- \frac{1}{\tau^{2}}} \left(2 \mu_{1} \mu_{2} + x \left(\mu_{1} - \mu_{2}\right) + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)}\right)^{- \frac{1}{\tau^{2}}} e^{\frac{2 \mu_{1} \mu_{2} - \mu_{1} x - \mu_{2} x + \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + \mu_{1}^{2} x^{2} - 2 \mu_{1} \mu_{2} x^{2} + \mu_{2}^{2} x^{2}} + \log{\left(2^{4 \mu_{1} \mu_{2}} \mu_{1}^{2 \mu_{1} \mu_{2}} \mu_{2}^{2 \mu_{1} \mu_{2}} x^{4 \mu_{1} \mu_{2}} \right)}}{2 \mu_{1} \mu_{2} \tau^{2}}}}{2 \sqrt{\pi} \tau x \sqrt{\frac{4 \mu_{1}^{2} \mu_{2}^{2} + 2 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)}{8 \mu_{1}^{2} \mu_{2}^{2} + 4 \mu_{1} \mu_{2} \sqrt{4 \mu_{1}^{2} \mu_{2}^{2} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)} + x^{2} \left(\mu_{1}^{2} - 2 \mu_{1} \mu_{2} + \mu_{2}^{2}\right)}}}

Unfortunately, they are totally unwieldy.

That’s the model, I’m just trying to model a process that is assumed to be a sum of random variables with gamma distribution and same coefficient of variation. (And shifted, but that’s for later).

The plot of the density is quite boring, just a right skewed distribution.

What I don’t get is why stats::integrate and pracma::integral in R just work fine with Inf? (At intermediate values stats::integrate goes a bit crazy but then it just converges at 0.001990042, pracma::integral converges to that value at x =2, and then it doesn’t change if it’s (0-2] or (0-10000] or (0,Inf) ).

martinmodrak · February 23, 2021, 8:16am

Hi,
just to have the math in a clearer view: so I assume you are following the math outlined at probability - Generic sum of Gamma random variables - Cross Validated

Note the link has different parametrization than Stan: so their k is Stan’s \alpha and theirs \theta is Stan’s 1/\beta.

First we’ll work with just \alpha and \beta and only later we’ll put in \mu and \tau - If I read the code correctly, you have:

\alpha = \frac{1}{\tau^2} \\ \beta_{\{1,2\}} = \frac{1}{\mu_{\{1,2\}} \tau ^ 2}

Note: I am lazy (and want to avoid mistakes) so I’ll use Wolfram for most of the math (code) - I also have no experience with sympy

The cumulant of a single Gamma distribution is K_{\{1,2\}}(s) = - \alpha \log(1 - \frac{s}{\beta_{\{1,2\}}}). For a sum we just add the cumulants, so we are trying to solve the saddlepoint equation for the first derivative of K(s) = K_1(s) + K_2(s), namely K^\prime(\hat{s}) = x which simplifies to:

\frac{\alpha}{\beta_1 - \hat{s} } + \frac{\alpha}{\beta_2 - \hat{s} } = x \\

which gives:

\hat{s} = \frac{1}{2}\left( (\beta_1 + \beta_2) -2 \frac{\alpha}{x} \pm \sqrt{4 \frac{\alpha^2}{x^2} + (\beta_1 - \beta_2)^2} \right)

Note that only the smaller root is actually valid, because the cumulant is defined only for s < \min \{ \beta_1,\beta_2\}.

Now the full approximation is then:

\hat{f}(x) = \frac1{\sqrt{2\pi K''(\hat{s})}} \exp(K(\hat{s}) - \hat{s} x)

with

K^{\prime\prime}(s) = \frac{\alpha}{(\beta_1 - \hat{s})^2 } + \frac{\alpha}{(\beta_2 - \hat{s})^2 }

And this provides a hint why the integration to infinity might be numerically problematic, because we have

\lim_{x \rightarrow \infty} \hat{s} = \frac{1}{2}\left( (\beta_1 + \beta_2) - |\beta_1 - \beta_2| \right) = \min \{\beta_1, \beta_2\} \\

So we are getting near the boundary where the whole thing is defined and minor inaccuracies could thus plausibly result in some over/underflow that pushes the value to the boundary and we end up taking \log(0) when evaluating K or divide by zero for the derivatives.

To be practical, I don’t think simplify is doing you a good service here. For both speed and numerical accuracy, you are likely to be better served by “condensing” (not sure what is the best word) the simplified expressions. I would actually guess that implementing the expressions mostly as I’ve written them above (after taking log and simplifying \hat{s}x to avoid 0 \times \infty as x gets large), would give you better performance. It is possible sympy has some feature to do something like this for you.

A few segments of the code seem to plausibly do weird stuff as x gets large, e.g.:

real x3 = mu_2*x;
real x4 = mu_1*x;
...
+ log(x14 - x3 + x4)^(-x1)*(x14 + x3 - x4)^(-x1)

so we are subtracting two possibly large numbers, loosing some precision where x * (mu_1 - mu_2) could work a bit better… Also it looks weird that this didn’t get translated to something like

-x1* ( log(x14 - x3 + x4) + log(x14 + x3 - x4))

Once again this is probably simplify simplifying too much… But those are just my hunches, I haven’t tried this myself.

The elephant

However, I think the elephant in the room to address first is whether this approximation actually achieves anything more than a simpler approximation (e.g. a gamma distribution with mean \mu_1 + \mu_2 and the other parameter chosen to make the variance match the variance of the sum, i.e. if I get it right to be: \frac{\tau^2}{\mu_1^2} + \frac{\tau^2}{\mu_2^2}). In my experiments with the neg. binomial, I really couldn’t find a situation where this is the case. Since gamma and neg. binomial are quite closely related, I would be surprised if the saddlepoint approximation did much useful for this case. Although the best I saw was for sum two variables.

Note that the information about the individual components is almost always almost completely erased from the sum. Which could explain why you need large treedepth and adapt_delta, as most likely only the sum mu[1] + mu[2] is identified. I would definitely try reparametrizing to have something like:

real<lower=0> mu_sum;
real<lower=0, upper=0.5> mu1_frac;

mu[1] = mu_sum * mu_1_frac;
mu[2] = mu_sum * (1 - mu_1_frac);

Best of luck with the model!

bnicenboim · February 23, 2021, 9:54am

Hi, thanks for taking a look!

Not really, I did it from scratch (with sympy), and I first converted \alpha and \beta into \mu and \tau, and then I did the saddle point approximation.

Sorry, what do you mean by “condensing”? I guess this is the key, what is the best way to express this given that x is everywhere and there is no way to avoid that? Sometimes inside square roots or logs or whatever. (BTW, I’m stubborn with sympy (or anything like it), because then I could extend this method to other operations between random variables).

But one idea that this gives me is to try every part of the integral and see what is the offending part and how to change it. I think I can export Stan’s integral function to R, right?

What I don’t understand is why R is having no trouble with the integration here, even when I use the same gammasum_lpdf from Stan? (And the output is reasonable) What is different between stats::integrate / pracma::integral and stan’s integral? And R’s output for Infinite is actually the same as Stan’s output when the right limit is a large number…

I checked that, and the saddle approximation is incredibly accurate!! Even in this case with poor priors and problems to identify \mu.

The approximation with a gamma distribution was kind of crappy, but maybe I did some mistake here:

 real gammasum_lpdf(real x, real mu_1, real mu_2, real tau) {

 real alpha = 1/tau^2;
 real beta1 = alpha/mu_1;
 real beta2 = alpha/mu_2;
 real new_mu =  alpha*(beta1+beta2);
 real new_var = alpha*(beta1^2 + beta2^2);
 return gamma_lpdf(x|new_mu^2/new_var, new_var/new_mu);
 }

yeah, that’s right, but it’s only a problem in this example, in the real application, the different mus are regressed against different predictors.

But in any case, why is it better to do it like you suggested instead of just ordering the mus?

martinmodrak · February 23, 2021, 1:04pm

Cool, but do your formula match with what I have written? It is possible I have an error somewhere, so would be great to make sure we are on the same page here…

So say you have an expression like a(b - c)^2 - let’s call this the “condensed” form (there might be a proper english term, I just don’t know it, maybe “factored”, but that feels less general than what I have in mind?), if you simplify the expression, you get: ab^2 -2abc + ac^2. So to compute the “condensed” expression we need 1 subtract, 1 square and 1 multiplication. To compute the simplified expression we need 1 plus, 1 subtract, 2 sqaures and 5 multiplications. So definitely less efficient - especially in Stan where each of those operations also adds something to the autodiff stack - the autodiff operations might be substantially more expensive than the original operations (e.g. for multiplication of two autodiff vars, the reverse pass involves two multiplications and two additions, but the main bottleneck is probably the memory access overhead involved in fetching the operands).

Additionally, if b and c are kind of large, but close to equal, then the “condensed” expression will likely do just fine numerically, while the simplified expression will involve addition and subtraction on ther order of ab^2, i.e. involving much larger number and thus much more likely to loose digits and get some catastrophic cancellation going…

I do think so, please check me, but since mean is additive always and variance of independent variables is also additive, you IMHO should have:

real new_mu = mu_1 + mu_2;
real new_var = alpha / beta1^2 + alpha / beta2^2;
//Alternative formulation, might be more efficient/stable
real new_var =  tau^2 (mu_1 ^ 2 + mu_2 ^ 2);

 // Note the flip in the second paramater
return gamma_lpdf(x|new_mu^2/new_var, new_mu/new_var);

From the doc it seems that R uses different algorithm (“Gauss–Kronrod quadrature” with some other tweaks) than Stan (the exph_sinh and tanh_sinh from Boost) - I don’t know how either of those work, but they are definitely not the same :-)

The issue does not disappear when mu s are regressed - coefficients of any shared predictors (notably the intercept) will not be well identified. So at least for the intercept, one would need special handling. What I proposed is how I would first try to implemet an intercept-only model. If there are some other shared or highly correlated predictors between the two components, I think additional care would need to be taken.

If only the sum is well identified and you order the mu you get a strong negative correlation between mu[1] and mu[2] (do you get this in your model?). In my parametrization, mu_sum will be uncorrelated with mu1_frac, which would be - in the extreme case of no information about the individual components - uniformly distributed between 0 and 0.5. So we separate the thing that’s easy to learn (the total mean) from what is hard to learn (how is it distributed into the components).

Does that make sense?

maxbiostat · February 23, 2021, 1:07pm

maybe @bbbales2 can weigh in here?

bnicenboim · February 23, 2021, 1:09pm

Ok! I’ll try the suggestions from @martinmodrak!

But in any case, it’ll be good to know if another integration function (exported from somewhere) can be used. Specially because all this is happening in the generated quantities, so I don’t need autodiff (no idea if it’s used in the integral, sorry).

martinmodrak · February 23, 2021, 1:52pm

Note that it is possible that the primary problem is not the integrator, but that your implementation of the integrand is unstable and the only correct behaviour an integrator can make is to throw an error…

To put the numeric aspects of simplified expressions into direct light (because it is fun):

a <- 1e9; b <- 1e9; c <- 1e9 + 1
a*(b - c)^2
#[1] 1e+09 - this is correct
a*b^2 - 2*a*b*c + a * c^2
#[1] -137438953472

But one does not need to use such extreme numbers to see small discrepancies

options(digits = 22)
a <- 8; b <- 5; c <- 5 + 1e-5
# The exact result is 8e-10
a*(b - c)^2  
# 7.9999999993942766e-10
a*b^2 - 2*a*b*c + a * c^2
# 7.999858553375816e-10

Anyway, numerical math is fun :-/

Good luck with further attempts!

bbbales2 · February 23, 2021, 2:47pm

The Boost integrators we used are doc’d here: Chapter 13. Quadrature and Differentiation - 1.75.0

There’s interesting stuff in there for sure. For instance there’s this on endpoints: Handling functions with large features near an endpoint with tanh-sinh quadrature - 1.75.0, but it sounds like ideally your quadrature goes to infinity anyway.

I wonder if there’s a way to use arbitrary precision math with a quadrature package in R: https://cran.r-project.org/web/packages/Rmpfr/vignettes/Rmpfr-pkg.pdf ?

Interesting example!

You should do another install_cmdstan() and upgrade to 2.26.1 but that won’t address this (Release of CmdStan 2.26.1).

bnicenboim · February 25, 2021, 9:51am

But the xc trick is not relevant in this specific case, right? The Stan manual says that xc is NaN when one of the limits is infinite.

Ok, many of the parts that look weird are in fact fine. It’s that you can do all those operations on log if you’re sure about the sign of what you are applying the log to. And the thing is that the terms that dominate change completely when x is close to 0 than when x is very large, and in many places the sign flips (and probably what variable you should collect changes) !
The thing is that x, mu, and tau are actually of the same order of magnitude for fitting the likelihood.

The silly thing is that this integral doesn’t really matter, I just tried this Rmpfr package as @bbbales2 recommended, and the integral in logscale is indeed around 0.001998003. So it’s completely unnecessary to add that to the approx. likelihood which is around .5-1…

It was a good exercise in using integration in Stan though. (And I speed up the rest of the model considerably following @martinmodrak advice, so it wasn’t for nothing).

bbbales2 · February 25, 2021, 11:52am

Glad it worked out!

bnicenboim · March 27, 2021, 8:29pm

Hi, by the way, this:

martinmodrak:

Note that the information about the individual components is almost always almost completely erased from the sum. Which could explain why you need large treedepth and adapt_delta, as most likely only the sum mu[1] + mu[2] is identified. I would definitely try reparametrizing to have something like:
real<lower=0> mu_sum;
real<lower=0, upper=0.5> mu1_frac;

mu[1] = mu_sum * mu_1_frac;
mu[2] = mu_sum * (1 - mu_1_frac);
Best of luck with the model!

speed up the model considerably! How would you extend this to a case when there is a sum of 3 gammas?

martinmodrak · March 27, 2021, 10:08pm

Quick guess:

real<lower=0> mu_sum;
real<lower=1.0 / 3.0, upper=1> mu1_frac;
real<lower=0.5, upper=1> mu2_frac;

mu[1] = mu_sum * mu1_frac;
mu[2] = mu_sum * (1 - mu1_frac) * mu2_frac;
mu[3] = mu_sum * (1 - mu1_frac) * (1 - mu2_frac);

Should span the whole space while ensuring ordering (mu[1] > mu[2] > mu[3]). But please double check if it works out…

There’s also this: Order the simplex

Topic		Replies	Views
Waldo: normal_lpdf: Location parameter is nan, but must be finite!` General	4	469	September 8, 2023
Difference of gamma distributions failing to start (many times) Modeling fitting-issues , simulation-based-calibration	7	516	October 28, 2020
Error in gamma_lpdf: shape parameter is nan Modeling	2	633	July 5, 2022
Half-normal priors for sigma (sd) in hierarquical analysis Modeling rstan , specification	2	85	February 20, 2025
Using gamma Modeling fitting-issues	11	1759	August 28, 2017

Getting the normalizing constant by integration: positive infinity doesn't work

The elephant

Related topics