How to calculate a discrete likelihood in Stan

I am writing the likelihood following a formula from an article, but it does not work. Could someone tell me where the issue is?

model {
  for (n in 1:N_w) {
    array[N_w, Z_ILI[n]] real l;
    for (z in 1:Z_ILI[n]) {
      l[n][z] = 0;
      real log_cdf_upper = neg_binomial_2_lcdf(Z_ILI[n] | m_ILIw[n] * rho_a+0.00001, phi_ILI);
      real qpossionlikelihood = neg_binomial_2_lpmf(z | m_ILIw[n] * rho_a+0.00001, 
      phi_ILI) - log_cdf_upper;
      real binomiallikelihood = binomial_lpmf(n_ILIp[n] | n_ILI[n], z*0.9999/Z_ILI[n]);
      l[n][z] = exp(qpossionlikelihood) * exp(binomiallikelihood);
    }
    print(l[n]);
    target += log(sum(l[n]));
  }
} 

[Screenshot 2024-07-16 221046: the likelihood formula from the article, written in terms of Bin, nBin, and v_i^{+}]

Any hints as to what’s going wrong?

Maybe you could unpack that notation. What is the three-argument “Bin” function doing and is “nBin” a different function? If it’s negative binomial, which of the bajillion parameterizations is “nBin” assuming? Is v_i^{+} etc. some kind of array or function?

This is going to be a problem. The exp() is likely to underflow. It’s better to do the following, which won’t underflow.

logl[n][z] = qpoisson_likelihood + binomial_likelihood;
...
target += log_sum_exp(logl[n]);

Also, “poisson” is misspelled and I’d highly recommend _ for separating words.

You also don’t need to set l[n][z] = 0 before overwriting it.

Fudging things to positive with +0.00001 is almost never a good idea. If you’re having problems where things underflow, there are other issues with your model that this kind of thing won’t solve.
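
Putting those points together, the loop would look something like this (just a sketch, untested; I’ve kept your data names, dropped the +0.00001, and made logl a local 1-D array since only one row of it is ever used):

model {
  for (n in 1:N_w) {
    array[Z_ILI[n]] real logl;
    for (z in 1:Z_ILI[n]) {
      // log of the upper-truncation normalizer for the negative binomial
      real log_cdf_upper = neg_binomial_2_lcdf(Z_ILI[n] | m_ILIw[n] * rho_a, phi_ILI);
      real q_poisson_likelihood = neg_binomial_2_lpmf(z | m_ILIw[n] * rho_a, phi_ILI)
                                  - log_cdf_upper;
      real binomial_likelihood = binomial_lpmf(n_ILIp[n] | n_ILI[n], z * 0.9999 / Z_ILI[n]);
      logl[z] = q_poisson_likelihood + binomial_likelihood;
    }
    // everything stays on the log scale until log_sum_exp, so nothing underflows
    target += log_sum_exp(logl);
  }
}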

Thank you for your response; it has been quite helpful. However, I’ve noticed that the computation of l[n] is quite time-consuming. Is there an algorithm to accelerate it? In the context of the negative binomial and binomial distributions, z serves as an auxiliary variable used to discretize the likelihood function. The terms v_i^+, v_i, and Z_i represent the data, \zeta_i represents the model output, and \psi denotes the overdispersion parameter.
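
Written out, the discretized likelihood I am trying to compute is roughly (this is my paraphrase of the article’s equation, not a verbatim copy)

p(v_i^{+} \mid v_i, Z_i) = \sum_{z=1}^{Z_i} \frac{\text{nBin}(z \mid \zeta_i \rho_a, \psi)}{\Pr(z' \le Z_i \mid \zeta_i \rho_a, \psi)} \, \text{Bin}(v_i^{+} \mid v_i, z / Z_i),

where nBin uses the mean/overdispersion (neg_binomial_2) parameterization; the sum over the auxiliary variable z is what the inner loop below computes.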

model {
  E_0 ~ uniform(10, Y_C[1] * 10);
  beta_C ~ normal(1, 0.5);
  R0 ~ gamma(2, 1);
  beta_fixed ~ gamma(2, 1);
  alpha_init ~ gamma(2.5, 5); // C95 for rho(t): 0.08312116 1.28325020
  phi_C ~ normal(50, 10);
  phi_ILI ~ uniform(0, 100);
  rho_a ~ beta(1, 5);
  Y_C ~ neg_binomial_2(m_c * rho_C, phi_C);
  for (n in 1:N_w) {
    array[N_w, Z_ILI[n]] real logl;
    for (z in 1:Z_ILI[n]) {
      real log_cdf_upper = normal_lcdf(Z_ILI[n] | m_ILIw[n] * rho_a, 
      sqrt(m_ILIw[n] * rho_a + (m_ILIw[n] * rho_a)^2/phi_ILI));
      real nbinomial_likelihood = normal_lpdf(z | m_ILIw[n] * rho_a, 
      sqrt(m_ILIw[n] * rho_a + (m_ILIw[n] * rho_a)^2/phi_ILI)) - log_cdf_upper;
      real binomial_likelihood = binomial_lpmf(n_ILIp[n] | n_ILI[n], z*0.9999/Z_ILI[n]);
      logl[n][z] = nbinomial_likelihood + binomial_likelihood;
    }
    target += log_sum_exp(logl[n]);
    // print(target());
  }
  Ysero_p ~ binomial(sero_ss, R[sero_d]/N);
}

I don’t see a way to make this a lot faster. The best I can suggest is to not repeat calculations like sqrt(m_ILIw[n] * rho_a + (m_ILIw[n] * rho_a)^2 / phi_ILI), because they’re relatively expensive. You can also vectorize some of these operations,

vector[N_w] foo = sqrt(m_ILIw * rho_a + square(m_ILIw * rho_a) / phi_ILI);

and then access foo[n]. It should be a bit faster in vectorized form.

Then you have terms like log_cdf_upper that can be pulled up out of the z loop because they don’t depend on z.
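
For example, something along these lines (an untested sketch, assuming m_ILIw is declared as a vector; I’ve renamed foo to sigma and also pulled out mu = m_ILIw * rho_a):

model {
  // ... priors and the Y_C and Ysero_p statements as before ...
  vector[N_w] mu = m_ILIw * rho_a;                      // normal-approximation means
  vector[N_w] sigma = sqrt(mu + square(mu) / phi_ILI);  // sds, computed once per leapfrog step
  for (n in 1:N_w) {
    array[Z_ILI[n]] real logl;
    // depends only on n, so compute it once per n rather than once per z
    real log_cdf_upper = normal_lcdf(Z_ILI[n] | mu[n], sigma[n]);
    for (z in 1:Z_ILI[n]) {
      logl[z] = normal_lpdf(z | mu[n], sigma[n]) - log_cdf_upper
                + binomial_lpmf(n_ILIp[n] | n_ILI[n], z * 0.9999 / Z_ILI[n]);
    }
    target += log_sum_exp(logl);
  }
}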

Hacks like z * 0.9999 are almost always a bad idea if they were introduced to patch a problem with a model hitting a boundary. You want to use a prior or a better model and avoid boundaries of parameter space in estimation.

Thank you for your patience! I have rescaled the size of Z_ILI[n], and now the code runs much faster, just as I had hoped. Additionally, your suggestion on vectorization was incredibly insightful and helpful—many thanks!