How can I transform a parameter while still being able to do model comparison?

My model works well with simulated data that stays within healthy ranges. However, I obtained some real data, which is action count data: for instance, the number of times someone interacts with a question on the computer. Some of the questions in this data are severely overdispersed, since some people write multiple paragraphs with 500-800 actions while others only take a few actions (1-5), causing my dispersion parameter nu to be extremely low.

I can use a negative binomial model and it works fine, but I am trying to use a Conway-Maxwell-Poisson model that could also handle underdispersion at the same time. However, there is a normalizing constant that struggles when nu is very low (overdispersion). Is there something I can do to transform nu so that the estimation runs smoothly, while still being able to do model comparison against simpler models, like one based on the plain Poisson distribution? For example, if I took exp(nu), that would keep nu in a healthy range (above 0.3). Whenever I apply a transformation the estimation does run smoothly, but log_lik then indicates a problem when I go to do model comparison: LOO heavily prefers the "wrong" model, so I suspect I was doing something wrong.
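For reference (and to make clear what the functions below are computing), the Conway-Maxwell-Poisson pmf is

P(Y = y | lambda, nu) = lambda^y / (y!)^nu * 1 / Z(lambda, nu), with Z(lambda, nu) = sum_{j=0}^inf lambda^j / (j!)^nu.

The summation() function below computes log Z by direct summation, and approximation() uses an asymptotic expansion of log Z that is only accurate when lambda^(1/nu) is reasonably large. When nu is close to zero the series terms decay very slowly (at nu = 0 it is a geometric series that only converges for lambda < 1), which is why the normalizing constant becomes numerically troublesome for the heavily overdispersed items.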

functions{

  real approximation(real log_lambda, real nu){
  
      real log_mu = log_lambda / nu;
      real nu_mu = nu * exp(log_mu);
      real nu2 = nu^2;
      // first 4 terms of the residual series
      real log_sum_resid = log1p(
        nu_mu^(-1) * (nu2 - 1) / 24 +
        nu_mu^(-2) * (nu2 - 1) / 1152 * (nu2 + 23) +
        nu_mu^(-3) * (nu2 - 1) / 414720 * (5 * nu2^2 - 298 * nu2 + 11237)

      );
      return nu_mu + log_sum_resid  -
            ((log(2 * pi()) + log_mu) * (nu - 1) / 2 + log(nu) / 2);  
  
  }
  real summation(real log_lambda, real nu){
  
    real z = negative_infinity();
    real z_last = 0;
    //real b = 0;
    
    for (j in 0:200) {
      //b = b + 1;
      z_last = z;
      z = log_sum_exp(z, j * log_lambda  - nu * lgamma(j+1));

      if ((abs(z - z_last) < .001)) {
        break;
  
      }

    }

    
    return z;
  }
  
  
  real partial_sum(array[] int slice_n, int start, int end,
                   array[] int y_slice,
                   array[] int jj_slice, array[] int ii_slice, vector theta,
                   vector beta, vector nu, vector alpha, array[] int kk_slice, vector gamma) {
    real partial_target = 0.0;
    
    for (n in start : end) {
      
      real log_lambda = nu[ii_slice[n]]*(alpha[ii_slice[n]]*theta[jj_slice[n]]+ beta[ii_slice[n]] + gamma[kk_slice[n]]);
       real log_prob = 0;
      
      
      if (log_lambda / nu[ii_slice[n]] > log(1.5) && log_lambda > log(1.5)) {
        log_prob = y_slice[n] * log_lambda -
                   nu[ii_slice[n]] * lgamma(y_slice[n] + 1) -
                   approximation(log_lambda, nu[ii_slice[n]]);
      } else {
        log_prob = y_slice[n] * log_lambda -
                   nu[ii_slice[n]] * lgamma(y_slice[n] + 1) -
                   summation(log_lambda, nu[ii_slice[n]]);
      }
      
      partial_target += log_prob;
    }
    return partial_target;
  }
}

data {
  int<lower=0> I;
  int<lower=0> J;
  int<lower=1> N;
  int<lower=1> K;
  array[N] int<lower=1, upper=I> ii;
  array[N] int<lower=1, upper=J> jj;
  array[N] int<lower=1, upper=K> kk;
  array[I] int<lower=1,upper=K> item_type_for_beta;
  array[N] int<lower=0> y;
  array[N] int seq_N;
  int<lower=0> grainsize;
}
parameters {
  vector[J] theta;
  vector[I] beta;
  vector[K] gamma;
  vector<lower=0>[I] nu; 
  vector<lower=0>[K] sigma_beta_k;
  
  vector<lower=0>[I] alpha;

  real<lower=0> sigma_alpha;
  real<lower=0> mu_alpha;
  
  real<lower=0> mu_gamma;
  real<lower=0> sigma_gamma;
  
  real<lower=0> mu_nu;
  real<lower=0> sigma_nu;
}
model {
  mu_nu ~ normal(0,10);
  sigma_nu ~ normal(0,5);
  nu ~ normal(mu_nu,sigma_nu);
  
  sigma_beta_k ~ normal(0,2);

  for (i in 1:I) {
    beta[i] ~ normal(gamma[item_type_for_beta[i]], sigma_beta_k[item_type_for_beta[i]]);
  }
  
  theta ~ normal(0, .3); 
  
  mu_alpha ~ normal(0,5);
  sigma_alpha ~ normal(0,2);
  alpha ~ normal(mu_alpha, sigma_alpha);
  
  mu_gamma ~ normal(0,10);
  sigma_gamma ~ normal(0,5);
  gamma ~ normal(mu_gamma, sigma_gamma);

  target += reduce_sum(partial_sum, seq_N, grainsize, y, jj, ii, theta,
                       beta, nu, alpha, kk, gamma);
}
generated quantities {
  array[N] real log_lik;

  for (n in 1:N) {
      real log_lambda = nu[ii[n]] * (alpha[ii[n]] * theta[jj[n]] + beta[ii[n]] + gamma[kk[n]]);
      if (log_lambda / nu[ii[n]] > log(1.5) && log_lambda > log(1.5)) {
          log_lik[n] = y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1) - approximation(log_lambda, nu[ii[n]]);
      } else {
          log_lik[n] = y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1) - summation(log_lambda, nu[ii[n]]);
      }
  }
}

Hi, @halo2masterq: Sorry nobody’s answered your question, but things like this with big and complicated models are hard to diagnose.

Do you actually have underdispersed data? The reason I ask is that more complex models can be harder to fit, so you might only want to bring them out when there’s a definite need.

As to loo preferring the “wrong” model, are you sure you’re calculating the likelihood correctly? It looks like you branch between an asymptotic approximation and a direct summation for the normalizing constant, and in both branches you’re calculating the log likelihood by hand from a formula.

This is the most likely cause of the problem. It may also be that you’re not able to fit the fancier model, so you should be applying diagnostics to make sure you’re fitting OK.
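One quick sanity check (just a sketch, reusing the summation() function from your post with arbitrary test values): at nu = 1 the Conway-Maxwell-Poisson pmf reduces to the Poisson pmf, so your hand-coded log likelihood should agree with Stan’s built-in poisson_log_lpmf up to the tolerance of your series:

generated quantities {
  // arbitrary test point, not part of the model
  real log_lambda_check = 2.0;
  int y_check = 5;
  // hand-coded CMP log-pmf at nu = 1 ...
  real cmp_lp = y_check * log_lambda_check - lgamma(y_check + 1)
                - summation(log_lambda_check, 1.0);
  // ... minus the built-in Poisson log-pmf (parameterized by the log rate);
  // this difference should be close to zero
  real poisson_diff = cmp_lp - poisson_log_lpmf(y_check | log_lambda_check);
}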

You might find it easier to debug if you remove the parallelization and cut down the data size.
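For example, you can drop reduce_sum and write the same likelihood as a plain loop in the model block (it’s the same formula you already have in generated quantities):

model {
  // ... priors as before ...
  for (n in 1:N) {
    real log_lambda = nu[ii[n]] * (alpha[ii[n]] * theta[jj[n]] + beta[ii[n]] + gamma[kk[n]]);
    if (log_lambda / nu[ii[n]] > log(1.5) && log_lambda > log(1.5)) {
      target += y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1)
                - approximation(log_lambda, nu[ii[n]]);
    } else {
      target += y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1)
                - summation(log_lambda, nu[ii[n]]);
    }
  }
}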

I’d also suggest getting the indentation aligned and removing the extra vertical whitespace to make it easier to read the code.

You’re using a centered parameterization for the hierarchical model:

  nu ~ normal(mu_nu,sigma_nu);

This can be really bad geometrically unless there’s a lot of data informing nu. It will probably work better if you declare nu as follows:

vector<offset=mu_nu, multiplier=sigma_nu>[I] nu;

This will convert to a non-centered parameterization of nu. You’ll have to declare it after mu_nu and sigma_nu for this to work. You have the same issue for gamma.
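Roughly like this (a sketch; note that offset/multiplier can’t be combined with a <lower=0> bound, so the positivity constraint on nu would have to be handled some other way, e.g. by parameterizing on the log scale):

parameters {
  // (other declarations omitted)
  // hyperparameters first, so they can be used in the affine transform below
  real<lower=0> mu_nu;
  real<lower=0> sigma_nu;
  real<lower=0> mu_gamma;
  real<lower=0> sigma_gamma;

  // non-centered via offset/multiplier
  vector<offset=mu_nu, multiplier=sigma_nu>[I] nu;
  vector<offset=mu_gamma, multiplier=sigma_gamma>[K] gamma;
}
model {
  nu ~ normal(mu_nu, sigma_nu);          // same sampling statements as before,
  gamma ~ normal(mu_gamma, sigma_gamma); // now effectively non-centered
}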

The loop over beta can be vectorized like this, which I feel makes the code more readable (and maybe a bit more efficient):

beta ~ normal(gamma[item_type_for_beta], sigma_beta_k[item_type_for_beta]);

For clarification, gamma represents an “item type”, i.e. the type of a question, such as multiple choice. What seems to be happening is that the item type that is heavily overdispersed (the open-ended item type) pushes nu to a low number, and in turn gamma for that item type compensates by rising to ~50. The data is definitely overdispersed, as some people perform hundreds or thousands of actions while others perform zero or only a handful.

I’m not sure if gamma is increasing too much, and then nu is compensating, or if nu is low and therefore gamma compensates.

For reference, the gamma values for the item types that are not heavily overdispersed are roughly 2.

I can estimate gamma at 50 no problem, but the functions I am using for the infinite sum in the pmf do not work when nu is extremely small. Ideally, I’d be able to reparameterize nu somehow so that it stays at 0.3 or higher, like taking exp(nu), but that didn’t seem to work well for me. There seems to be some odd interaction between nu and gamma trading off against each other. In other words, gamma for the open-ended item type should be roughly 3-5, not 50, but nu being so low forces it to increase to compensate, I guess.
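Something like this is what I have in mind, just putting a hard floor on nu directly (only a sketch; the 0.3 floor is an arbitrary choice on my part):

parameters {
  // hard floor keeps the normalizing-constant functions in a range where
  // they behave numerically
  vector<lower=0.3>[I] nu;
}
model {
  // with estimated hyperparameters, stating the truncation keeps the
  // prior on mu_nu and sigma_nu coherent
  for (i in 1:I)
    nu[i] ~ normal(mu_nu, sigma_nu) T[0.3, ];
}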

I can try your suggestions here and see how that turns out. Thanks!

edit:

parameters {
  // (other parameter declarations unchanged; new pieces for the log-scale nu)
  vector[I] log_nu;
  real mu_log_nu;
  real<lower=0> sigma_log_nu;
}
transformed parameters {
  vector<lower=0>[I] nu = exp(log_nu); 
}
model {
  mu_log_nu ~ normal(0, 10);
  sigma_log_nu ~ cauchy(0, 5);
  log_nu ~ normal(mu_log_nu, sigma_log_nu); 
  
  sigma_beta_k ~ normal(0, 5);

 beta ~ normal(gamma[item_type_for_beta], sigma_beta_k[item_type_for_beta]);
  
  theta ~ normal(0, 0.3);
  
  alpha ~ normal(0, 5);
  
  mu_gamma ~ normal(0, 10);
  sigma_gamma ~ cauchy(0, 5);
  gamma ~ normal(mu_gamma, sigma_gamma);
  
  target += reduce_sum(partial_sum, seq_N, grainsize, y, jj, ii, theta,
                       beta, nu, alpha, kk, gamma);
}

generated quantities {
  array[N] real log_lik;
  for (n in 1:N) {
    real log_lambda = nu[ii[n]] * (alpha[ii[n]] * theta[jj[n]] + beta[ii[n]] + gamma[kk[n]]);
    if (log_lambda / nu[ii[n]] > log(1.5) && log_lambda > log(1.5)) {
      log_lik[n] = y[n] * log_lambda - exp(nu[ii[n]]) * lgamma(y[n] + 1) - approximation(log_lambda, nu[ii[n]]);
    } else {
      log_lik[n] = y[n] * log_lambda - exp(nu[ii[n]]) * lgamma(y[n] + 1) - summation(log_lambda, nu[ii[n]]);
    }
  }
}


When I transform nu into exp(nu), the estimation runs smoothly, but it seems that the resulting log_lik changes. Is there a way I can transform nu? I assumed the other parameters would adjust to give roughly the same log_lik values.

You have defined log_nu, which you can then transform with exp() to get a lognormal distribution on nu if that’s what you want.

I’d expect the likelihood changes if you use exp(log_nu) in place of log_nu or exp(nu) in place of nu.

You can transform and use variables without a Jacobian adjustment, but if you put a distribution on them, you need to deal with the change of variables to get the right answer.
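Concretely, here’s a sketch of the two options for the log-scale parameterization (the hyperparameter names are just placeholders):

parameters {
  vector[I] log_nu;
  real mu_nu;
  real<lower=0> sigma_nu;
}
transformed parameters {
  vector<lower=0>[I] nu = exp(log_nu);
}
model {
  // Option 1: put the prior on log_nu directly; no Jacobian is needed,
  // and nu then has a lognormal distribution.
  // log_nu ~ normal(mu_nu, sigma_nu);

  // Option 2: put the prior on the transformed variable nu; then you need
  // the log absolute Jacobian determinant of nu = exp(log_nu), which is
  // sum(log_nu).
  nu ~ normal(mu_nu, sigma_nu);
  target += sum(log_nu);
}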