Infinite k-values

What does it mean if one, few, many, or all k-values from PSIS are infinite?

It might help if I show the model that is producing the problem. I apologize that it is in the rethinking-package format (it's running right now, so I can't extract the Stan code; let me know if you need it, but I think it's fairly understandable).

m <- ulam(  # object name and the ulam() wrapper are assumptions; the opening of the call was omitted from the pasted code
  alist(
    LPS_concentration ~ dgamma2(mu, scale),
    
      mu <- # Basal
            exp(a_Subject[Subject] +
                a_Plate[Plate] +
                a_Sample[Sample]) +
            # Dynamic
            b_Time[LPS]*Time *
            exp(b_Time2[LPS]*Time),
  
        a_Subject[Subject] ~ dnorm(0, sigma_a_Subject),
          sigma_a_Subject ~ dexp(1),
        a_Plate[Plate] ~ dnorm(0, sigma_a_Plate),
          sigma_a_Plate ~ dexp(1),
        a_Sample[Sample] ~ dnorm(0, sigma_a_Sample),
          sigma_a_Sample ~ dexp(1),
    
        b_Time[LPS] ~ dnorm(0, sigma_b_Time),
          sigma_b_Time ~ dexp(1),
        b_Time2[LPS] ~ dnorm(0, sigma_b_Time2),
          sigma_b_Time2 ~ dexp(1),
      
      scale ~ dexp(1)
    
  ), log_lik=TRUE,
     constraints=list(b_Time="lower=0, upper=1",
                      b_Time2="lower =-1, upper=-1e-5"),
     chains=4, 
     cores=4, 
     iter=300000
)

I suspect it has something to do with the nonlinear "# Dynamic" part of the equation for mu, but I don't know.
Also, the problem disappears when I use a normal or Student's t distribution instead of a gamma for the outcome.
Another thought is that it may stem from the way the rethinking package parameterizes the gamma distribution: you'll notice it is parameterized with mean and scale parameters rather than the usual shape and rate (a and b) parameters.
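As far as I can tell, dgamma2(mu, scale) is just the ordinary gamma reparameterized by its mean and scale, something like the following sketch (my own reconstruction, not the package's actual source):

    # Sketch of the mean/scale gamma parameterization as I understand it:
    # shape = mu/scale and rate = 1/scale, so the mean is mu and the variance is mu*scale.
    dgamma2_sketch <- function(x, mu, scale, log = FALSE) {
      dgamma(x, shape = mu / scale, rate = 1 / scale, log = log)
    }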

Either a really big change in the posterior when an observation is removed, or floating-point problems in the log_lik calculations. Is the posterior inference working well for this model? Do all convergence diagnostics and model predictions look good?
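For a rethinking fit, those checks might look something like this (just a sketch; `fit` is a placeholder for the fitted model object):

    # Sketch: standard convergence and predictive checks for a rethinking fit
    # (the object name `fit` is assumed).
    library(rethinking)
    precis(fit, depth = 2)   # n_eff and Rhat for all parameters
    traceplot(fit)           # visual check of chain mixing
    postcheck(fit)           # posterior predictive check against the observed data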

The Stan code would be helpful, as the rethinking code does not show all the details.

Here is the Stan code:

data{
    int Time2[150];
    int well[150];
    vector[150] LPS_concentration;
    int Time[150];
    int LPS[150];
    int Sample[150];
    int Plate[150];
    int Subject[150];
}
parameters{
    vector[19] a_Subject;
    real<lower=0> sigma_a_Subject;
    vector[2] a_Plate;
    real<lower=0> sigma_a_Plate;
    vector[50] a_Sample;
    real<lower=0> sigma_a_Sample;
    vector<lower=0, upper=1>[2] b_Time;
    real<lower=0> sigma_b_Time;
    vector<lower =-1, upper=-1e-5>[2] b_Time2;
    real<lower=0> sigma_b_Time2;
    real<lower=0> scale;
}
model{
    vector[150] mu;
    scale ~ exponential( 1 );
    sigma_b_Time2 ~ exponential( 1 );
    b_Time2 ~ normal( 0 , sigma_b_Time2 );
    sigma_b_Time ~ exponential( 1 );
    b_Time ~ normal( 0 , sigma_b_Time );
    sigma_a_Sample ~ exponential( 1 );
    a_Sample ~ normal( 0 , sigma_a_Sample );
    sigma_a_Plate ~ exponential( 1 );
    a_Plate ~ normal( 0 , sigma_a_Plate );
    sigma_a_Subject ~ exponential( 1 );
    a_Subject ~ normal( 0 , sigma_a_Subject );
    for ( i in 1:150 ) {
        mu[i] = exp(a_Subject[Subject[i]] + a_Plate[Plate[i]] + a_Sample[Sample[i]]) + b_Time[LPS[i]] * Time[i] * exp(b_Time2[LPS[i]] * Time[i]);
    }
    LPS_concentration ~ gamma( mu/scale , 1/scale );
}
generated quantities{
    vector[150] log_lik;
    vector[150] mu;
    for ( i in 1:150 ) {
        mu[i] = exp(a_Subject[Subject[i]] + a_Plate[Plate[i]] + a_Sample[Sample[i]]) + b_Time[LPS[i]] * Time[i] * exp(b_Time2[LPS[i]] * Time[i]);
    }
    for ( i in 1:150 ) log_lik[i] = gamma_lpdf( LPS_concentration[i] | mu/scale , 1/scale );
}

I understand the idea of the k-values, but in this particular model all k-values are Inf…which I don't understand. I don't know what is causing this, but it may be that the mean parameter mu is actually passed to the gamma as part of a ratio (mu/scale), such that changes to the denominator have disproportionate effects. But apart from PSIS, all other diagnostics look good: n_eff is high, Rhat = 1 for all parameters, traceplots look good, and the predictions are reasonable…

The PSIS problem is more troublesome because I was hoping to compare fits between this model and a stripped-down model containing only the a_intercept terms; that model also shows all k = Inf. If the PSIS problem is not actually a problem but rather a glitch in the parameterization, and the glitch is not easy to fix, I was thinking I might instead use a train-set/test-set validation scheme to compare the 2 models…any thoughts?
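Roughly, I was picturing something like the following (just a sketch; `d`, `fit_full`, and `fit_reduced` are placeholder names, not objects defined above):

    # Sketch of a simple hold-out comparison between the two models
    # (`d`, `fit_full`, and `fit_reduced` are placeholders).
    set.seed(1)
    test_idx <- sample(nrow(d), size = 30)   # hold out ~20% of the 150 rows
    d_train  <- d[-test_idx, ]
    d_test   <- d[test_idx, ]
    # ...refit both models on d_train only, then predict the held-out rows...
    pred_full    <- apply(link(fit_full, data = d_test), 2, mean)
    pred_reduced <- apply(link(fit_reduced, data = d_test), 2, mean)
    c(rmse_full    = sqrt(mean((d_test$LPS_concentration - pred_full)^2)),
      rmse_reduced = sqrt(mean((d_test$LPS_concentration - pred_reduced)^2)))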

Many thanks!

Having all k-values infinite is very unlikely. Can you check that all log_lik values are finite? There can be overflows or underflows, as the computation of mu includes exp(). As the gamma distribution has several different parameterizations, check also that your parameterization matches the one used by Stan (20.6 Gamma distribution, Stan Functions Reference). If these are OK, then look at the distribution of log_lik[i] for some value of i.
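For example, something along these lines (a sketch; `log_lik_mat` stands for the draws-by-observation matrix of pointwise log-likelihoods extracted from the fit):

    # Check the pointwise log-likelihoods for non-finite values and look at the
    # distribution for a single observation (log_lik_mat is an assumed name for
    # the draws x 150 log_lik matrix).
    any(!is.finite(log_lik_mat))          # TRUE would indicate over/underflow
    range(log_lik_mat)                    # overall spread of the values
    hist(log_lik_mat[, 1], breaks = 50)   # distribution of log_lik[i] for i = 1

    # Sanity check of the mean/scale parameterization against Stan's gamma(alpha, beta):
    # alpha = mu/scale and beta = 1/scale imply mean = mu and variance = mu*scale.
    mu <- 5; scale <- 2
    x <- rgamma(1e6, shape = mu / scale, rate = 1 / scale)
    c(mean = mean(x), variance = var(x))  # should be approximately c(5, 10)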

All log_lik values are finite, but many have very large magnitudes (min(log_lik) ~ -41,000; max(log_lik) ~ -2,600). I suspect the parameterization is causing problems, but I don't know; the log-likelihood is log_lik[i] = gamma_lpdf( LPS_concentration[i] | mu/scale , 1/scale );. I wonder if the fact that alpha and beta are ratios is part of the problem?

Just my 50 cents: for a gamma regression I usually assume the shape parameter is fixed (i.e. as a measure of overdispersion) and the exponentiated linear predictor (i.e. the mean, mu in your example) feeds into the rate parameter (rate = shape / mu). Maybe there is a special reason why you chose to keep the rate fixed rather than the shape. For sure, the choice has repercussions for the likelihoods. With a fixed rate, for higher values of mu the gamma distribution becomes more and more concentrated around the mean (because the shape parameter becomes bigger and bigger).
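In other words, something along these lines (a sketch of the parameterization only, not of your model):

    # Fixed-shape parameterization: shape acts as a single overdispersion parameter
    # and rate = shape / mu, so the mean is mu and the variance is mu^2 / shape.
    shape <- 10; mu <- 5
    x <- rgamma(1e6, shape = shape, rate = shape / mu)
    c(mean = mean(x), variance = var(x))  # roughly c(5, 2.5) = c(mu, mu^2 / shape)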

Hi LucC. I’m trying to make sure I understand what you’re saying.

Firstly, the reason I chose this parameterization is that the rethinking package I used to code the model uses it; so I don't really have any commitment to it, other than that I'd like to display the coded model in a write-up and the rethinking model is probably easier for people unfamiliar with Stan to understand…so not a critical reason.

Now, if I understand what you are saying: I pass the mean (mu) and scale (scale) to the rethinking package, and the package calculates the shape (alpha = mu/scale) and rate (beta = 1/scale), which are then passed to Stan. Because the shape parameter is defined by a ratio (alpha = mu/scale), there is potential for very variable values of the shape…which may explain the large log-likelihoods and Inf k-values. This parameterization is in contrast to what you tend to prefer, which is a constant shape (which makes sense; something like the number of things that might be affecting my outcome variable?) and an unknown rate parameter (beta). You also say that I am keeping the rate fixed, but I think that scale is unknown and therefore beta = 1/scale is not constant; so I may not understand what you mean and how you're seeing that.

So could I fix this by using Stan directly, say by regressing alpha/beta on the predictor variables? That only seems doable if (as you say you prefer) alpha is set to a constant value (which I suggested/asked might be the number of things I expect to be affecting my outcome measure). As is, by estimating both mu and scale, I think the rethinking package is in effect estimating both alpha and beta (rather than setting alpha to a constant). So if both parameters (whether mu and scale, or alpha and beta) are estimated, perhaps these problems are unavoidable? (Again, I don't know; I'm only trying to figure it out.) And if they are unavoidable, does that render the model useless? But if it is NOT rendered useless, is there an alternative way (that is, an alternative to PSIS and WAIC) to compare the models, say by generating predictions from the models and comparing the error to the actual data?

Many thanks!

Hi Nicklaus, you understand me correctly about the observation that it's uncommon to fix the rate (or scale, for that matter) and let the shape vary as a function of the mean. However, I know too little about PSIS to say anything about how the choice of this parameterisation might affect it. I'm just pointing out that it's a quite unusual parameterisation (to me), and I would investigate whether a model with a fixed shape and a rate (or scale) that varies with the mean behaves better. If you want to fit such a model and you're a relative beginner with Stan, you could consider using brms.
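A skeleton of what I have in mind would be something like this (illustrative only; the formula ignores your nonlinear time term, and the priors and predictors would need thought):

    # Sketch of a brms gamma regression: brms's Gamma family uses a fixed shape
    # with rate = shape / mu, where mu comes from the (log-linked) linear predictor.
    # The formula below is a placeholder, not a drop-in replacement for the original model.
    library(brms)
    fit_brms <- brm(
      LPS_concentration ~ Time * factor(LPS) + (1 | Subject) + (1 | Plate) + (1 | Sample),
      family = Gamma(link = "log"),
      data   = d,          # `d` is a placeholder for the data frame
      chains = 4, cores = 4
    )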

Will do. Looking at it now, I can see some things I can try. Thanks LucC