Write log_lik for hierarchical logit fit

Hello,

I have a hierarchical model that fits an inverse logit function to data (example below), and I would like to add a generated quantities block so I can compare with other models using WAIC and LOO-IC. I have tried this:

generated quantities {
  // For log likelihood calculation
  real log_lik[N];
  
  for (n in 1:J) {
    log_lik[n] += normal_lpdf(y[n] | PSE[n]-inv_logit(steepness[n]*(GS-inflection[n])), sigma);
  }

}

which gives me the following problem:

C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:839: RuntimeWarning: invalid value encountered in greater
  (tailinds,) = np.where(x > xcutoff)  # pylint: disable=unbalanced-tuple-unpacking
C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:683: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

What is the problem?
Also, what is the difference between normal_log and normal_lpdf? (normal_log is used here.)

The entire code goes like this:

data{
  int<lower=0> N;               // number of all datapoints
  int<lower=0> J;               // number of subjects
  vector[N] y;
  vector[N] GS;
  int subject[N]; // labels for subjects
}


parameters{
  // all parameters vary by subject
  vector[J] PSE;
  vector[J] steepness;
  vector[J] inflection;
  real mu_pse;
  real mu_steepness;
  real mu_inflection;
  
  real<lower=0> sigma;          // residual error of the observations, need to be positive
  real<lower=0> sigma_pse;       
  real<lower=0> sigma_steepness; 
  real<lower=0> sigma_inflection; 
}

model{
  mu_pse ~ normal(0.5, 1);
  //mu_steepness ~ exponential(0.2); // gamma(7, 0.3) this is on a much tighter scale than your guess in the first post
  //mu_inflection ~ gamma(7, 0.15);
  mu_steepness ~ normal(2,1);
  mu_inflection ~ normal(0.5,1);
  
  PSE ~ normal(mu_pse, sigma_pse);
  steepness ~ normal(mu_steepness, sigma_steepness);
  inflection ~ normal(mu_inflection, sigma_inflection);
  
  y ~ normal(PSE[subject]-inv_logit(steepness[subject].*(GS-inflection[subject])), sigma);
}

generated quantities {
  // For log likelihood calculation
  real log_lik[N];
  
  for (n in 1:J) {
    log_lik[n] += normal_lpdf(y[n] | PSE[n]-inv_logit(steepness[n]*(GS-inflection[n])), sigma);
  }

}
Data:
GS = np.array([0.  , 0.33, 0.6 , 0.8 , 1.  , 1.33])
y = np.array([[0.4 , 0.  , 0.2 , 0.6 , 0.2 , 0.2 ],
       [0.5 , 0.5 , 0.5 , 0.6 , 0.4 , 0.25],
       [0.6 , 0.  , 0.  , 0.2 , 0.  , 0.  ],
       [0.2 , 0.2 , 1.  , 0.6 , 0.2 , 0.2 ],
       [0.6 , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [1.  , 0.6 , 0.2 , 0.4 , 0.4 , 0.  ],
       [0.2 , 0.4 , 0.  , 0.2 , 0.2 , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.4 ],
       [0.8 , 0.2 , 0.8 , 0.2 , 0.4 , 0.6 ],
       [0.2 , 0.2 , 0.  , 0.2 , 0.  , 0.  ],
       [0.4 , 0.2 , 0.2 , 0.2 , 0.  , 0.  ],
       [0.  , 0.  , 0.4 , 0.4 , 0.2 , 0.  ],
       [1.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.6 , 0.  , 0.4 , 0.  , 0.2 , 0.  ],
       [0.6 , 0.6 , 0.2 , 0.2 , 0.4 , 0.6 ],
       [0.8 , 0.2 , 0.2 , 0.2 , 0.  , 0.4 ],
       [0.8 , 0.4 , 0.2 , 0.  , 0.6 , 0.2 ],
       [0.4 , 0.2 , 0.2 , 0.  , 0.2 , 0.  ],
       [0.5 , 0.2 , 0.  , 0.  , 0.2 , 0.2 ],
       [0.5 , 0.  , 0.4 , 0.  , 0.  , 0.  ],
       [0.8 , 0.4 , 0.  , 0.  , 0.  , 0.  ],
       [0.2 , 0.2 , 0.2 , 0.2 , 0.2 , 0.  ],
       [0.4 , 0.6 , 0.6 , 0.2 , 0.6 , 0.  ],
       [0.2 , 0.6 , 0.4 , 0.4 , 0.  , 0.  ]])

The problem is stated in the warning message: the estimated shape parameter of the Pareto distribution is greater than 0.7 for one or more observations, which means importance sampling is unlikely to work well because the marginal posterior and the LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations. See more in the LOO glossary and CV-FAQ.

normal_log is the deprecated form; you should use normal_lpdf. That link points to a very old version. The problem with these websites collecting PDFs is that they don't update when there is a new version. See instead [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, which has been published as Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC | Statistics and Computing.
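Note also that, as posted, the generated quantities block loops over 1:J while log_lik has N entries, and uses += on uninitialized values, which leaves NaNs in log_lik (the likely source of the "invalid value encountered" RuntimeWarning). A corrected sketch, reusing the parameter names from your model and indexing by subject the same way the model block does:

```stan
generated quantities {
  // pointwise log likelihood, one entry per observation
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | PSE[subject[n]]
                                    - inv_logit(steepness[subject[n]]
                                                * (GS[n] - inflection[subject[n]])),
                             sigma);
  }
}
```

Note the plain assignment with = rather than +=, and that the mean expression matches the likelihood in the model block term by term.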


Thanks for the answer and links. So how do I fix it? :) In other words, what do you mean by a more robust model?


EDIT: added direct links for the relevant case studies.

See the case studies, which have a similar structure to your model. If you are using rstan and the latest loo package, you can also try moment matching LOO.
