Write log_lik for hierarchical logit fit

Hello,

I have a hierarchical model that fits an inverse logit function to data (example below), and I would like to add a generated quantities block so I can compare with other models using WAIC and LOO-IC. I have tried this:

generated quantities {
  // For log likelihood calculation
  real log_lik[N];
  
  for (n in 1:J) {
    log_lik[n] += normal_lpdf(y[n] | PSE[n]-inv_logit(steepness[n]*(GS-inflection[n])), sigma);
  }

}

which gives me the following problem:

C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:839: RuntimeWarning: invalid value encountered in greater
  (tailinds,) = np.where(x > xcutoff)  # pylint: disable=unbalanced-tuple-unpacking
C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:683: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

What is the problem?
Also, what is the difference between normal_log and normal_lpdf? (normal_log is used here.)

The entire code goes like this:

data{
  int<lower=0> N;               // number of all datapoints
  int<lower=0> J;               // number of subjects
  vector[N] y;
  vector[N] GS;
  int subject[N]; // labels for subjects
}


parameters{
  // all parameters vary by subject
  vector[J] PSE;
  vector[J] steepness;
  vector[J] inflection;
  real mu_pse;
  real mu_steepness;
  real mu_inflection;
  
  real<lower=0> sigma;          // residual error of the observations, need to be positive
  real<lower=0> sigma_pse;       
  real<lower=0> sigma_steepness; 
  real<lower=0> sigma_inflection; 
}

model{
  mu_pse ~ normal(0.5, 1);
  //mu_steepness ~ exponential(0.2); // gamma(7, 0.3) this is on a much tighter scale than your guess in the first post
  //mu_inflection ~ gamma(7, 0.15);
  mu_steepness ~ normal(2,1);
  mu_inflection ~ normal(0.5,1);
  
  PSE ~ normal(mu_pse, sigma_pse);
  steepness ~ normal(mu_steepness, sigma_steepness);
  inflection ~ normal(mu_inflection, sigma_inflection);
  
  y ~ normal(PSE[subject]-inv_logit(steepness[subject].*(GS-inflection[subject])), sigma);
}

generated quantities {
  // For log likelihood calculation
  real log_lik[N];
  
  for (n in 1:J) {
    log_lik[n] += normal_lpdf(y[n] | PSE[n]-inv_logit(steepness[n]*(GS-inflection[n])), sigma);
  }

}
Data:
GS = np.array([0.  , 0.33, 0.6 , 0.8 , 1.  , 1.33])
y = np.array([[0.4 , 0.  , 0.2 , 0.6 , 0.2 , 0.2 ],
       [0.5 , 0.5 , 0.5 , 0.6 , 0.4 , 0.25],
       [0.6 , 0.  , 0.  , 0.2 , 0.  , 0.  ],
       [0.2 , 0.2 , 1.  , 0.6 , 0.2 , 0.2 ],
       [0.6 , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [1.  , 0.6 , 0.2 , 0.4 , 0.4 , 0.  ],
       [0.2 , 0.4 , 0.  , 0.2 , 0.2 , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.4 ],
       [0.8 , 0.2 , 0.8 , 0.2 , 0.4 , 0.6 ],
       [0.2 , 0.2 , 0.  , 0.2 , 0.  , 0.  ],
       [0.4 , 0.2 , 0.2 , 0.2 , 0.  , 0.  ],
       [0.  , 0.  , 0.4 , 0.4 , 0.2 , 0.  ],
       [1.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.6 , 0.  , 0.4 , 0.  , 0.2 , 0.  ],
       [0.6 , 0.6 , 0.2 , 0.2 , 0.4 , 0.6 ],
       [0.8 , 0.2 , 0.2 , 0.2 , 0.  , 0.4 ],
       [0.8 , 0.4 , 0.2 , 0.  , 0.6 , 0.2 ],
       [0.4 , 0.2 , 0.2 , 0.  , 0.2 , 0.  ],
       [0.5 , 0.2 , 0.  , 0.  , 0.2 , 0.2 ],
       [0.5 , 0.  , 0.4 , 0.  , 0.  , 0.  ],
       [0.8 , 0.4 , 0.  , 0.  , 0.  , 0.  ],
       [0.2 , 0.2 , 0.2 , 0.2 , 0.2 , 0.  ],
       [0.4 , 0.6 , 0.6 , 0.2 , 0.6 , 0.  ],
       [0.2 , 0.6 , 0.4 , 0.4 , 0.  , 0.  ]])

The problem is stated in the warning message: the estimated shape parameter of the Pareto distribution is greater than 0.7 for one or more observations, which means importance sampling is unlikely to work well because the marginal posterior and the LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations. See more in the LOO glossary and CV-FAQ.

normal_log is the deprecated form; you should use normal_lpdf. That link points to a very old version. The problem with these websites collecting PDFs is that they don't update when there is a new version. See instead [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, which has been published as Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC | Statistics and Computing.
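Note also that, as posted, the generated quantities block loops over 1:J while log_lik has N entries, and uses += on uninitialized values, which leaves NaNs in log_lik (the likely source of the "invalid value encountered" RuntimeWarning). A corrected sketch, reusing the parameter names from your model and indexing by subject the same way the model block does:

```stan
generated quantities {
  // pointwise log likelihood, one entry per observation
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | PSE[subject[n]]
                                    - inv_logit(steepness[subject[n]]
                                                * (GS[n] - inflection[subject[n]])),
                             sigma);
  }
}
```

Note the plain assignment with = rather than +=, and that the mean expression matches the likelihood in the model block term by term.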


Thanks for the answer and links. So how do I fix it? :) In other words, what do you mean by a more robust model?


EDIT: added direct links for the relevant case studies.

See the case studies, which have a similar structure to your model. If you are using rstan and the latest loo package, you can also try moment matching LOO.
