I wanted to save the log likelihood of my model in the generated quantities block, following Vehtari et al. 2017 (Stat Comput 27:1413–1432), in order to calculate LOO / WAIC with the loo package in R. This works in one of my Stan models where I use a Bernoulli distribution, but when I try the same with another model that uses a lognormal distribution I get the error below, even though I defined pred, and using pred in the rest of the model works just fine. Could someone help me with defining the log_lik for this lognormal distribution?
Thank you in advance!
Stefan
data {
  int<lower=0> N_obs;
  int<lower=0> N_sp;
  int<lower=0> species[N_obs];
  real obs[N_obs];
  vector[N_obs] x;
}
parameters {
  real<lower=-10,upper=10> a[N_sp];
  real<lower=-10,upper=10> b[N_sp];  // hyperparameters
  real<lower=-10,upper=10> mu_a;
  real<lower=-10,upper=10> mu_b;
  real<lower=0,upper=2> sigma_a;
  real<lower=0,upper=2> sigma_b;
  vector<lower=0,upper=2>[N_sp] sigma_p;
}
model {
  real pred[N_obs];
  a ~ normal(mu_a, sigma_a);
  b ~ normal(mu_b, sigma_b);
  for (i in 1:N_obs) {
    pred[i] = a[species[i]] + b[species[i]] * x[i];
    obs[i] ~ lognormal(pred[i], sigma_p[species[i]]);
  }
}
generated quantities {
  vector[N_obs] log_lik;
  for (i in 1:N_obs) {
    log_lik[i] = lognormal_lpdf(obs[i] | pred[i], sigma_p[species[i]]);
  }
}
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
variable "pred" does not exist.
ERROR at line 97
95: vector[N_obs] log_lik;
96: for (i in 1:N_obs){
97: log_lik[i] = lognormal_lpdf(obs[i] | pred[i], sigma_p[species[i]]);
98: }
Because you've declared pred as a local variable in the model block, it isn't accessible from any other block. If you move the declaration and calculation of pred to the transformed parameters block, the rest of the code should work.
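Concretely, that change could look something like the sketch below. This is based on the model you posted (I've kept your constraints and declared pred as a vector so it can also be used in a vectorized sampling statement); one caveat is that everything in transformed parameters gets saved in the output, so you'll store N_obs extra values per draw.

```stan
transformed parameters {
  vector[N_obs] pred;
  for (i in 1:N_obs)
    pred[i] = a[species[i]] + b[species[i]] * x[i];
}
model {
  a ~ normal(mu_a, sigma_a);
  b ~ normal(mu_b, sigma_b);
  for (i in 1:N_obs)
    obs[i] ~ lognormal(pred[i], sigma_p[species[i]]);
}
generated quantities {
  vector[N_obs] log_lik;
  for (i in 1:N_obs)
    log_lik[i] = lognormal_lpdf(obs[i] | pred[i], sigma_p[species[i]]);
}
```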
Are all of those constraints in the parameters block strictly necessary? The <lower=0> is certainly needed for the sigmas, since standard deviations can't be negative, but the hard bounds on a, b, mu_a, and mu_b, and the upper bounds on sigma_a, sigma_b, and sigma_p are not recommended unless they are logically required. Instead we recommend soft constraints via weakly informative priors. Two resources that might be helpful are our Prior Choice Recommendations wiki and the case study on weakly informative priors.
If you declare a and b as vectors instead of real arrays, you can use element-wise multiplication in the model block and vectorize your likelihood (example code below).
You can use transformed parameters like @PsychPhD recommended, but if you don’t want to save all the values of pred (you probably don’t need them after they’re used) then I would avoid making it a transformed parameter. See the code below for another option.
data {
  int<lower=0> N_obs;
  int<lower=0> N_sp;
  int<lower=0> species[N_obs];
  real obs[N_obs];
  vector[N_obs] x;
}
parameters {
  // don't use hard constraints (except the lower bound on the sigmas)
  vector[N_sp] a;
  vector[N_sp] b;
  real mu_a;
  real mu_b;
  real<lower=0> sigma_a;
  real<lower=0> sigma_b;
  vector<lower=0>[N_sp] sigma_p;
}
model {
  // vectorized likelihood
  obs ~ lognormal(a[species] + b[species] .* x, sigma_p[species]);

  // If you get warnings about divergences you may want to consider
  // using the non-centered parameterization for these normal priors
  // (see http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html)
  a ~ normal(mu_a, sigma_a);
  b ~ normal(mu_b, sigma_b);

  // add priors for the mus and sigmas that softly imply the
  // constraints you were using in the parameters block, e.g. mu_a ~ normal(0, 5)
}
generated quantities {
  vector[N_obs] log_lik;
  for (i in 1:N_obs) {
    real pred = a[species[i]] + b[species[i]] * x[i];  // temporary local variable
    log_lik[i] = lognormal_lpdf(obs[i] | pred, sigma_p[species[i]]);
  }
}
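In case it's useful, the non-centered parameterization mentioned in the comment above could look roughly like this for a (same idea for b). This is a sketch under the declarations in the code above, not something tested against your data:

```stan
parameters {
  vector[N_sp] a_raw;  // standardized offsets, one per species
  real mu_a;
  real<lower=0> sigma_a;
  // ... rest of the parameters as before
}
transformed parameters {
  // shift and scale the standardized offsets;
  // this implies a ~ normal(mu_a, sigma_a)
  vector[N_sp] a = mu_a + sigma_a * a_raw;
}
model {
  a_raw ~ normal(0, 1);
  // ... likelihood and remaining priors as before
}
```

The sampler then explores the independent standard normals a_raw instead of the correlated funnel-shaped geometry of (a, sigma_a), which is what tends to cause the divergences.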
Thanks for your suggestions, this helps me improve my model definitions in general. One question: where would the priors, e.g. mu_a ~ normal(0, 5), be defined? At the top of the model block, for example, before mu_a is used? And how would you define the sigma priors? Since they cannot be negative, I assume that should be reflected in the prior?
You can define them anywhere in the model block. All of the parameters are provided externally by the algorithm doing the fitting; they're not set by sampling statements. A sampling statement just increments the log density, so order doesn't matter.
Stan will automatically truncate if you declare the variable with <lower = 0>. Then if you do something like this:
real<lower = 0> sigma;
sigma ~ normal(0, 1);
you get a half-normal prior. But, if the parameters of the prior are themselves parameters of the model, you need to explicitly truncate because the normalization term isn’t constant, as in
sigma ~ normal(mu, tau) T[0, ];
Or you can use a properly positive-only prior, like a gamma or lognormal, but those go to zero density as the value goes to zero, so they behave a bit differently from the half-normal.
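For example, either of these in the model block (no explicit truncation needed, since these distributions already have support only on positive values; the specific shape and scale values here are just placeholders, not recommendations):

```stan
sigma ~ gamma(2, 1);
// or
sigma ~ lognormal(0, 1);
```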