Why do different priors produce such different results?

I am replicating the paper 'Bayesian hierarchical models for linear networks' (J Appl Stat. 2020 Dec 29;49(6):1421-1448).
The model is specified as

\begin{aligned}
y_i &\sim \text{Poisson}(L_i \exp(\lambda_i)) \\
\lambda_i &\sim N(\mu, \sigma^2) \\
\mu &\sim N(\alpha, \tau^2) \\
\sigma^2 &\sim \text{Inverse-Gamma}(\alpha_0, \beta_0)
\end{aligned}

The R code is as follows:

library(dplyr); library(purrr); library(rstan)

# path = 'https://pmc.ncbi.nlm.nih.gov/articles/PMC9042008/pdf/CJAS_49_1864814.pdf'
path = 'D:\\bayes\\Bayesian hierarchical models for linear networks.pdf'
df = tabulapdf::extract_tables(path,pages = 4)

accident = df[[1]] %>% select(2:3) 
names(accident) = c('length','accidents')
data_for_Stan = list(N = length(accident$accidents),
                      length = accident$length, 
                      accidents = accident$accidents)
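
Since table extraction from a PDF can be fragile, a quick sanity check of the extracted data may be worthwhile (my addition, not part of the paper's code):

str(accident)       # should show two numeric columns: length and accidents
summary(accident)   # lengths should be positive and accident counts non-negative integers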

The first Stan model:

conjugate <- "
data {
  int<lower=0> N;
  vector[N] length;
  array[N] int accidents;
}

transformed data{
  vector[N] log_len;
  log_len = log(length);
}

parameters {
  real mu;
  real<lower=0> sigma;
  vector<offset = mu, multiplier = sigma>[N] lambda; 
}

transformed parameters {
  real<lower=0> sigma2 = sigma^2;
}

model {
  mu ~ normal(-6.65, 0.09);
  sigma2 ~ inv_gamma(18.36, 58.06);
  lambda ~ normal(mu, sigma);
  for (n in 1:N) {
    accidents[n] ~ poisson_log(lambda[n] + log_len[n]);
  }
}

The second model differs only in the hyperparameters for \mu and \sigma^2:

model {
  mu ~ normal(0, 10);
  sigma2 ~ inv_gamma(0.001, 0.001);
  lambda ~ normal(mu, sigma);
  for (n in 1:N) {
    accidents[n] ~ poisson_log(lambda[n] + log_len[n]);
  }
}

The first model gives mean(\mu) = -6.55 and mean(\sigma) = 5.07, but the second gives mean(\mu) = 1.057 and mean(\sigma) = 0.602.
Why do different priors produce such different results? Are the results acceptable?

I only have a little experience coding directly in Stan, so others can correct me if I'm wrong, but I think the prior on \mu in the first model practically fixes it at -6.65, given that the SD is so small (0.09). The priors for \sigma are also different, but even if they weren't, fixing the value of one parameter like this will affect the samples of the others.

Whether or not priors are reasonable generally needs some subject-matter knowledge. But it looks like the data and the priors are giving different information as to what plausible values of \mu are.
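
As a rough illustration (my own sketch, not from the paper), you can compare how much of the line each prior for \mu actually covers:

# 95% prior intervals for mu under the two specifications
qnorm(c(0.025, 0.975), mean = -6.65, sd = 0.09)  # about (-6.83, -6.47): essentially pinned
qnorm(c(0.025, 0.975), mean = 0, sd = 10)        # about (-19.6, 19.6): effectively flat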


The Stan User’s Guide has a section on how to code and run prior predictive checks: Posterior and Prior Predictive Checks

Otherwise, the first response is correct - the difference between:

mu ~ normal(-6.65,0.09);

and

mu ~ normal(0, 10);

is that in the first case, you're pretty much constraining mu to be -6.65, while in the 2nd case you have a very diffuse prior on mu, centered on zero. Guidance on prior choice is available here: Prior Choice Recommendations · stan-dev/stan Wiki · GitHub
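
To see what that means on the outcome scale (a sketch of mine, assuming the Poisson log link used above), the informative prior pins the per-unit-length accident rate near exp(-6.65), while the diffuse prior allows rates spanning many orders of magnitude:

exp(-6.65)                          # about 0.0013 accidents per unit length
exp(qnorm(c(0.025, 0.975), 0, 10))  # roughly 3e-09 to 3e+08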


I would like to perform prior predictive checks for this model. My code is as follows:

  data {
    int<lower=0> N;
    vector[N] length;
  }
  generated quantities {
    real mu_ppc = normal_rng(-6.65, 0.09);
    real sigma2_ppc = inv_gamma_rng(18.36, 58.06);

    vector[N] lambda_ppc;
    array[N] int y_sim;

    // draw lambda with standard deviation sqrt(sigma2), not the variance itself
    for (i in 1:N) lambda_ppc[i] = normal_rng(mu_ppc, sqrt(sigma2_ppc));
    // poisson_rng expects the rate, so pass exp(lambda) * length directly
    y_sim = poisson_rng(exp(lambda_ppc) .* length);
  }

The error message is as follows:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
[1] "Error : Must use algorithm="Fixed_param" for model that has no parameters."
[1] "error occurred during calling the sampler; sampling not done"

How do I solve this issue? Any help is appreciated.

Since your Stan code doesn't have a parameters block (i.e., your model doesn't have any latent parameters), you need to use the algorithm = "Fixed_param" argument for rstan.

If you’re using cmdstanr, you’d have to use the fixed_param = TRUE argument in the call to sample.
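
For example (a minimal sketch; the file name and data list are just the ones used elsewhere in this thread):

# rstan: pass algorithm = "Fixed_param"
fit_ppc <- rstan::stan(file = "ppc.stan", data = data_for_ppc,
                       algorithm = "Fixed_param")

# cmdstanr: pass fixed_param = TRUE to $sample()
mod <- cmdstanr::cmdstan_model("ppc.stan")
fit_ppc <- mod$sample(data = data_for_ppc, fixed_param = TRUE)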


That extremely annoying fixed_param problem has finally been addressed in the latest release of Stan - Release v2.36.0 (10 December 2024) · stan-dev/cmdstan · GitHub


You are right. Thank you very much!

Because the results from the conjugate prior and the non-informative prior are so different, I performed a prior predictive check for the model above.
The R code is as follows:

library(dplyr)   # for %>% and select()
library(rstan)

# path <- 'https://pmc.ncbi.nlm.nih.gov/articles/PMC9042008/pdf/CJAS_49_1864814.pdf'
path <- 'D:\\bayes\\Bayesian hierarchical models for linear networks.pdf'
df <- tabulapdf::extract_tables(path, pages = 4)

accident <- df[[1]] %>% select(2:3)
names(accident) <- c('length', 'accidents')
data_for_Stan <- list(N = length(accident$accidents),
                      length = accident$length,
                      accidents = accident$accidents)
# prior predictive checks
ppc <- '
  data {
    int<lower=0> N;
    vector[N] length;
  }

  generated quantities {
    real alpha = normal_rng(-6.65,0.09);
    real<lower=0> tau = sqrt(inv_gamma_rng(18.36, 58.06));
    vector[N] lambda;
    array[N] int<lower=0> y_sim;
    for (i in 1:N) lambda[i] = normal_rng(alpha,tau);
    y_sim = poisson_rng(exp(lambda) .* length);
  }

'
write(ppc,file = 'ppc.stan')
data_for_ppc <- list(N = nrow(accident),
                     length = accident$length)
fit.ppc <- stan(file = "ppc.stan", data = data_for_ppc,
                algorithm = "Fixed_param")
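
I then looked at the output along these lines (a rough sketch of what I ran; the names are the ones defined above):

print(fit.ppc, pars = c('alpha', 'tau'))                      # summaries of the prior draws
y_sim_draws <- rstan::extract(fit.ppc, pars = 'y_sim')$y_sim  # draws x N matrix of simulated counts
quantile(as.vector(y_sim_draws), c(0.025, 0.5, 0.975))        # spread of the simulated accident counts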

The results are not satisfying. First, the mean of y_sim is not an integer. Second, many of the quantiles of y_sim are zero. Third, lp__ is zero or NA.

What's wrong with my code? Any help is appreciated.

The mean of a vector of integers is not necessarily an integer.

> x = c(1, 2, 3, 4, 5, 6)
> mean(x)
[1] 3.5

It's possible that poisson_rng is blowing up.

R poisson_rng(reals lambda)
Generate a Poisson variate with rate lambda; may only be used in transformed data and generated quantities blocks. lambda must be less than 2^30.

If this happens in a generated quantities block, the error is not fatal and the sampler will continue to the next iteration - the value of y_sim is never changed, and it looks like it was initialized to zero.
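
If you want to rule that out, a rough back-of-the-envelope check in R could look like this (my own sketch; the hyperparameters are the ones in your prior and accident$length is the data from your script):

# lambda is roughly alpha +/- 3*tau, where tau^2 has prior mean beta0 / (alpha0 - 1)
tau_prior_mean <- sqrt(58.06 / (18.36 - 1))
max_rate <- exp(-6.65 + 3 * tau_prior_mean) * max(accident$length)
max_rate < 2^30   # if TRUE, poisson_rng should not hit its upper limit for these draws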

The lp__ of zero is expected - that's because there's no model block in this Stan program.

Thank you for explaining!