Priors Not Having an Effect on Posterior Samples

Hi Stan community,

I’m new to Stan. I’m having trouble getting my priors to influence the posterior samples in my model. I’ve tried tightening the priors and reducing the number of samples to minimize the effect of the data, but the posterior samples don’t seem to reflect the prior specifications. Here’s my Stan program and a description of what I’ve tried:

data {
  int n_timesteps;
  int n_channels;
  int n_channels_comp;
  int n_core_neg;
  int n_core_pos;
  matrix[n_timesteps, n_channels] spend;
  matrix[n_timesteps, n_channels_comp] spendcomp;
  matrix[n_timesteps, n_core_neg] core_neg;
  matrix[n_timesteps, n_core_pos] core_pos;
  vector[n_timesteps] depvar;
}

parameters {
  real<lower=0> sigma;
  real intercept;
  vector<lower=0>[n_channels] betas;
  vector<upper=0>[n_channels_comp] beta_comp;
  vector<upper=0>[n_core_neg] beta_neg;
  vector<lower=0>[n_core_pos] beta_pos;
}

model {
  // Priors
  betas ~ normal(1.5, 0.2);  // Prior for betas
  sigma ~ normal(0, 1);      // Prior for sigma

  // Likelihood
  depvar ~ normal(
    intercept + spend * betas + spendcomp * beta_comp + core_neg * beta_neg + core_pos * beta_pos,
    sigma
  );
}

generated quantities {
  vector[n_timesteps] log_lik;

  for (n in 1:n_timesteps) {
    log_lik[n] = normal_lpdf(
      depvar[n] | intercept + spend[n] * betas + spendcomp[n] * beta_comp + core_neg[n] * beta_neg + core_pos[n] * beta_pos,
      sigma
    );
  }
}

What I’ve Tried

  1. Tightened the Prior:
    I set a strong prior on betas (normal(1.5, 0.2)), but the posterior samples for betas don’t seem to be influenced by this prior.

  2. Reduced the Number of Samples:
    To minimize the effect of the data, I reduced the number of samples to 1 warmup and 1 sampling iteration per chain. However, the samples still don’t reflect the prior.

Reproducible Example
Here’s an example of how I’m fitting the model in R:

library(rstan)
base_data <- read.csv("base_data.csv")
h <- 20  # Number of future steps to predict (can be adjusted)
min_train_size <- 80  # Minimum training window
max_train_size <- nrow(base_data) - h  # Ensure there's test data left

train_size <- max_train_size  # use the full training window for this example

train_data <- base_data[1:train_size, ]
test_data  <- base_data[(train_size + 1):(train_size + h), ]

# Prepare data for Stan
stan_data <- list(
  n_timesteps = nrow(train_data),
  n_channels = ncol(as.matrix(train_data[, colnames(train_data) %in% c("tv")])),
  n_channels_comp = ncol(as.matrix(train_data[, colnames(train_data) %in% c("comp_tv")])),
  n_core_neg = ncol(as.matrix(train_data[, colnames(train_data) %in% c("price", "comp_promo_1", "comp_dist_2")])),
  n_core_pos = ncol(as.matrix(train_data[, colnames(train_data) %in% c("dist", "promo", "comp_price_1", "comp_price_2")])),
  spend = as.matrix(log(train_data[, colnames(train_data) %in% c("tv")] + 1)),
  spendcomp = as.matrix(log(train_data[, colnames(train_data) %in% c("comp_tv")] + 1)),
  core_neg = as.matrix(log(train_data[, colnames(train_data) %in% c("price", "comp_promo_1", "comp_dist_2")] + 1)),
  core_pos = as.matrix(log(train_data[, colnames(train_data) %in% c("dist", "promo", "comp_price_1", "comp_price_2")] + 1)),
  depvar = log(train_data$sales + 1)
)

# Run Bayesian Model on Training Data
# ("model" is a character string containing the Stan program above)
fit_train <- stan(
  model_code = model,
  data = stan_data,
  iter = 5000,
  chains = 4,
  seed = 123
)
print(fit_train)

Questions

  1. Why aren’t the priors having an effect on the posterior samples?
  2. Is there something wrong with my Stan program or the way I’m fitting the model?
  3. How can I ensure that the priors influence the posterior as expected?

Additional Context

base_data.csv (23.1 KB)

  • The data is log-transformed. I set an extreme prior on “betas” just to exaggerate its effect and check whether the prior was actually being used.
  • I’ve checked the convergence diagnostics (Rhat and n_eff), and they look good.

Just to debug, could you set the prior to something strong and unreasonable? Then it should clearly make a difference.
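For example, replacing the prior in the model block with something deliberately extreme (the values here are arbitrary, chosen only so any effect is obvious) should visibly drag the posterior if the prior is being applied:

// in the model block -- deliberately unreasonable, for debugging only
betas ~ normal(100, 0.01);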

Restricting warmup to only 1 iteration eliminates the effects of both the data and the prior: with essentially no warmup, the sampler has not adapted and the draws are little more than the initial values, so they reflect neither. Don’t reduce the number of iterations. To weaken the influence of the data relative to the prior, decrease the amount of input data instead, e.g. use a smaller n_timesteps.

I notice you have implicit flat priors on these parameters:

real intercept;
vector<upper=0>[n_channels_comp] beta_comp;
vector<upper=0>[n_core_neg] beta_neg;
vector<lower=0>[n_core_pos] beta_pos;

Is that by design? I haven’t looked very closely at your model, but if you are expecting this prior

betas ~ normal(1.5, 0.2);

to apply to all of the parameters whose names start with “beta”, that’s not how Stan works. A sampling statement applies only to the variable it names, which in your code is:

vector<lower=0>[n_channels] betas;

Could that be the issue here?
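If you do want informative priors on all of the coefficient vectors, each one needs its own sampling statement. Something along these lines would do it (the locations and scales below are placeholders, not recommendations):

model {
  // One explicit prior per parameter; the constraints in the parameters
  // block (<lower=0>, <upper=0>) turn these into truncated normals
  intercept ~ normal(0, 5);
  betas     ~ normal(1.5, 0.2);
  beta_comp ~ normal(-1.5, 0.2);
  beta_neg  ~ normal(-1.5, 0.2);
  beta_pos  ~ normal(1.5, 0.2);
  sigma     ~ normal(0, 1);   // half-normal because of <lower=0>

  // Likelihood
  depvar ~ normal(
    intercept + spend * betas + spendcomp * beta_comp + core_neg * beta_neg + core_pos * beta_pos,
    sigma
  );
}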

Hi Andrew! Thank you very much for helping me with this issue! Today I tried again and the variables now change, so I think something was wrong with my R session or I made a mistake. This issue can be considered closed.

Hi erognli, yes, I noticed that later and changed it, but it wasn’t the issue. I applied priors to this exact model and all of the betas moved closer to the prior, so I think there was an issue with my R session.