Setting informative priors from pilot study for lognormal model

Hi all,

I’m new to Bayesian statistics using brms and stan, hence apologies if my question seems very basic.

I’m trying to fit a regression model in brms with informative priors. The prior knowledge is based on a previously conducted pilot study, for which I produced posterior distributions of the intercept, group level effects and population level effects. I am then using this information of the posteriors from my pilot study as priors for my actual study (including of course only the recent data collected).

My pilot model was:
fitdur <- brm(formula = Duration ~ DSI + rankdiff + agediff + sexcode +(1|Initiator), data=entryduration, family=lognormal(), warmup= 1000, iter = 2000, chains = 4, control = list(adapt_delta = 0.99, stepsize=0.01, max_treedepth=15))

I am, however, struggling to understand how to set the right distribution of priors for my final updated model; I see that, by using prior_summary, the default priors of my pilot model were all student-t-priors. But since I used a model with the family “lognormal”, I wonder whether I should change the default priors of my final, updated model (including the new data collected and the prior of the pilot study) to lognormal priors for all parameters (i.e., intercept, b’s, sd, sigma)? Or, should I leave the default prior distributions as they are and just insert the posterior means and sd of my pilot model? If so, should I transform these values (means and sd), considering they stem from a lognormal model?
As a more general question - should you inform your model by including the distribution of your previous posteriors as your new prior distributions?

Any advice would be of great help, thank you!

Best, Raphaela

There is no reason to set priors to lognormal just because your likelihood is lognormal.

If you analyse your final data set including both data from pilot and main study, I wouldn’t let the priors be informed too much by the data of the pilot study, because then you end up including this data twice into your model (one as data and one as prior).

If you have the raw data of previous studies available, I would usually prefer putting all data into the same model (with extended structure) instead of trying to use the posterior of one study as prior of the other. The latter is really hard to do correctly.

1 Like

Thanks for the tip!

In this case, I was not planning to include both the data from pilot and main study in my final model; my plan was to include only the raw data from the main study and the posteriors of the pilot study as priors.

What do you mean when you say extended structure? Are there some examples you could point to?
Also I was planning on using the posterior for ambient temperature as the prior for a another variable that I created that is based on how the temperature moves through a wall. Do you feel this is problematic?

With extended structure I mean modeling that the data came from two different studies (e.g., via an additional factor in the model) unless the studies are exact replications with samples from the same populeation.

If your idea about such a prior is problematic or not depends on may details that I can’t judge based on the input you have given.

One of the issues I have been having with using the posterior of one analysis with the prior of another is the inability to create a function or loop. I can pull the mean and standard deviation from the posterior but I have to manually insert those values into the next model because the model won’t accept variable names just actual values. Example below:

M2_ins ← brm(use2 ~ Temp + cooling + (1|Month.f) + (1|Time.f),

  •           data = ins.mod2, family = lognormal(), chains = N_chains, 
    
  •           inits = init_values, control = list(adapt_delta = .95, max_treedepth = 15),
    
  •           prior = set_prior('lognormal(m2, s2)'), cores = getOption("mc.cores", 4L), sample_prior = TRUE)
    

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable “m2” does not exist.
error in ‘model16541a475442_file16545ba67102’ at line 51, column 34

49:   } 
50:   // priors including all constants 
51:   target += lognormal_lpdf(b | m2, s2); 
                                     ^
52:   target += student_t_lpdf(temp_Intercept | 3, -3, 10); 

Error in stanc(model_code = paste(program, collapse = “\n”), model_name = model_cppname, :
failed to parse Stan model ‘file16545ba67102’ due to the above error.

You either need to put “m2” into the stan model via stanvar or just write out the prior values in the form of numbers.

I am having trouble getting stanvar to work. Tried a few different ways, e.g.,
stanvars <- stanvar(x = “m2”, scode = “m2”, block = “parameters”) + stanvar(x = “s2”, scode = “s2”, block = “parameters”)
priors <- c(set_prior(“normal(m2, s2”, class = “b”, coef = “Temp”))
M2_ins <- brm(use2 ~ Temp + cooling + (1|Month.f) + (1|Time.f),

  •           data = ins.mod2, family = lognormal(), chains = N_chains, 
    
  •           inits = init_values, control = list(adapt_delta = .95, max_treedepth = 15),
    
  •           prior = priors, cores = getOption("mc.cores", 4L), 
    
  •           sample_prior = TRUE, stanvars = stanvars)
    

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

error in ‘model16541b1e3c79_file16543a467919’ at line 38, column 1

36:   vector<lower=0>[M_2] sd_2;  // group-level standard deviations
37:   vector[N_2] z_2[M_2];  // unscaled group-level effects
38: m2
    ^
39: s2

PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or ‘}’ to close variable declarations>
Error in stanc(model_code = paste(program, collapse = “\n”), model_name = model_cppname, :
failed to parse Stan model ‘file16543a467919’ due to the above error.

How is Stan supposed to know the value of m2 if you are not passing it…?

Please pass the numeric value for “m2” to the x argument of stanvar.

I have 3 models that I am stringing together to analyze. If I have to write the value in each time then I can not make it a loop. I thought by using stanvar It would understand m2 = the numerical value from the posterior mean from the previous model.

m2 <- 1
stanvar(x = m2, ...)

There is a difference between m2 and "m2". The first in an R object (that’s what you mean) the second is just a string (that’s what you wrote).

I see what you mean. Seems to still not accept it when I fix it though.
> stanvars <- stanvar(scode = m2, block = “parameters”) + stanvar(scode = s2, block = “parameters”)
> priors <- c(set_prior(“normal(m2, s2”, class = “b”, coef = “Temp”))
> M2_ins <- brm(use2 ~ Temp + (1|Month.f) + (1|Time.f),
+ data = ins.mod2, family = lognormal(), chains = N_chains,
+ inits = init_values, control = list(adapt_delta = .95, max_treedepth = 15),
+ prior = priors, cores = getOption(“mc.cores”, 4L),
+ sample_prior = TRUE, stanvars = stanvars)
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'model165444282486_file16541b581044' at line 38, column 1
  -------------------------------------------------
    36:   vector<lower=0>[M_2] sd_2;  // group-level standard deviations
    37:   vector[N_2] z_2[M_2];  // unscaled group-level effects
    38: 1.0312683587556
        ^
    39: 0.0137644499323559
  -------------------------------------------------

PARSER EXPECTED: <one of the following:
  a variable declaration, beginning with type,
      (int, real, vector, row_vector, matrix, unit_vector,
       simplex, ordered, positive_ordered,
       corr_matrix, cov_matrix,
       cholesky_corr, cholesky_cov
  or '}' to close variable declarations>
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file16541b581044' due to the above error.

Look closely at the “block” argument again.

Thank you, took me a while to get it but finally ran. The “real<lower=0>” is that basically saying it is a real value with zero as the lower bound?:
stanvars <- stanvar(x = m2, name = "m2", scode = "real<lower=0> m2;", block = "parameters") + stanvar(x = s2, name = "s2", scode = "real<lower=0> s2;" , block = "parameters")
priors <- c(set_prior("normal(m2, s2)", class = "b", coef = "Temp"))

M2_ins <- brm(use2 ~ Temp + (1|Month.f) + (1|Time.f), 
              data = ins.mod2, family = lognormal(), chains = N_chains, 
              inits = init_values, control = list(adapt_delta = .95, max_treedepth = 15),
              prior = priors, cores = getOption("mc.cores", 4L), 
              sample_prior = TRUE, stanvars = stanvars)
1 Like

I thought you wanted to pass “m2” as data not as parameters, given that you are apparently looping over values of “m2”? That is the reason why I was surprised by your choice of the block argument.

Your absolutely right. I misunderstood what the block argument was for. I wanted the m2 and s2 values to be used as priors and thought that meant I should have the block argument as parameter. Once I change the block argument to data the results are similar to manually plugging in the values.