Should I and can I specify multiple priors?


Hello all,
I’m trying to set up a Bayesian Logistic Regression of my thesis data, which uses three variables (see block below for more details). The code I’ve included here works perfectly and maps directly on to the results I got via your run-of-the-mill SPSS analysis. However, I have a theoretical question about model specification: should I be specifying different priors for each parameter in my model?

For example, the Guilt variable’s estimated Odds Ratio is 13.5, while Discount’s ratios range between 3.5 and 4.5. Incidentally, shouldn’t I also then be setting priors for each level of my parameters? Going back to the Discount variable, the number guilty plea acceptances (my DV) rose dramatically as I increased the Discount percentage, so I would expect very different results between levels in future studies.

Any advice on if this is necessary/warranted (and if so, how) would be greatly appreciated!

Potential Trial Sentence (PTS): a two level factor representing the two possible threatened trial sentences in my study (5 years | 25 years)

Discount: a three level factor that indicates the reduction from the initial threatened sentence above (20% | 50% | 70%) offered in the plea deal

Guilt status: two level factor of the defendant’s factual guilt (Innocent | Guilty)

model1=stan_glm(Accept_Reject~Discount+PTS+Guilt,
        family = binomial(link = "logit"), 
        data=pubdata, 
        #x = FALSE,
        #y = TRUE, 
        prior = normal(location = 1.1,scale = 2.5,autoscale = FALSE),
        #prior_intercept = normal(), 
        #prior_PD = FALSE, 
        algorithm = c("sampling"), 
        mean_PPD = TRUE,
        adapt_delta = 0.95, 
        #QR = FALSE, 
        #sparse = FALSE,
        chains=3,iter=50000,cores=3)

In general I set priors for each parameter that I know something about. The priors are just additional data you can code into your model. Also setting lower and upper bounds on your parameters can be helpful. Like in my recent model fish can’t have 0 or negative lengths :)

Great! Any chance you could direct me to some instructions, guides, or papers on how to do that?

So in rstanarm it’s like this

my_priors_riparian ← normal(location = c(1, 0, 0, -1, 1),
scale = c(.5, 1, 1, 0.5, 0.5), autoscale = FALSE)

for a model that is this

post4_1 ← stan_glmer(cw ~ mean_depth_gw_neg + annual_precipitation + std_depth_gw +
(1|site) + (1|year) + annual_tmax,
data = flood_model, adapt_delta = 0.99999,
chains = CHAINS, cores = CORES,
seed = SEED, iter = 4000,
prior = my_priors_riparian,
prior_covariance = decov(regularization = 1, concentration = 1, shape = 1, scale = 1))

But you have to know something about your data. In this model mean_depth_gw_neg has some slightly positive effect on cw. This is based on field observations and previous research. So I set the prior to be slightly positive.

Does that make sense?

1 Like

Yeah I’m with you–at least in terms of theory. I understand what you mean when you say you’re setting the prior to be positive and informed based on field observations…that is essentially what I’m trying to do as well. Its the coding part I’m not sure of.

So correct me if I’m wrong, but it looks like you’re defining a distribution in R first, and then telling rstanarm to use what you define in my_priors_riparian as your priors for the model?
But I see what looks like 5 means for the priors and 5 standard deviations/scales, but 6 variables specified in your stan_glmer? And do these priors correspond to just the factor itself, or are they attached to specific factor levels?

As an aside I’m trying to figure this out so that I can use the information from my current model here to inform the priors on my follow up replication and expansion study

Yes, I set the distribution of the priors first as a list and then feed that into the model. I think it’s in the rstanarm manual?

Six priors, oppps. Likely I forgot to remove the last prior. Sorry. That’s an old script we aren’t using anymore. It should be five priors!

The specific level interactions is the prior_covariance = decov(regularization = 1, concentration = 1, shape = 1, scale = 1))? Hopefully I have that part right.

Sorry it’s been a bit since I used rstanarm.

1 Like

That definitely helps, thanks! I at least have a place to start now. Will have to dig through the manual then to read up on this more, but now I have a general idea of what to do…Specify the distributions as a list and feed said list into stan_glm().

I don’t see prior_covariance listed as a function in stan_glm(), so I’ll have to dig around for that. Thanks for the help

Oh and a couple of points of best practices.

For rstanarm I am always explicit about the priors even if keeping them at the default normal(0,1).

Above the my_priors_riparian list I have a bunch of comments in the code that contain the reference for each prior. For the first prior of normal(1,0.5) there is a list of journal articles and field note books to support the prior.

If I don’t have a particular knowledge (or it’s lacking papers and field notes) about a parameter I will might set it to normal(0,1) and make a note of it. Better is to consult a pool of experts (3-5) and ask them to draw the distribution and then settle on a roughly midpoint distribution for the prior. I also document this process in the code. All this eventually makes it into the methods section and supplemental data.

1 Like

Just tried to use your method of specifying a list of priors and something in the code failed. Not sure if I wrote it wrong, or I’m missing something here:

testPriors <- normal(location = c(1.1, 2.5, -.5),
                 scale = c(1.5, 1, 1.5), autoscale = FALSE)

testmodel_multiPriors=stan_glm(Accept_Reject~Discount+Guilt+TP, 
                       family = binomial(link = "logit"), 
                       data=pubdata, 
                       prior = testPriors,
                       #prior_intercept = normal(), 
                       #prior_PD = FALSE, 
                       algorithm = c("sampling"), 
                       mean_PPD = TRUE,
                       adapt_delta = 0.95, 
                       #QR = FALSE, 
                       #sparse = FALSE,
                       chains=3,iter=50000,cores=3,
                       diagnostic_file=file.path(tempdir(), "df.csv"))

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=prior_scale; position=0; dims declared=(4); dims found=(3)  (in '/data/hyperparameters.stan' at line 2; included from 'model_bernoulli' at line 54)

failed to create the sampler; sampling not done
Error in check_stanfit(stanfit) : 
  Invalid stanfit object produced please report bug

Hmm I’ve seen that error when the number of declared priors doesn’t match what’s in the model part. I’ll look at your code.

1 Like

So with simulated data this run fines. Maybe check your data frame?

set.seed(1941)
x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
z = 1 + 2x1 + 3x2 + 3x3
pr = 1/(1+exp(-z))
y = rbinom(100,1,pr)
df = data.frame(y=y,x1=x1,x2=x2, x3=x3)
head(df)

testPriors ← normal(location = c(1.1, 2.5, -.5),
scale = c(1.5, 1, 1.5), autoscale = FALSE)

testmodel_multiPriors=stan_glm(y ~ x1 + x2 + x3,
family = binomial(link = “logit”),
data=df,
prior = testPriors,
#prior_intercept = normal(),
#prior_PD = FALSE,
algorithm = c(“sampling”),
mean_PPD = TRUE,
adapt_delta = 0.95,
#QR = FALSE,
#sparse = FALSE,
chains=4,iter=4000,cores=6,
diagnostic_file=file.path(tempdir(), “df.csv”))

Not sure how the data frame could be messed up, I’ve been using it for quite a while and I’ve never run into problems. But based on what you mentioned about the number of declared priors being off from the model, I added another 1 to each scale and location to see what would happen:

testPriors <- normal(location = c(1.1, 2.5, -0.5, 1),
                 scale = c(1.5, 1, 1.5, 1), autoscale = FALSE)

That totally fixed it for some reason. Yet I only have 3 IV’s declared in my model…any idea why it wants 4 priors? Is it looking for the DV too?

Never mind, I figured it out! Turns out that any factor that includes more than two levels needs multiple priors just on its own. So Discount needs two priors, not just one for both levels. I added the Sex (two level) and Race (four level) factors in for good measure and just replicated the problem…a model with Race wouldn’t run but Sex would.

1 Like

Ohh good catch @Longshot408! I should have asked if you had factors in there. My fault there.

1 Like