Using posteriors as new priors

For reference, I am using rstanarm to analyse my data using a Bayesian framework. I am working with the stan_glm function. The package version I am using is 2.19.2, and my R version is 3.6.2.

Below is an example of one of the dataframes I am using:

dat <- structure(list(heading = c(2, 0.5, 2, 1.5, 2, 1.5, 2, 1.5, 2, 0.5, 2, 2, 2, 2, 2, 1.5, 1.5, 
1.5, 2, 0, 2, 1, 1.5, 2, 1, 0.5, 1, 0.5, 1.5, 0.5), FirstSteeringTime = c(0.433389999999999,  
0.449999999999989, 0.383199999999988, 0.499899999999997, 0.566800000000001, 
0.58329999999998, 0.5, 0.449799999999982, 0.566600000000022, 0.466700000000003, 
0.433499999999981, 0.466799999999978, 0.549900000000036, 0.483499999999992, 
0.533399999999972, 0.433400000000006, 0.533200000000022, 0.450799999999999, 
0.45022, 0.48342000000001, 0.46651, 0.68336, 0.483400000000003, 0.5167, 
0.383519999999997, 0.583200000000005, 0.449999999999989, 0.58329999999998, 
0.4999, 0.5334)), row.names = c(NA, 30L), class = "data.frame")

I have 2 experiments that I aim to analyse. The second is a small iteration of the first, and so I aim to use the Bayesian analysis of the first to inform the priors of the second. In my first experiment, I use mostly default priors (with a normal(0, 0.3) prior on the heading coefficient) to analyse my data:

    fit_1 <- stan_glm(FirstSteeringTime ~ heading,
                      family = Gamma(link = "identity"),
                      prior = normal(0, 0.3),
                      adapt_delta = 0.999,
                      data = dat)

Using this model I can compute Bayes factors, investigate the posterior distribution by implementing ROPEs, etc. (I do this using the bayestestR package). However, I am wondering how I can use the posteriors of the fit_1 model to inform the priors of my second model.

One suggestion I have had is to take the posterior mean and standard deviation and use these as the location and scale parameters for the prior distribution of my second analysis.

    library(bayestestR)
    posteriors <- insight::get_parameters(fit_1)
    describe_posterior(posteriors, centrality = "all", dispersion = TRUE, test = "all")

However, how can I specify the overall shape of the distribution so that it matches that of my posteriors from the first analysis?

Any help is most appreciated, thank you!

Your options are:

  1. As you say, look at the marginal posteriors from the first model and try to determine some summary statistics (e.g. mean and SD) for use in the second model’s prior (see the sketch after this list). If you think a normal distribution doesn’t approximate the shape of the first model’s posterior, there are lots of alternative prior shapes available in Stan. You could try to work out a multivariate prior for multiple parameters at once, but there you become more restricted in the available shapes (I think there’s only multivariate normal and multivariate Student-t).

  2. Code a single model for both experiments simultaneously with either full or partial pooling for your parameters of interest. I think this is the preferred approach, and certainly seems more straightforward/elegant than figuring out how to approximate the first model’s posterior.
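
For the first option, here’s a rough sketch of what that could look like in rstanarm, assuming a hypothetical second data frame `dat2` with the same columns as `dat`. The posterior mean and SD of the heading coefficient from fit_1 are used as the location and scale of the new prior:

    library(rstanarm)

    # Posterior draws from the first fit; columns are "(Intercept)", "heading" and "shape"
    post <- as.matrix(fit_1)

    # Summarise the marginal posterior of the heading coefficient
    loc <- mean(post[, "heading"])
    scl <- sd(post[, "heading"])

    # Use those summaries as the location and scale of the new normal prior.
    # The intercept and shape are left at rstanarm's defaults here; if you also
    # want to carry the intercept over, note that prior_intercept is specified
    # on the centred-predictor scale.
    fit_2 <- stan_glm(FirstSteeringTime ~ heading,
                      family = Gamma(link = "identity"),
                      prior = normal(location = loc, scale = scl),
                      adapt_delta = 0.999,
                      data = dat2)

Keep in mind this only carries over the first two moments of the marginal posterior; correlations between parameters and any skew are lost, which is part of why a single joint model is usually preferable.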

Hi @mike-lawrence, thank you for your reply! I’ll have to read up on your second suggestion as I have not come across it before. Do you have any links to resources that might be useful?

Here’s a thread discussing this topic: Composing Stan models (posterior as next prior)

A variety of posterior-to-prior approximation methods are discussed there, but note Michael Betancourt’s warnings about the information you lose when you run things as separate models, and his advice to simply re-fit the full model when feasible.

In the scenario of a second experiment, you’d take your single-experiment model and, for every parameter that was previously a single value, model it instead as having a distribution across studies with a mean and a variability. For example, if the experiments were in my field (cognitive science), I’d usually have multiple human participants in each experiment, so a given experiment’s model would have a parameter for the mean-across-participants in the outcome; to make a single model of both experiments, I’d code this mean-across-participants parameter as itself deriving from a meta-distribution with a meta-mean and meta-variability. With only two experiments, you’ll want to think carefully about the prior you put on the meta-variability parameter, as it effectively reflects how much inference from one experiment influences inference in the other, and vice versa. I haven’t seen any explicit case studies of this scenario, but perhaps @betanalpha could chime in on whether my suggestion makes sense.
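
As a rough sketch of that idea (ignoring the participant level for simplicity, and again assuming a hypothetical second data frame `dat2` with the same columns), you could stack the two experiments and let the intercept and heading effect vary by experiment:

    library(rstanarm)

    # Stack the two experiments with a label identifying each one
    both <- rbind(
      transform(dat,  experiment = "exp1"),
      transform(dat2, experiment = "exp2")
    )

    # Partially pool the intercept and heading effect across experiments.
    # With only two experiments the group-level SDs (the "meta-variability")
    # are weakly identified, so the prior on them (prior_covariance, a decov()
    # prior in rstanarm) matters a lot.
    fit_both <- stan_glmer(FirstSteeringTime ~ heading + (1 + heading | experiment),
                           family = Gamma(link = "identity"),
                           adapt_delta = 0.999,
                           data = both)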

Thank you for your reply! I’ll have a read of that thread to try and get my head around it. Thank you!

That’s about right, but there’s a subtle point that I want to emphasize.

Given an observational model for both experiments, including which parameters are shared between the two experiments, \pi_{1}(y_{1} \mid \theta_{1}, \phi) and \pi_{2}(y_{2} \mid \theta_{2}, \phi), the options are straightforward. In theory we could specify our posteriors sequentially, first combining the initial observational model and prior,

\pi(\theta_{1}, \phi \mid y_{1}) \propto \pi(y_{1} \mid \theta_{1}, \phi) \, \pi(\theta_{1}, \phi)

and then using that first posterior as a prior to complement the second observational model (along with a new prior for the parameters specific to the second experiment),

\pi(\theta_{1}, \theta_{2}, \phi \mid y_{1}, y_{2}) \propto \pi(y_{2} \mid \theta_{2}, \phi) \, \pi(\theta_{1}, \phi \mid y_{1}) \, \pi(\theta_{2} \mid \theta_{1}, \phi)

Or you could fit everything jointly. Using the first data set you’d get

\pi(\theta_{1}, \phi \mid y_{1}) \propto \pi(y_{1} \mid \theta_{1}, \phi) \, \pi(\theta_{1}, \phi)

again, but when incorporating the second you’d fit the first data set again,

\pi(\theta_{1}, \theta_{2}, \phi \mid y_{1}, y_{2}) \propto \pi(y_{2} \mid \theta_{2}, \phi) \, \pi(y_{1} \mid \theta_{1}, \phi) \, \pi(\theta_{1}, \theta_{2}, \phi).

At the cost of some redundancy you get to use computational tools that fit one model at a time, like Markov chain Monte Carlo.

Importantly, the decision to fit things with incremental posterior-to-prior sequences or with joint posteriors over all of the available data is independent of how the two experiments are modeled. In the example introduced by @mike-lawrence one has to figure out how to pool parameters between the two experiments regardless of how the incremental data are fit. While this modeling is really important, it is independent of the original question.

Ultimately I recommend fitting all of the data if possible. Streaming is best used for industrial applications where data are received in batches and inferences have to be computed on the fly and quickly to monitor the system being analyzed. In these cases priors approximately informed by previous posteriors are usually good enough.
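
As a minimal sketch of that recommendation, assuming all parameters are shared between the two experiments (full pooling) and a hypothetical second data frame `dat2`, fitting all of the data is just a refit of the original model on the combined data:

    # Combine both experiments and refit the original model with the original prior
    dat_all <- rbind(dat, dat2)

    fit_joint <- stan_glm(FirstSteeringTime ~ heading,
                          family = Gamma(link = "identity"),
                          prior = normal(0, 0.3),
                          adapt_delta = 0.999,
                          data = dat_all)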
