Questions about Bayesian replication test

I have a few probably naive questions about the Bayesian replication test applied to a GLMM (in my case, with a Bernoulli response distribution). Verhagen and Wagenmakers (2014) propose a test that asks whether the responses in a replication study are more likely to have been drawn from the same distribution as the responses in the original study, or from the null. In their paper, they describe this for a one-sample t-test (p. 1461):

B_r0 = p(Y_rep | H_r) / p(Y_rep | H_0) = [ (1/M) * sum_{i=1..M} t_{df, delta_i*sqrt(N)}(t_rep) ] / t_df(t_rep)    (6)

where t_df is the ordinate of the central t-distribution with df degrees of freedom, and t_{df, delta*sqrt(N)} is the ordinate of the noncentral t-distribution with df degrees of freedom and noncentrality parameter Delta = delta*sqrt(N). The index i indicates the i-th sample out of M samples drawn from the posterior distribution of the effect size delta after the original experiment has been observed. t_rep is the observed t-value in the replication attempt.
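To make the test concrete, here is a minimal sketch of it for a one-sample t-test in base R, assuming we already have posterior draws of delta from the original study (simulated here from a normal approximation; all numbers are placeholders):

```r
set.seed(1)
M     <- 1e4
delta <- rnorm(M, mean = 0.4, sd = 0.15)  # stand-in posterior draws of delta
N     <- 40                               # replication sample size (assumed)
df    <- N - 1
t_rep <- 2.1                              # observed t-value in the replication

# numerator: noncentral-t ordinate at t_rep, averaged over posterior draws
num <- mean(dt(t_rep, df = df, ncp = delta * sqrt(N)))
# denominator: central-t ordinate at t_rep under the null
den <- dt(t_rep, df = df)
B_r0 <- num / den  # values above 1 favor replication, below 1 favor the null
```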

What I’m trying to do
My most general question is how best to generalize this test to regression models, for which I want to know whether a specific effect replicates. For linear mixed models fit with, e.g., lme4, it seems like one could just take the t-value for the coefficient test with N-2 dfs (or, more conservatively, with the dfs estimated by, e.g., the Kenward-Roger approximation). But I'm curious how to more directly use the samples from a brmsfit (also because that would free me from assumptions about the relevant statistics and dfs for the coefficient tests). So I have questions about how best to estimate the numerator and the denominator of the first line of equation (6) for a brms model.

A) To get the numerator: My first thought was to get the value for p(Y_rep | H_r) by taking the model fit to the original data, calculating the likelihood of the replication data under all posterior samples, and averaging over that. But that would evaluate whether everything replicates, whereas I'm interested in whether a specific effect (i.e., one specific parameter value) replicates. So I intend to take the brmsfit fit to the replication data, but replace the estimates for the critical effect with samples from the brmsfit fit to the original data. E.g., I could sample rows of parameter values from the posterior samples of the brmsfit fit to the replication data, but replace the value for the parameter of interest with random draws from the posterior samples for that parameter from the brmsfit fit to the original data.

(If I had random slopes for the parameter of interest by any grouping factor, I would presumably also have to decide how to handle those; the problem I am currently dealing with does not require such random slopes.)

B) To get the denominator: I'm thinking of using the same procedure as described in A), but subtracting the mean estimate of the parameter of interest (from the brmsfit fit to the original data) from each draw, in order to get an idea of how likely the data are to come from a null effect, under the assumption that the variability around that null is identical to the variability observed around the effect in the original experiment.
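As a rough sketch of A) and B) in R (all names are placeholders: fit_orig and fit_rep are assumed brmsfit objects, and "b_x" the coefficient of interest):

```r
library(brms)  # the as_draws_* functions come via the posterior package

draws_rep  <- as_draws_matrix(fit_rep)   # posterior draws, replication fit
draws_orig <- as_draws_matrix(fit_orig)  # posterior draws, original fit

# A) replace the critical coefficient in the replication draws with
#    random draws from the original posterior
idx <- sample(nrow(draws_orig), nrow(draws_rep), replace = TRUE)
draws_A <- draws_rep
draws_A[, "b_x"] <- draws_orig[idx, "b_x"]

# B) the same draws, recentered at zero to represent a null effect with
#    the original posterior's variability
draws_B <- draws_A
draws_B[, "b_x"] <- draws_B[, "b_x"] - mean(draws_orig[, "b_x"])
```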

Now, finally, my questions

  1. Does that sound adequate? Or am I completely off?
  2. If so, should I sample rows from the replication fit and parameter values from the original fit in proportion to their respective data likelihoods (or is that likelihood already captured by the distribution of the posterior samples itself)?
  3. Is there a convenient way to replace a matrix of posterior samples from a brmsfit with such a new manipulated matrix, so that I could—for convenience—keep using the predict or some other function to estimate the posterior probability of the data under the revised posterior samples?

Any pointers (incl. “argh, this is so stupid that I don’t even know where to start” ;)) would be welcome. Thanks for reading this long post.

Verhagen and Wagenmakers (2014). Bayesian Tests to Quantify the Result of a Replication Attempt. JEP:G, 143(4), 1457-1475

If you're interested in a specific effect, here's one idea using BFs and bridgesampling:

  1. Use the previous data to establish a prior. You may want to construct a normal prior centered at the original estimate, with a standard deviation equal to the original SE estimate. Do this for every coefficient.
  2. Construct two models WITHOUT using the ~ syntax; you need to use the target += normal_lpdf(…) syntax instead. (This is because we will need to compute the marginal likelihood with all normalizing constants present, and the ~ syntax drops those constants.) The first is the full model, with the priors defined in step 1. The second is the reduced model, with the coefficient of interest implicitly set to zero (as in, it isn't present). Everything else is the same as in the first model.
  3. Estimate both models using the replication data.
  4. Use the bridgesampling package to estimate the (log) marginal likelihoods.
  5. BF01 = exp(log.ML.reduced - log.ML.full)

That will give you a Bayes factor, similar to what that paper talks about.
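For step 1, one way to construct such priors in brms is to read off each coefficient's posterior mean and SD from the original fit. A sketch, assuming fit_orig is a brmsfit for the original data and "x" is a placeholder coefficient name:

```r
library(brms)

# population-level estimates from the original fit:
# a matrix with an Estimate and Est.Error column per coefficient
s <- fixef(fit_orig)

# normal(posterior mean, posterior SD) prior for coefficient "x"
p_x <- prior_string(
  sprintf("normal(%g, %g)", s["x", "Estimate"], s["x", "Est.Error"]),
  class = "b", coef = "x"
)
# repeat (or loop) over every coefficient and combine the priors with c()
```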

Hi Stephen,

thank you for the answer.

Regarding point 1: I first thought the same as you (run the model on the first data, then derive a parametric prior from that; use that prior for the same model on the second data), but can't I just throw the joint data at the sampler? That won't introduce any additional parametric assumptions, but will also update the original prior from the first data to the posterior after both the first and second data have been seen, won't it? (Or is there a particular reason you prefer the intermediate step of deriving a parametric prior based on the first data?)

Regarding point 2, are you talking about Stan, not brms? E.g., write out the Stan model from brms and modify it?

For 4 & 5, I assume I’d use

# using an order of magnitude more posterior samples than I'd use to estimate
# parameters (set when fitting the models) to get a robust estimate of the
# marginal likelihood
b1 = bridge_sampler(samples = model1, cores = 4, method = "warp3")
b2 = bridge_sampler(samples = model2, cores = 4, method = "warp3")
bf(b1, b2)

Does that look right?


For /estimating/, you could throw both datasets into the same model, and yes, it would be the same as doing a Bayesian update.

However, for /testing/ using the BF, you'd want the prior to encode your uncertain predictions about the parameters; priors serve a different purpose when computing BFs than when estimating. These predictions for replications are derived from the originating study's posterior, so you want p(theta | y_1) to be your prior. When you estimate the marginal likelihood, you then obtain the integral of p(y_2 | theta, M_1) p(theta | M_1, y_1) over theta, which equals p(y_2 | M_1, y_1) (and likewise for M_2 / model 2).

And yes - I was discussing Stan (didn't see the tag, sorry). brms by default uses the target += specification, not ~, for this purpose. You should be able to feed any brms model into bridge_sampler and get the marginal likelihood.
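One practical caveat: for bridge_sampler to work on a brmsfit, the model has to be fit with all parameters saved. A sketch, where the formula, data, and prior objects are placeholders:

```r
library(brms)
library(bridgesampling)

fit_full <- brm(y ~ x + (1 + x | item), data = rep_data,
                family = bernoulli(),
                prior = informed_priors,            # derived from the original fit
                save_pars = save_pars(all = TRUE),  # required for bridge sampling
                iter = 10000)                       # extra draws stabilize the estimate
fit_null <- update(fit_full, formula. = y ~ 1 + (1 + x | item),
                   prior = informed_priors_null)    # same priors minus the one on x

bf(bridge_sampler(fit_full), bridge_sampler(fit_null))  # BF in favor of the full model
```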


Hi Stephen,

I thought some more about this. Your step 3 suggested "estimating" the two models with the new prior using the replication data. I hope you don't mind me following up, as I realize I still have some confusion about this step. From equation (6) above, it would seem that we just want the likelihood of the replication data for the model(s) under the new prior (rather than updating that prior with the data of the replication experiment). At the same time, I get how, intuitively, the success of a replication is captured by the relative increase/decrease of the probability of a null effect before and after seeing the replication data, i.e., the ratio of p(theta = 0 | Y_rep) to p(theta = 0 | Y_orig).

But, if I’m understanding your step 3 correctly, would the BF you’re proposing compare the posterior marginal probability of the replication data against the posterior marginal probability of the replication data if the effect in question is 0? That seems different to me: it seems like we’d be assessing whether there is a non-zero effect after combining both data (that seems more similar to the Rouder and Morey, 2012 proposal). I think I’m missing something (or did I misunderstand your step 3?).

You are correct - you want the likelihood of the replication data under the prior informed by the original study. To do so, you "estimate" the model in Stan using the informed priors and the replication data. Then, when you feed it to bridgesampling, it will work out what the marginal likelihood is. You don't technically even need to look at the updated posterior - Stan is just the vehicle for obtaining samples, from which the bridgesampling package will figure out the marginal likelihood.

So whereas Stan will give you p(theta | y_1, y_2, Model_1) from p(y_2 | theta) p(theta | y_1, Model_1), bridge sampling will estimate p(y_2 | y_1, Model_1), which is the quantity of interest.
You do that for both models (one being the full model, the other having the parameter of interest set to zero, meaning it's not in the model). You then have p(y_2 | y_1, Model_1) and p(y_2 | y_1, Model_2); the ratio of those is the Bayes factor.

Thank you! That’s very helpful. I’ll have to read more about bridgesampling (which I’ve now been doing). While I do that, I have a few more follow-up questions, if you or anyone else has time:

  1. For the fixed effects, I’ve set the prior for the replication data to be a normal with its mean set to the posterior mean of the same parameter from the original data, and its standard deviation set to the posterior standard deviation. Does that sound ok?

  2. Since I am working with a mixed model (with a Bernoulli response distribution), I also need to set priors for the random-effect variances and correlations. Do you have any recommendations? An inverse-gamma for the variances?

  3. When constructing the null model (setting the parameter of interest to 0 by removing it from the model), I am leaving the corresponding random slope for the effect by any grouping variable (in my case, the items in a repeated-measures design) in the null model and only removing the fixed-effect parameter.

Any thoughts on this would be appreciated.