Getting Estimated Posterior Means for Parameters from a Model (posterior passing)

Hello,
Given this model output, how would I go about extracting the posterior means of the parameters so that I can insert them as starting estimates for the prior parameters?

I would like to use the same model on some new data (which I believe will share the same distributional properties as the original data) and would like to use these estimates as starting values for the priors. I do realize that the posterior distribution of the parameters might end up looking very different from where the priors started.

The model is below. I have some idea of how to achieve this for the prior on the os fixed effect, but I am unsure about the rest :(

 Family: gaussian
  Links: mu = identity; sigma = log
Formula: log(ccr + 1) ~ os + s(nvc, m = 1) + (1 + os | c_version)
         sigma ~ os
   Data: D (Number of observations: 594)
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup samples = 8000

Smooth Terms:
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(snvc_1)     0.39      0.15     0.19     0.76 1.00     2744     3874

Group-Level Effects:
~c_version (Number of levels: 8)
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.08      0.06     0.00     0.24 1.00     2693     3256
sd(osLinux)                     0.24      0.16     0.02     0.63 1.00     1885     1963
sd(osWindows_NT)                0.15      0.08     0.04     0.34 1.00     1918     2025
cor(Intercept,osLinux)         -0.00      0.48    -0.85     0.88 1.00     2901     3842
cor(Intercept,osWindows_NT)    -0.39      0.46    -0.96     0.67 1.00     2202     3385
cor(osLinux,osWindows_NT)      -0.18      0.44    -0.89     0.70 1.00     3452     4786

Population-Level Effects:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              0.61      0.06     0.49     0.73 1.00     5523     4149
sigma_Intercept       -0.66      0.05    -0.76    -0.55 1.00     8120     5667
osLinux                0.25      0.13    -0.02     0.52 1.00     3617     3178
osWindows_NT          -0.18      0.08    -0.35    -0.02 1.00     5232     4787
sigma_osLinux          0.37      0.07     0.23     0.51 1.00     8210     6322
sigma_osWindows_NT    -0.82      0.07    -0.96    -0.67 1.00     7891     6406

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

(I believe the idea was covered here: https://osf.io/jrjd2/ )

I’d suppose what you want are samples from the posterior predictive distribution (i.e. p(x* | x)) for your parameters of interest. In pure Stan you’d do this in the “generated quantities” block, using the “_rng” functions to make random draws from your distributions of choice based on your posterior parameters.

I don’t use brms, but looking at the documentation there is an overload of the predict function in R which you can call on your brmsfit object to generate predictive samples. See here:
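
For illustration, a minimal sketch of that (assuming a fitted brmsfit called fit and a new data frame new_d with the same predictors; both names are hypothetical here):

library(brms)

## summarised posterior predictive draws: Estimate, Est.Error and CI per row of new_d
pp_summary <- predict(fit, newdata = new_d)

## raw draws from p(x* | x): one row per posterior draw, one column per observation
pp_draws <- posterior_predict(fit, newdata = new_d)
dim(pp_draws)  ## e.g. 8000 x nrow(new_d)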

brms deserves a book :). The Discourse is well documented and I should have read it more carefully. Anyway, brms by default provides some priors which you can’t sample from and which are enough to get started (but things can go horribly wrong if those priors do not work for your data). So you need to first specify your priors, and from there you can get the coefficients. For example, I simplified the above model:

brm(formula = bf(log(cmr + 1) ~ 0 + Intercept + os + nvc + (1 + os | c_version)),
    prior = c(
        prior(normal( 0.96, 0.30), class = "b",  coef = "Intercept"),
        prior(normal(-0.13, 0.15), class = "b",  coef = "osLinux"),
        prior(normal( 0.03, 0.25), class = "b",  coef = "osWindows_NT"),
        prior(normal(-0.44, 0.10), class = "b",  coef = "nvc"),
        ## the sd values below came from actually running the model
        prior(normal( 0.56, 0.30), class = "sd", coef = "Intercept",    group = "c_version"),
        prior(normal( 0.24, 0.30), class = "sd", coef = "osLinux",      group = "c_version"),
        prior(normal( 0.37, 0.30), class = "sd", coef = "osWindows_NT", group = "c_version"),
        prior(lkj(1.1), class = "cor", group = "c_version")
    ),
    data = d2, cores = 2, chains = 2)

Without specifying the priors, the model had very poor estimation metrics. I got the estimates from a dataset d1, which is similar to d2. With the priors, it’s much better.

To be honest, I find it puzzling how to choose prior values for the group-level effects. As I mentioned above, I got those estimates by running the model on a larger data set d1, but that brms call didn’t have any prior information for the c_version group.

And then to get the estimates, something like this would work:

## pars (not par) is the documented argument of posterior_samples()
draws <- posterior_samples(cmr2a, pars = 'sd_c_version__Intercept')[, 1]
c(mean(draws), sd(draws))
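
The same idea extends to several parameters at once; a hedged sketch (the object cmr2a is the fit above, and the particular parameter names here are just examples):

pars <- c('b_Intercept', 'b_osLinux', 'b_osWindows_NT',
          'sd_c_version__Intercept', 'sd_c_version__osLinux')
draws <- posterior_samples(cmr2a, pars = pars)
## one row per parameter: posterior mean and sd, usable as prior location and scale
t(sapply(draws, function(x) c(mean = mean(x), sd = sd(x))))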

I imagine that for sd a prior of normal(0.24, 3) represents a half-normal with mean 0.24 and sd 3. @paul.buerkner is this understanding correct?

Thanks much!
saptarshi

Ideally your model shouldn’t be extremely sensitive to your prior, since then your hyperparameters are deciding your outcomes.

Just a suggestion, but it may be worth considering coding your model directly in Stan if you’d like relatively fine-grained control and a lot more transparency about what’s going on.

This is a fair point, but the choices for the prior parameters were taken from a different data set that is assumed to be similar to this one. The sigma of the priors could definitely be made wider. That said, it really does help the model fit faster if one has an idea of roughly where the parameters lie.

Using Stan directly can help with some steps (like post-processing when your distribution is not fully supported).

However, posterior_summary() will give you estimates and credible intervals for all the parameters in the model.
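
A minimal sketch of that, using the cmr2a fit from above (the parameter names picked out are just examples):

ps <- posterior_summary(cmr2a)
## one row per parameter: Estimate, Est.Error, Q2.5, Q97.5
ps[c('b_Intercept', 'sd_c_version__Intercept'), c('Estimate', 'Est.Error')]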

Ideally your priors would be regularizing. That is, they would prevent Stan from exploring areas that contain parameter values that are completely improbable, but still allow values that may be unlikely yet aren’t impossible. The easy example is not allowing the intercept for adult heights to wander close to 0.10 m or 3.5 m.
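
In brms terms that could look something like this (the numbers are illustrative only, for heights in metres):

## regularizing prior: keeps the intercept away from 0.10 m or 3.5 m
prior(normal(1.7, 0.2), class = "Intercept")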

The priors on group-level terms (at least for the intercepts) are there to keep the sampler away from areas that are, again, improbable; in this case, estimates of how far away from the global mean each group would be. In the height example, if you were using sex as a grouping factor, you would want your group intercepts to allow differences from the mean to be reasonably large (probably not bigger than the variability in the population as a whole) but also as small as they want to be (i.e., possibly 0).
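
For the group-level standard deviation, a sketch of such a prior (again, purely illustrative values) would be:

## brms gives sd parameters a lower bound of 0, so this acts like a half-normal:
## group deviations can be 0 but are discouraged from exceeding the population variability
prior(normal(0, 0.5), class = "sd", group = "sex")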

I’m not sure what problems you were having with the quality of the estimates that you mentioned in the first part of your question, but there are a variety of ways to address them. Making your priors very narrow may or may not help.

Breaking vectorization on the priors (i.e., giving each coefficient its own prior rather than one prior for the whole class) will also slow down brms quite a bit.
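
As a sketch of the difference (values illustrative): a single prior for the whole class lets the generated Stan code evaluate the prior in one vectorized statement, while per-coefficient priors are evaluated term by term:

## one vectorized prior for all population-level effects
prior(normal(0, 1), class = "b")

## per-coefficient priors, as in the model above, break that vectorization
c(prior(normal( 0.96, 0.30), class = "b", coef = "Intercept"),
  prior(normal(-0.13, 0.15), class = "b", coef = "osLinux"))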