Sample_prior = "only" and size of dataset

Alasdair_Clarke · October 14, 2020, 9:19am

Hello,

I am running some prior predictions using sample_prior ‘only’.

Just now, everything is very slow, I suspect this is because I have over 100k datapoints. As I’m only sampling the prior at this stage, I should be able to save a lot of time by handing the model a much smaller dataset that only contains 1 observation per cell.

is this reasonable?

on a related note, can anybody recommend any resources on setting priors for random effect structures? Just now I am leaving all those to the defaults, but they appear to give me a really wild range of prior predictions.

torkar · October 14, 2020, 10:30am

Hi,

are you sure it’s because n=1e5? What are your priors? When sampling only priors then you don’t have to run that many iterations (and perhaps even use one chain only).

Alasdair_Clarke · October 14, 2020, 10:43am

Hello,

and thanks for your reply. I think it is due to the sample size, as I’ve reduced things to 1 row per cell (I’m modelling a cog psych experiment, so there are many repetitions of each experimental condition)and things run pretty quickly.

But I am a little puzzled as to why the size of the dataset should matter, so perhaps I am wrong on this!

Just now, my model is:

m ← brm(
rt ~ 0 + d_feature + log(Nd):d_feature + (log(Nd):d_feature|p_id),
data = df,
family = lognormal(),
prior = my_priors,
chains = 1,
sample_prior = ppc,
iter = n_itr)

With

my_priors ← c(
prior_string(“normal(-0.5, 0.1)”, class = “b”, coef = intercepts),
prior_string(“normal(0, 1)”, class = “b”, coef = slopes))

(d_feature has 6 levels, and I’m basically fitting y = bxNd + c for each level of d_feature.

This when I plot this, things look reasonable (first row… the other rows are exploring using a lognormal and shifted_lognormal).

I’m now fitting the model to the data… and it seems unreasonably slow. I suspect need to either simplify my random effect structure or, use better priors.

hhau · October 14, 2020, 11:34am

The time required to sample the prior will depend on the size of the data set (really, the ‘prior’ is conditional on the specific, observed values of the covariates). There are two contributors:

The amount of time it takes to compute X\beta / the linear predictor (this can be large even when doing simple Monte Carlo to sample the prior predictive).
Very broad / unevenly scaled priors that necessitate a large number of HMC leapfrog steps per iteration.

The line between prior and posterior gets very fuzzy here; It is useful to keep in mind that to Stan there is only a ‘log target density’, and you can fall into all of the usual pitfalls whether your target is a prior or a posterior.

I’m not sure this is true – the prior can be just as difficult to sample as the posterior, particularly for high dimensional parameters. You don’t know until you run your MCMC sampler (in cases where you can’t do simple Monte Carlo)

Some questions / code to run:

How many rows/column are left in your reduced matrix for your prior (and how many in your posterior? You can use prior_data <- make_standata(....) from BRMS to generate the Stan data, and call dim(prior_data$X) and dim(prior_data$Z_*) (I don’t remember the naming convention for Z off the top of my head)
How many leapfrog steps per iteration are being taken for the prior/posterior (you can check with rstan::get_num_leapfrog_per_iteration(brms_fit@fit)

torkar · October 14, 2020, 11:39am

you could do prior checks without sampling from the data. Simply randomly sample from each prior and bunch them together with you likelihood (perhaps using a link function) and then plot them?

Alasdair_Clarke · October 14, 2020, 11:44am

Are there any useful wrapper functions for this? I’ve done things this way before, but it sometimes feels like it requires quite a bit of long-winded code! [Not terribly difficult, but sometimes a little fiddly]

The nice thing about using sample_priors = “only” is that I can then plot my prior and posterior in the same way.

torkar · October 14, 2020, 11:52am

No, sorry to say I do it manually always :( And I agree, it’s always nicer to use the methods we already have to plot posteriors :)

hhau · October 14, 2020, 1:17pm

Unless your model explicitly includes measurement error, then it is rare to put a ‘prior’ on the design matrix. When doing linear regression: Y = X\beta + \varepsilon, we simulate from the prior predictive distribution for Y by simulating values of \beta and \varepsilon from their prior densities, then computing the appropriate value of Y. We (well, at least I do not) typically do not simulate a new design matrix X for each simulated value of (\beta, \varepsilon), mostly because there is no explicit generative model for X. I think ‘is X data or not’ is more of a philosophical question, but say the columns of X represent continuous covariates and I add a large number to every entry in X. If we keep the prior on \beta the same, then the prior predictive distribution for Y will change substantially.

Sometimes it is possible to code the generative model in Stan (using the generated quantities block and the relevant _rng functions) and call Stan with algorithm = 'fixed_param', which at least gets the samples into a known format (which can be passed to bayesplot, etc).

torkar · October 14, 2020, 1:29pm

Of course I meant when we don’t put them in X, i.e., a design matrix.

Topic		Replies	Views
Simulating a response using prior distributions brms	1	342	April 30, 2021
Use of `sample_prior='only'` in brms brms	12	5581	May 29, 2018
Brms prior more influential than data itself brms	3	744	October 10, 2019
Sampled Priors for Intercept don't match the setting brms prior-predictive	3	232	February 28, 2024
Generating samples from priors for Bayes Factors when sample_prior="no" brms bayes-factor	4	565	July 10, 2019

Sample_prior = "only" and size of dataset

Related topics