Instead of generating a posterior sample of a parameter, I’d like to use some prior information about the parameter and ‘focus’ my fitting on the remaining ones. I know I can specify a constant
scalar as a prior in brms
, but is ther a way to specify a ‘constant’ distribution, i.e. prior(constant(normal(15, 2)), lb = 2, ub = 20, nlpar = "theta")
?
You can refit the model many times for different values of the parameter of interest, drawn from the distribution of interest, and combine the resulting posterior chains for inference.
Alternatively, you can fit the model jointly and if the posterior for the parameter of interest is sufficiently similar to the prior, you can try to use some kind of weighting scheme to re-weight the posterior until the margin for this parameter is similar to the desired margin.
But I think this sort of approach begs the question of “why bother”? If you fit the model jointly and you get a posterior margin for this parameter that’s substantially different from your desired normal(15, 2)
then why wouldn’t you prefer to use that data-informed margin as opposed to some weird re-weighted version of the posterior?
Thanks for the input. To be honest, those are all pretty complex approaches. I think I’ll try creating my own custom_family
.
As to why bother, well I believe that the weak priors on other parameters, combined with noisy data leads to unrealistic estimates of this parameter and, in turn, the rest of the model. Perhaps I’m making a mistake, but I’m interested in seeing if doing something more flexible than fixing the value to a constant also results in reasonable model fits.
I agree those approaches aren’t straightforward, but you cannot get the outcome that you describe via a custom family without also doing something like one of these suggestions.
If that’s true, then it implies that the joint model is misspecified.
I don’t disagree with your point about misspecification, but figuring out how it is misspecified is part of my motivation for fixing some parameters to a certain degree.
Regarding using a custom_family
, I was imagining one could incorporate the weighting (prior) into the definition of the model so that their effects are accounted for in the likelihood function.
It seems like an error in variables approach with me
or mi
would also be a possibility, i.e. specify these terms as explanatory variables with measurement noise equivalent to the desired weighting.
If you want, you can fiddle with the prior on the parameter of interest until fitting yields a posterior with the margin that you desire, but this will be delicate and annoying. I don’t see how either a custom family or me
/mi
syntax would make it any easier. In all of these cases, you specify some distributional information and the model finds the region of parameter space that is jointly most compatible with both the prior and the data. It does not fix the posterior distribution for the parameter to distributions specified in the custom family or via the mi
term.
I don’t want to presume too much about your use case and what is useful to you, but for others who might stumble across this post, I feel like I should re-affirm that I don’t think any of this fiddling is likely to be worthwhile in general. One could use the same time and effort to iterate through alternative model structures and find a well fitting model, rather than tuning the prior-data conflict of a misspecified model precisely so that it contorts, somewhere in the no-mans land between the prior and the likelihood, into some particular pre-defined posterior margin for one parameter.
Thanks for your input, however, you misunderstand my goal. My goal is to avoid estimating a parameter from the current data of interest. Instead, Instead use auxiliary information on this parameter. This auxiliary information has substantial uncertainty which I want to take into account. I now realize this can be thought of as an error in variables model and , thus, me
seems like a good approach.
Why not just set an informative prior on the parameter?
Because I’m trying to understand the model’s poor behavior and I wanted to understand the role of fitting a particular parameter played in that behavior. Simultaneously, I’m trying to better understand the capabilities of brms
and stan
.
I think you are mistaken about what me
does, and about the capabilities of custom families. me
treats the missing value as a parameter and gives it a likelihood term that is equivalent to giving it a prior distribution corresponding to the measurement error distribution, a procedure sometimes referred to as “Bayesian imputation”.
For your use case, this is just a roundabout way to set a prior on a parameter. This is distinct from multiple imputation, wherein the missing value is repeatedly sampled from the measurement error distribution, as described above:
Stan (for good reasons) enforces that the data cannot change midstream during model fitting. Thus, every quantity in a Stan model must either be fixed permanently (i.e. data, transformed data), or be fitted (i.e. parameters), or be a straightforward function of data and parameters (i.e. transformed parameters, local variables). This means that there is guaranteed not to be a clever solution involving me
or a custom family that is capable of fixing the posterior margin in the way that I understand you to be describing. The only options that I’m aware of for fixing the margin are via multiple imputation, via iteratively tweaking the prior, or via reweighting the samples to approximate tweaking the prior (see here New package for sensitivity analysis using importance sampling).
Sorry if I’m still misunderstanding your aim.
I think you still misunderstand what I want to do and it’s probably due to my sloppy use of terms, but I could also be lost (it happens frequently). The idea is instead of, say, using prior(constant(x_est), coef = "x")
, I would create a variable in my data x_est
and x_est_sd
and use me(x = x_est, sdx = x_est_sd)
in my model formula.
This is literally equivalent to declaring a parameter called x
and putting a normal prior on that variable centered at x_est
and with standard deviation x_est_sd
.
I don’t see how that is true since I’m not updating my estimates of x
.
This is what Bayesian imputation is, and it’s what me
does. With me
you tell the model that you know something about the likely location of an observation, and then you treat the true observation as a parameter and fit the value for that parameter jointly with the rest of the model, thereby yielding a posterior distribution for that parameter. For example, if your data model is y ~ normal(x, 1)
, and you observe y = 0
and you measure x = 10
with high uncertainty (say se = 10
), the model set up by brms
via me
will be correctly conclude (assuming that the model is the correct generative model) that the true value of x
is probably somewhere between -2 and 2.
If we take a peek under the hood:
library(brms)
df <- data.frame(
y = 0,
x_est = 1,
x_sd = 1
)
make_stancode(y ~ me(x_est, x_sd), data = df)
yields
// generated with brms 2.18.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> Ksp; // number of special effects terms
// data for noise-free variables
int<lower=1> Mme_1; // number of groups
vector[N] Xn_1; // noisy values
vector<lower=0>[N] noise_1; // measurement noise
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
vector[Ksp] bsp; // special effects coefficients
real<lower=0> sigma; // dispersion parameter
// parameters for noise free variables
vector[Mme_1] meanme_1; // latent means
vector<lower=0>[Mme_1] sdme_1; // latent SDs
vector[N] zme_1; // standardized latent values
}
transformed parameters {
vector[N] Xme_1; // actual latent values
real lprior = 0; // prior contributions to the log posterior
// compute actual latent values
Xme_1 = meanme_1[1] + sdme_1[1] * zme_1;
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += (bsp[1]) * Xme_1[n];
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += normal_lpdf(Xn_1 | Xme_1, noise_1);
target += std_normal_lpdf(zme_1);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
}
The line target += normal_lpdf(Xn_1 | Xme_1, noise_1);
is the normal prior that’s getting set up by the me
term. Xme_1
is the (transformed) parameter here, and you’ll notice that this has the syntactic form of a likelihood term, with the parameter on the right-hand side rather than the left hand side. This is what I was driving at when I said:
the key observation being that normal_lpdf(a | b, c)
is strictly equal to normal_lpdf(b | a, c)
due to the symmetry of the normal distribution.
Edit: the bigger point here is that this:
is not something that merely happens to be different from what brms::me()
does; it is something that is literally impossible to do in all mature Stan interfaces except by fixing x
to a constant for the entire duration of model fitting.
And just to put this in a bigger picture. If you look through this forum, there was a time when people were looking to use the cut
function from JAGS in Stan because they wanted the information to flow from a parameter to the rest of the model but not from the model to the parameter. This is not possible in Stan, if it’s a parameter it gets updated. If it does not need to be updated, it’s data.
I think the brute force way to get what you want, is to generate values from the desired distribution and use these as data instead of the parameter theta
.
Thank you both for your through explanations. I must admit I didn’t look at exactly how me
was used, but, instead, went by my prior knowledge about error in variable models in traditional statistics (where updating the data values is not how it’s done) and its description on the brms
help pages.
if it’s a parameter it gets updated. If it does not need to be updated, it’s data.
As I tried to convey previously (but clearly failed), I wanted to shift from treating x
as a parameter to treating it as data based on summary stats (which include uncertainty) from other sources. From @jsocolar statements, it seems like it (treating x
as data with uncertainty, not a parameter to be updated) cannot be done. However, @stijn’s statement suggests it can be done, but, I’m inferring, not simply within the ‘brms’ framework. Based on my limited experience with stan
such a solution, I suspect, might involve some form of integration across the uncertainty. I don’t know iif that’s even possible in stan
without writing my own integration routines, I simply don’t know enough about stan
.
To be honest, for my current purposes using ‘constant’ construct will probably be sufficient. Nevertheless, I was curious if this was even possible.
Thanks again for taking time to respond to my questions.
However, @stijn’s statement suggests it can be done, but, I’m inferring, not simply within the ‘brms’ framework. Based on my limited experience with
stan
such a solution, I suspect, might involve some form of integration across the uncertainty.
Let’s say that you are interested in the relation y = \alpha + \beta x ^ \theta with \alpha, \beta, \gamma the parameters and y, x the data. Let’s say you have two observations:
y | x | |
---|---|---|
3 | 5 | |
1 | 2 |
What you want is no longer treat \theta as a parameter but as data from a fixed distribution. For instance, N(0, 2). You can draw 5 values for \theta e.g -1.44, -1.40, -0.45, -0.76, 3.83. Now you have a new data set.
y | x | theta | |
---|---|---|---|
3 | 5 | -1.44 | |
1 | 2 | -1.44 | |
3 | 5 | -1.40 | |
1 | 2 | -1.40 | |
3 | 5 | -0.45 | |
1 | 2 | -0.45 | |
3 | 5 | -0.76 | |
1 | 2 | -0.76 | |
3 | 5 | 3.83 | |
1 | 2 | 3.83 |
Now you can estimate, y = \alpha + \beta x ^ \theta with \alpha, \beta as the parameters and y, x, \theta the data. By the magic of Bayesian updating you are automatically integrating over the uncertainty in \theta when you estimate \alpha and \beta.
@stijn I don’t think the process as described above quite works. For example, if your prior on \theta were very narrow, then by lengthening the dataset in this way you’d clearly get overly tight inference on \alpha and \beta. Instead, you’d need to sample one value of \theta per row in the dataset. But this seems risky and unstable as well. For example, we could imagine that there is a single high-leverage observation in the data set that demands radically different values of \alpha and \beta depending on the value of \theta. Sampling one value of \theta per row associates a single value of \theta with that observation, and thus yields posterior inference on \alpha and \beta that is essentially pegged to a single value of \theta.
The safe brute-force alternative is multiple imputation. That is, sample one value of \theta, associate it with the entire dataset, do the fitting, and then repeat many times, and then combine the resulting posteriors for inference.
@mikegee FWIW I think you were totally clear all along about what you really wanted :) The problem is that there’s just one reliable way to achieve it in Stan:
You can refit the model many times for different values of the parameter of interest, drawn from the distribution of interest, and combine the resulting posterior chains for inference.
@jsocolar , I’m glad I was clear and again appreciate your help with this. You saved me endless confusion had I naively gone down the me()
route.
@stijn, I had a similar concern as @jsocolar about replicating your data. The curvature of the log-likelihood function (llik) would be altered (amplified) relative to the true amount of information you have in your data.
Intuitively (which is always dangerous in statistics), I expect one might overcome this by dividing the llik of the replicated data by the number of replicates n
. This would give the average llik across the samples (essentially a stochastic integration) Perhaps by creating a custom family [sorry @jsocolar, I can’t help myself] which adds an n
argument and adjusts the llik of the standard model accordingly.
Intuitively (which is always dangerous in statistics), I expect one might overcome this by dividing the llik of the replicated data by the number of replicates
n
.
I agree that, at least on its face, it seems like this could work. And that a custom family could achieve it.