Dealing with uncertain parameters that aren't purely fitting parameters

I think you’ve answered your own question then.

Is that cut mechanism in BUGS discussed here?: https://groups.google.com/forum/#!topic/stan-users/ruxPAhBPI4A

ETA: Looks like the paper here is also about that cut mechanism: http://statmath.wu.ac.at/research/talks/resources/Plummer_WU_2015.pdf

Yeah +1 @ the cut question, I’m curious too.

I was surprised that Stan even let me do that

Yeah everything on either side can be a function of data or parameters or whatever. I just get scared when things start moving away from where I expect them to be cause I figure I might be doing something crazy.

Alright, I had a think about generating several plausible but random values for E.

So the big problem I’d expect here is that now you kinda have arbitrary control over the amount of data you have. If you generate lots and lots of data, then when you look at the posteriors of ‘B’ you’ll get a tight estimate. And that tight estimate probably isn’t gonna be meaningful if the model itself isn’t exactly right.

So because we control the volume of data here, we could mislead ourselves about our uncertainties just as easily as with the other things we could try (where we fix E or fix B and just hope)?

The same thing happens when fitting a line to data. With a lot of data, the posteriors on a and b in y ~ normal(a * x + b, 1) get tight regardless of how good a description the line is of the data.

Another example would be if we generated data like:

x ~ normal(0, 1)
y ~ normal(x, 1)

And then fit the data using y ~ normal(0, b). The fit would work, but the randomness from the x process is just getting absorbed by the b parameter. That’s fine here, but if x ~ gamma(1, 1) for instance, we’d still get an estimate of b and it might be a bit misleading?
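A quick numpy sketch to check that example (just simulation, nothing rigorous):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Generate data the first way: x ~ normal(0, 1), y ~ normal(x, 1)
x = rng.normal(0.0, 1.0, size=N)
y = rng.normal(x, 1.0)

# Under the model y ~ normal(0, b), the maximum likelihood estimate of b is
# sqrt(mean(y^2)); here it just recovers sqrt(1 + 1), i.e. b absorbs the
# randomness that came from the x process.
print(np.sqrt(np.mean(y ** 2)))  # ~1.41

# Now generate x from gamma(1, 1) instead
x2 = rng.gamma(shape=1.0, scale=1.0, size=N)
y2 = rng.normal(x2, 1.0)

# We still get an estimate of b, but y2 is skewed with mean ~1, so
# normal(0, b) isn't a great description of it.
print(np.sqrt(np.mean(y2 ** 2)))   # ~1.73
print(np.mean(y2), np.std(y2))     # mean ~1.0, sd ~1.41
```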

Maybe? I’m not 100% sure.

Here’s a script to play with this stuff (fitting a line to a nonlinear model):
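Roughly this sort of thing, as a minimal numpy sketch (not the full script): the data come from a quadratic, we fit a straight line with flat priors and known noise sd, so the posterior on the line’s coefficients is Gaussian and can be computed in closed form instead of sampled:

```python
import numpy as np

rng = np.random.default_rng(2)

def line_posterior(x, y, sigma=1.0):
    """Posterior for (a, b) in y ~ normal(a * x + b, sigma) with flat priors.
    It's Gaussian: mean = least-squares fit, covariance = sigma^2 (X'X)^{-1}."""
    X = np.column_stack([x, np.ones_like(x)])
    mean = np.linalg.solve(X.T @ X, X.T @ y)
    cov = sigma ** 2 * np.linalg.inv(X.T @ X)
    return mean, cov

# The true data-generating process is nonlinear (quadratic), but we fit a line.
for N in [10, 100, 1000, 10000]:
    x = rng.uniform(-2.0, 2.0, size=N)
    y = rng.normal(x ** 2, 1.0)          # nonlinear truth
    mean, cov = line_posterior(x, y)
    sds = np.sqrt(np.diag(cov))
    print(N, "posterior means:", mean.round(2), "posterior sds:", sds.round(3))

# The posterior sds on a and b keep shrinking as N grows, even though a
# straight line is never a good description of x^2.
```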

You may be right about that. I suspected that the approach that I was thinking about would be naive.

Anyway, I read further in that paper by Plummer and noticed this:

The target distribution p∗(θ) can always be estimated by multiple imputation (MI), i.e. generating a sequence of samples φ(1), …, φ(T) and then fitting T separate models for θ such that, under model t, φ is considered to be observed at φ = φ(t). Pooled MCMC samples from all T models can be used to estimate the marginal density p∗(θ).

That looks like it may be feasible for me, even if it’s rather a brute-force approach. Would I be correct in gathering that pooling MCMC samples to get a posterior distribution would pretty much be a matter of concatenating or shuffling together all the chains from the MCMC simulations and then generating a histogram or kernel density estimate from the samples from all of the chains?


Yes, that’s the thing to do. But you want to fit each imputed data set separately with multiple chains to make sure things are converging for each imputed data set before mixing them.
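For the pooling step, a minimal Python sketch (the draws below are made-up stand-ins for real per-imputation MCMC output, and each per-imputation fit is assumed to have already passed its own convergence checks):

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# draws_per_imputation[t] holds the post-warmup draws of B, pooled across
# chains, from the fit where E was fixed at its t-th imputed value.  Here
# they're just made-up normal draws standing in for real MCMC output.
rng = np.random.default_rng(3)
draws_per_imputation = [rng.normal(loc=mu, scale=0.1, size=4000)
                        for mu in rng.normal(1.0, 0.3, size=20)]

# Pooling is literally concatenation: every draw gets equal weight as long
# as each fit contributes the same number of retained draws.
pooled = np.concatenate(draws_per_imputation)

# A histogram or kernel density estimate of the pooled draws approximates
# the marginal posterior of B with the uncertainty in E mixed in.
grid = np.linspace(pooled.min(), pooled.max(), 400)
kde = gaussian_kde(pooled)
plt.hist(pooled, bins=60, density=True, alpha=0.4)
plt.plot(grid, kde(grid))
plt.xlabel("B")
plt.show()
```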
