Hi!
I am trying to fit a linear model where every x value has its own mean and sd.
After looking through forum posts on measurement error, this is what I’ve come up with.
Unfortunately, I don’t get any good results.
I keep getting the following warning, no matter how many iterations I set.
There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Does anyone have thoughts on where the issue might be? Thanks in advance!
data {
  int<lower=0> N;           // number of obs
  real x_mean[N];           // mean of x
  real<lower=0> x_sd[N];    // sd of x
  real y[N];
}
parameters {
  vector<lower=-5>[N] x;    // unknown true value of x
  real<lower=-10> mu_x;     // prior location for x
  real<lower=0> sigma_x;    // prior scale for x
  real alpha;               // intercept
  real beta;                // slope
  real<lower=0> sigma;      // outcome noise
}
transformed parameters {
  vector[N] mu;
  mu = alpha + beta * x;
}
model {
  // priors
  alpha ~ normal(5.4, 1);
  beta ~ normal(0.99, 0.12);
  sigma ~ cauchy(0, 5);
  mu_x ~ normal(5, 2);
  sigma_x ~ cauchy(0, 1);
  // measurement-error model
  for (n in 1:N) {
    x[n] ~ normal(mu_x, sigma_x);       // prior on true x
    x_mean[n] ~ normal(x[n], x_sd[n]);  // observed mean given true x
  }
  // likelihood
  y ~ normal(mu, sigma);
}
Are you trying this with simulated data?
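If not, it might be worth generating fake data where the true values are known, so you can check whether the fit recovers them. Here’s a rough sketch of one way to do that in Stan itself (run with algorithm=fixed_param; the true parameter values and the distribution for x_sd are placeholders I made up):

data {
  int<lower=0> N;
}
generated quantities {
  real x_true[N];   // latent true x values
  real x_mean[N];   // observed means
  real x_sd[N];     // observed measurement sds
  real y[N];        // outcomes
  for (n in 1:N) {
    x_true[n] = normal_rng(5, 2);                  // draw a true x
    x_sd[n] = fabs(normal_rng(0, 0.5));            // placeholder sd
    x_mean[n] = normal_rng(x_true[n], x_sd[n]);    // noisy observed mean
    y[n] = normal_rng(5.4 + 0.99 * x_true[n], 1);  // outcome
  }
}

If the fit on data like this recovers alpha near 5.4 and beta near 0.99, at least the model is coded correctly and the problem is elsewhere.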
As a guess, maybe just try swapping in normal(0, 1)s in place of the Cauchys? If that helps, there’s a reparameterization in the manual that can make it easier to sample Cauchy random variables: https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html (“Reparameterizing the Cauchy”).
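For reference, here’s a rough sketch of that trick applied to the half-Cauchy(0, 5) prior on sigma (sigma_x could be handled the same way); treat the variable names as placeholders:

parameters {
  // uniform variate; tan() maps it to a half-Cauchy(0, 5)
  real<lower=0, upper=pi() / 2> sigma_unif;
  // ... other parameters unchanged
}
transformed parameters {
  real<lower=0> sigma = 5 * tan(sigma_unif);
}
model {
  // no sampling statement needed: sigma_unif is implicitly uniform
  // on its bounds, which makes sigma half-Cauchy(0, 5); drop the
  // explicit cauchy(0, 5) prior on sigma
  // ... rest of the model as before
}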
Thanks a lot for your answer.
I tried to use simulated x values for each iteration.
It looks like this works alright. Could someone confirm this is something I can do?
Or is it better to simulate the data first, then fit the model and repeat that process multiple times?
Thanks!
Here’s my Stan code:
data {
  // data for regression
  int<lower=0> N;           // number of obs
  real x_mean[N];           // mean of x
  real<lower=0> x_sd[N];    // sd of x
  real y[N];
}
transformed data {
  real x[N];   // unknown true value of x
  // simulate values
  for (n in 1:N) {
    x[n] = normal_rng(x_mean[n], x_sd[n]);
  }
}
parameters {
  real alpha;               // intercept
  real beta;                // slope
  real<lower=0> sigma;      // outcome noise
}
transformed parameters {
  real mu[N];
  for (n in 1:N) {
    mu[n] = alpha + beta * x[n];
  }
}
model {
  // priors
  alpha ~ normal(5.3, 1);     // Brandl et al. 2018
  beta ~ normal(0.99, 0.12);  // Brandl et al. 2018
  sigma ~ cauchy(0, 5);
  // likelihood
  y ~ normal(mu, sigma);
}
Apologies for the delayed response!
The transformed data block only gets run once, before sampling starts, so x is drawn a single time rather than afresh at each iteration; I believe that’s what’s happening here.
I was looking more at your model though. Is the situation that you have a large number of measurements of x for every y and so you are summarizing them in terms of their mean and standard deviation?
Thanks for your reply.
The reason the data has means and standard deviations is indeed because it has been summarized. These are all values taken from different older scientific papers that don’t include the raw data.
I hope this can be a good way to handle data that is only published in summarized form, because that’s fairly common in ecology and other fields.
Oh, I think we want to be doing something else then. This is a meta-analysis.
I’m not too familiar with this stuff, so the best I can do is say that and point you at some resources:
- https://mc-stan.org/docs/2_18/stan-users-guide/meta-analysis.html
- https://devinincerti.com/2015/10/31/bayesian-meta-analysis.html
- Page 124 in BDA3, Section 5.6 “Hierarchical modeling applied to a meta-analysis”
All of those use binomial data, but the way it’s incorporated into the model is by making a normal approximation to the binomial. It sounds like you already have estimates and their uncertainties summarized as a mean plus a standard deviation (which is what the log-odds step in those examples produces), so you can skip that step.
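As a sketch (assuming your means play the role of the estimates theta_hat and your standard deviations play the role of standard errors se; the names are mine), the core of such a model would look something like:

data {
  int<lower=0> J;        // number of studies
  real theta_hat[J];     // reported estimates (your means)
  real<lower=0> se[J];   // reported standard errors (your sds)
}
parameters {
  real mu;               // population mean effect
  real<lower=0> tau;     // between-study sd
  vector[J] eta;         // standardized per-study effects
}
transformed parameters {
  vector[J] theta = mu + tau * eta;   // non-centered parameterization
}
model {
  eta ~ normal(0, 1);
  theta_hat ~ normal(theta, se);      // normal approximation per study
  // plus priors on mu and tau as appropriate
}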
Hope that helps, haha. Sorry for the month-long delay before telling you this stuff :D
Thanks a lot for the resources.
No worries, this is still very useful for future projects.