Hi!
I am trying to fit a linear model where every x value has its own mean and sd.
After looking through forum posts on measurement error, this is what I’ve come up with.
Unfortunately, I don’t get any good results.
I keep getting the following warning, no matter how many iterations I set.
There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Does anyone have thoughts on where the issue might be? Thanks in advance!
data {
  int<lower=0> N;           // number of obs
  real x_mean[N];           // mean of x
  real<lower=0> x_sd[N];    // sd of x
  real y[N];
}
parameters {
  vector<lower=-5>[N] x;    // unknown true value of x
  real<lower=-10> mu_x;     // prior location for x
  real<lower=0> sigma_x;    // prior scale for x
  real alpha;               // intercept
  real beta;                // slope
  real<lower=0> sigma;      // outcome noise
}
transformed parameters {
  vector[N] mu;
  mu = alpha + beta * x;
}
model {
  // priors
  alpha ~ normal(5.4, 1);
  beta ~ normal(0.99, 0.12);
  sigma ~ cauchy(0, 5);
  mu_x ~ normal(5, 2);
  sigma_x ~ cauchy(0, 1);
  // measurement-error model
  for (n in 1:N) {
    x[n] ~ normal(mu_x, sigma_x);       // prior on true x
    x_mean[n] ~ normal(x[n], x_sd[n]);  // observed mean given true x
  }
  // likelihood
  y ~ normal(mu, sigma);
}
Are you trying this with simulated data?
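If not, it might be worth generating fake data where the true values are known, so you can check whether the fit recovers them. Here’s a rough sketch of one way to do that in Stan itself (run with algorithm=fixed_param; the true parameter values and the distribution for x_sd are placeholders I made up):

data {
  int<lower=0> N;
}
generated quantities {
  real x_true[N];   // latent true x values
  real x_mean[N];   // observed means
  real x_sd[N];     // observed measurement sds
  real y[N];        // outcomes
  for (n in 1:N) {
    x_true[n] = normal_rng(5, 2);                  // draw a true x
    x_sd[n] = fabs(normal_rng(0, 0.5));            // placeholder sd
    x_mean[n] = normal_rng(x_true[n], x_sd[n]);    // noisy observed mean
    y[n] = normal_rng(5.4 + 0.99 * x_true[n], 1);  // outcome
  }
}

If the fit on data like this recovers alpha near 5.4 and beta near 0.99, at least the model is coded correctly and the problem is elsewhere.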
As a guess, maybe just try swapping in normal(0, 1)s in place of the Cauchys? If that helps, there’s a reparameterization in the manual that can make it easier to sample Cauchy random variables: https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html (“Reparameterizing the Cauchy”).
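For reference, here’s a rough sketch of that trick applied to the half-Cauchy(0, 5) prior on sigma (sigma_x could be handled the same way); treat the variable names as placeholders:

parameters {
  // uniform variate; tan() maps it to a half-Cauchy(0, 5)
  real<lower=0, upper=pi() / 2> sigma_unif;
  // ... other parameters unchanged
}
transformed parameters {
  real<lower=0> sigma = 5 * tan(sigma_unif);
}
model {
  // no sampling statement needed: sigma_unif is implicitly uniform
  // on its bounds, which makes sigma half-Cauchy(0, 5); drop the
  // explicit cauchy(0, 5) prior on sigma
  // ... rest of the model as before
}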
Thanks a lot for your answer.
I tried to use simulated x values for each iteration.
It looks like this works alright. Could someone confirm this is something I can do?
Or is it better to simulate the data first, then fit the model and repeat that process multiple times?
Thanks!
Here’s my Stan code:
data {
  // data for regression
  int<lower=0> N;           // number of obs
  real x_mean[N];           // mean of x
  real<lower=0> x_sd[N];    // sd of x
  real y[N];
}
transformed data {
  real x[N];   // unknown true value of x
  // simulate values
  for (n in 1:N) {
    x[n] = normal_rng(x_mean[n], x_sd[n]);
  }
}
parameters {
  real alpha;               // intercept
  real beta;                // slope
  real<lower=0> sigma;      // outcome noise
}
transformed parameters {
  real mu[N];
  for (n in 1:N) {
    mu[n] = alpha + beta * x[n];
  }
}
model {
  // priors
  alpha ~ normal(5.3, 1);     // Brandl et al. 2018
  beta ~ normal(0.99, 0.12);  // Brandl et al. 2018
  sigma ~ cauchy(0, 5);
  // likelihood
  y ~ normal(mu, sigma);
}
Apologies for the delayed response!
The transformed data block only gets run once, before sampling starts, so x is drawn a single time rather than afresh at each iteration; I believe that’s what’s happening here.
I was looking more at your model though. Is the situation that you have a large number of measurements of x for every y and so you are summarizing them in terms of their mean and standard deviation?
Thanks for your reply.
The reason the data has means and standard deviations is indeed because it has been summarized. These are all values taken from different older scientific papers that don’t include the raw data.
I hope this can be a good way to handle data that is only published in summarized form, because that’s fairly common in ecology and other fields.
Oh, I think we want to be doing something else then. This is a meta-analysis.
I’m not too familiar with this stuff, so the best I can do is say that and point you at some resources:
- https://mc-stan.org/docs/2_18/stan-users-guide/meta-analysis.html
- https://devinincerti.com/2015/10/31/bayesian-meta-analysis.html
- Page 124 in BDA3, Section 5.6 “Hierarchical modeling applied to a meta-analysis”
All of those use binomial data, but the way it’s incorporated into the model is by making a normal approximation to the binomial. It sounds like you already have estimates and their uncertainties summarized as a mean plus a standard deviation (which is what the log-odds step in those examples produces), so you can skip that step.
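As a sketch (assuming your means play the role of the estimates theta_hat and your standard deviations play the role of standard errors se; the names are mine), the core of such a model would look something like:

data {
  int<lower=0> J;        // number of studies
  real theta_hat[J];     // reported estimates (your means)
  real<lower=0> se[J];   // reported standard errors (your sds)
}
parameters {
  real mu;               // population mean effect
  real<lower=0> tau;     // between-study sd
  vector[J] eta;         // standardized per-study effects
}
transformed parameters {
  vector[J] theta = mu + tau * eta;   // non-centered parameterization
}
model {
  eta ~ normal(0, 1);
  theta_hat ~ normal(theta, se);      // normal approximation per study
  // plus priors on mu and tau as appropriate
}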
Hope that helps, haha. Sorry for the month-long delay before telling you this stuff :D
Thanks a lot for the resources.
No worries, this is still very useful for future projects.