Inefficient sampling with small experimental uncertainties

Dear All,

I’m having a curious problem with fitting a very simple linear model to data with very small experimental uncertainties (erry) and an additional scatter (sigma). I feel embarrassed to post this because I thought it would be so simple, but I haven’t been able to make any headway. The sampling for this model seems to be incredibly inefficient. The example below takes several hours, mostly owing to the large treedepth. However, smaller treedepths lead to warnings and bimodal (wrong) posteriors.

I’m using rstan. The data looks like:

dlist <- list(
    N = 6,
    x = c(1.4864179, 0.6050052, 0.3305731, -0.3454066, -0.8971667, -1.1794228),
    obsy = c(1.4824414, 0.6087327, 0.3342787, -0.3489581, -0.8926746, -1.1838201),
    erry = c(4.136830e-07, 3.891018e-07, 4.618516e-07, 1.160271e-07, 1.045580e-04, 2.858379e-07)
)

I’m running stan with

model1d.stan <- stan(file="stantest1d.stan", data=dlist,
                     warmup=2000, iter=5000, chains=3, cores=3,
                     control=list(max_treedepth=25))

With the following Stan code:

data{
    int<lower=1> N;
    vector[N] x;
    vector[N] obsy;
    vector[N] erry;
}
parameters{
    real a;
    real b1;
    real<lower=0> sigma;
    vector[N] y;
}
transformed parameters{
    vector[N] mu;
    for ( i in 1:N ) {
        mu[i] = a + b1 * x[i];
    }
}
model{
    sigma ~ cauchy( 0 , 1 );    // Prior on scatter
    b1 ~ normal( 0 , 1 );      // prior on slope
    a ~ normal( 0 , 1 );       // prior on intercept
    y ~ normal( mu , sigma );
    obsy ~ normal( y , erry );
}

First - it is OK to feel intimidated by Stan; I also banged my head a lot when I was starting out. Also note that I am mostly a beginner-to-intermediate Stan user procrastinating on the forums to avoid working on a paper resubmission, so my advice may be of limited value :-)

Second: Are you really sure your model is well defined? It looks like it might be non-identifiable (not an expert here): slightly decreasing all y while increasing a could give almost exactly the same posterior probability, which breaks the sampler (but I might be wrong on this one).

Another possibility is that those erry at the 1e-7 scale are causing numerical issues. Since the errors are so much smaller than sigma, are they actually relevant? And since y and obsy cannot really differ that much, is it necessary to determine them? Just having obsy ~ normal(mu, sigma + erry) gets rid of all the sampling problems. If you are interested in the distributions of the latent y themselves, you might be able to compute them analytically from the other fitted parameters.
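To expand on that last point: if you drop the latent y from the parameters, you can still recover their posterior in generated quantities via the standard conjugate-normal update (prior y[i] ~ normal(mu[i], sigma), observation obsy[i] ~ normal(y[i], erry[i])). A sketch, assuming mu, sigma, erry, and obsy as defined in your model; y_latent is a name I made up:

```stan
generated quantities {
  // Analytic posterior draw for each latent y[i]:
  // precision-weighted combination of the prior mean mu[i]
  // and the observation obsy[i]
  vector[N] y_latent;
  for (i in 1:N) {
    real v = 1 / (1 / square(sigma) + 1 / square(erry[i]));           // posterior variance
    real m = v * (mu[i] / square(sigma) + obsy[i] / square(erry[i])); // posterior mean
    y_latent[i] = normal_rng(m, sqrt(v));
  }
}
```

Since this runs once per saved draw and never enters the sampler, the tiny erry scales no longer affect adaptation at all.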

Note that the sampling problems persist even if you increase erry (erry <- erry * 1e5), giving you all sorts of red flags in the pairs plot (linear dependencies, funnels), so I would guess that the model itself is problematic. The pairs plot I got with the larger erry is below for someone more experienced to make guesses :-).
[pairs plot with erry scaled up by 1e5]

The problem here is the extreme difference in scales: because the erry are so small, the posterior for the latent y will be very, very narrow relative to the posteriors for the other parameters. Stan will try to adapt to compensate, but in problems like this it can only do so much.

You can reduce the burden on the adaptation by rescaling the problem, although it will be a bit tricky here. The easier solution is to recognize that you can marginalize out the latent y analytically to give the equivalent model

model{
    sigma ~ cauchy( 0 , 1 );    // Prior on scatter
    b1 ~ normal( 0 , 1 );      // prior on slope
    a ~ normal( 0 , 1 );       // prior on intercept
    obsy ~ normal(mu , sqrt(square(sigma) + square(erry)) );
}
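For anyone following along, the equivalence is the standard fact that a normal observation model with a normal prior on its mean marginalizes to a normal, with the variances of the independent contributions adding:

```latex
y \sim \mathcal{N}(\mu, \sigma^2), \quad
\mathrm{obsy} \mid y \sim \mathcal{N}(y, \mathrm{erry}^2)
\;\Longrightarrow\;
\mathrm{obsy} \sim \mathcal{N}\!\left(\mu, \; \sigma^2 + \mathrm{erry}^2\right)
```

So the sqrt(square(sigma) + square(erry)) standard deviation above is exact, not an approximation (recall Stan's normal() is parameterized by the standard deviation, not the variance).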

Thank you both for your help. In this case, the measurement errors probably are irrelevant, but for other data sets I cannot guarantee that they will be. So I hope to find a good model that will handle a range of cases.

Edit: martinmodrak’s option of leaving out the inference on y also works nicely if I combine the two uncertainties. I may end up using this method because I don’t need to infer y.

betanalpha: did you mean this (with y rather than mu as the mean)?
obsy ~ normal(y , sqrt(square(sigma) + square(erry)) );

If so, it did exactly what I needed. The model samples very efficiently. It also makes sense, I think, to tell the model that the y-values vary within their experimentally determined uncertainty plus the intrinsic scatter. I’m very happy with this solution! Thanks again!

Should I mark this as [solved] somehow?

No, I meant mu but forgot to delete the line above it (which I just did).

Oops. My reply ends up being confusing so just to clear things up:

betanalpha and martinmodrak have essentially the same solution, which worked very nicely. Thank you again for the help and explanations.

I think we should all just put that in our sigs and be done with it :-).
