# How to include data measurement uncertainty in stan/pystan

I am completely new to stan. I simply wanted to fit a data which has uncertainty in measurements, but I could not include the uncertainty in the fitting. For example, I have x[N], y[N] and yerror[N] arrays with dimensions N. Suppose the data is 2nd order polynomial: y=a0+a1x+a2x*x, and I have the array yerror[N]. Now my code in pystan is given below:

``````import pystan
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

A0=0.5; A1=1.5; A2=-0.2; A3=-0.008; A4=0.00025
def func(xx):
return A0+A1*xx+A2*xx*xx#+A3*(x**3)+A4*(x**4)

x=10*np.random.rand(100); x=np.sort(x)
fx=func(x);
yerror=np.random.rand(len(x))
y=np.random.normal(fx,scale=yerror)

np.random.seed(101)

model = """
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real a0;
real a1;
real a2;
real<lower=0> sigma;
}
model {
vector[N] x2;
for(i in 1:N){x2[i]=x[i]*x[i];}
y ~ normal(a0 + a1 * x + a2 * x2, sigma);
}
"""

# Put our data in a dictionary
data = {'N': len(x), 'x': x, 'y': y}

# Compile the model
sm = pystan.StanModel(model_code=model)

# Train the model and generate samples
fit = sm.sampling(data=data, iter=2000, chains=4, warmup=400, thin=3, seed=101)

``````

The code does not use the measurement uncertainty, yerror[N]. How to do that?

1 Like

I would just implement the priors with data means and SDs. Some clues can be found here: https://stackoverflow.com/questions/44142274/incorporating-uncertainty-into-a-pymc3-model

Sorry, I found it very hard to understand. In reality I would just have data (y[N]), yerror[N], x[N] and I have to choose a model. How can I take the mean?
For example, in emcee (or most mcmc codes) we could simply include yerror[N] in the residual as: residual= (data-model)^2/yerror^2. I thought there would be something similar in stan.

Any link with “https://mc-stan.org/…” is not opening (at least from my location)!! Could you open the above link?

See section 6

Here the error in x is discussed. But what if I have error in y (that is the data wich we are trying to model). As I gave it a thought it should not be so complicated, all we have to do is to modify the chi^2/log_probability. Actually we have to divide the likelihood by yerror^2, that is it. Is there no such option in stan?

In your first model `sigma` represents the error in `y` and it is treated as unknown. If you know the error you can substitute that.

``````data {
int<lower=0> N;
vector[N] x;
vector[N] y;
vector[N] yerror;
}
parameters {
real a0;
real a1;
real a2;
}
model {
for(i in 1:N) {
y[i] ~ normal(a0 + a1 * x[i] + a2 * x[i] * x[i], yerror[i]);
}
}
``````
2 Likes