Measurement Error Modeling

I’ve been struggling to implement a particular model in Stan. That could just be my lack of sophistication with statistics, so please keep it in mind while deciphering my question.

I have two sets of answers to a survey, one set per group. The survey has a single question with only two possible answers. I know how to use Stan to estimate each group’s parameter with a binomial distribution, and I can use those estimates to assess how likely it is that the two groups have the same probability of giving a particular answer.
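For reference, the two-group comparison without measurement error can be sketched as follows. This is a minimal illustration, not the asker’s Stan model: it swaps Stan sampling for conjugate Beta posteriors (assuming a uniform Beta(1, 1) prior per group), and the function name `prob_group_a_higher` and the counts are hypothetical. Under continuous priors the posterior probability that the two rates are exactly equal is zero, so the usual summary is the posterior of the difference, e.g. P(theta_a > theta_b); in Stan one could compute the same indicator per draw in a generated quantities block.

```python
import random


def prob_group_a_higher(yes_a, n_a, yes_b, n_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(theta_a > theta_b | data).

    Assumes a uniform Beta(1, 1) prior on each group's probability of
    answering 'yes', so each posterior is Beta(1 + yes, 1 + n - yes).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # One posterior draw per group, then compare them.
        theta_a = rng.betavariate(1 + yes_a, 1 + n_a - yes_a)
        theta_b = rng.betavariate(1 + yes_b, 1 + n_b - yes_b)
        if theta_a > theta_b:
            wins += 1
    return wins / draws


# Hypothetical counts, for illustration only.
print(prob_group_a_higher(yes_a=70, n_a=100, yes_b=50, n_b=100))
```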

However, there’s reason to believe that the survey responses are not an accurate representation of what is being measured (in this case, a particular mental state). Some research suggests how accurate these types of surveys are likely to be, and I’d like to include that in my model. For example, suppose that if a person answers ‘yes’ in the survey, there’s a 0.8 probability that the ‘yes’ accurately represents their belief. This is what I’m referring to as measurement error.
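One common way to formalize this (a sketch, assuming the research can be expressed as sensitivity and specificity) is to treat each respondent’s true state as latent and marginalize it out. With θ the true proportion in the mental state for a group, s = P(answer yes | truly yes) and c = P(answer no | truly no), the probability of an observed ‘yes’ and the likelihood for k ‘yes’ answers out of n are:

$$
\Pr(\text{answer yes}) = \theta s + (1 - \theta)(1 - c)
$$

$$
k \sim \operatorname{Binomial}\bigl(n,\ \theta s + (1 - \theta)(1 - c)\bigr)
$$

Each group gets its own θ, while s and c can be shared. Note that the 0.8 in the example is a different quantity, P(truly yes | answered yes), which by Bayes’ rule also depends on θ, so it would need converting before being used as s:

$$
\Pr(\text{truly yes} \mid \text{answer yes}) = \frac{\theta s}{\theta s + (1 - \theta)(1 - c)}
$$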

How can I model this? I tried to build a custom function, but I was unsuccessful. I also read through a bit of the manual and got a few things compiling, but nothing like what I was after. I think part of the problem is that I’m not quite sure what the function’s return value is meant to represent, and also that I’m not sure of the best way to model this in the first place (maybe it’s as simple as using an out-of-the-box distribution with a higher sd?).
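On the return value: in Stan, a user-defined function whose name ends in `_lpmf` returns the log probability mass of its first argument given the parameters. The Python sketch below computes that quantity for the misclassified binomial, assuming sensitivity `sens` and specificity `spec` are known constants; the function names and the numbers in the example are hypothetical. It then uses a simple grid with a uniform prior to get a posterior mean for the true proportion. In Stan, no custom function should be strictly necessary, since the marginalized probability can be passed straight to the built-in binomial, e.g. `k ~ binomial(n, theta * sens + (1 - theta) * (1 - spec));`.

```python
import math


def misclassified_binomial_lpmf(k, n, theta, sens, spec):
    """Log probability of observing k 'yes' answers out of n.

    theta: true proportion holding the mental state (latent).
    sens:  P(answer 'yes' | truly yes), assumed known from prior research.
    spec:  P(answer 'no' | truly no), assumed known from prior research.
    """
    # Marginalize the latent true state: an observed 'yes' comes either from
    # a true yes reported correctly or from a true no reported incorrectly.
    p_yes = theta * sens + (1 - theta) * (1 - spec)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p_yes) + (n - k) * math.log(1 - p_yes)


def grid_posterior_mean(k, n, sens, spec, points=1000):
    """Posterior mean of theta under a uniform prior, via a midpoint grid."""
    grid = [(i + 0.5) / points for i in range(points)]
    logs = [misclassified_binomial_lpmf(k, n, t, sens, spec) for t in grid]
    top = max(logs)
    # Subtract the max log density before exponentiating, for numerical stability.
    weights = [math.exp(lp - top) for lp in logs]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# Hypothetical survey: 60 of 100 said 'yes'; the accuracy figures are made up.
print(grid_posterior_mean(k=60, n=100, sens=0.8, spec=0.9))
```

With perfect accuracy (`sens = spec = 1`) this reduces to the ordinary binomial model, which is a useful sanity check.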

It seems like this is probably trivial. Any help is greatly appreciated!

I would go with a measurement error model. Have a look at chapter 14 in the Stan manual for some examples:

Thanks, Lionel. I checked out chapter 14. To better understand the model, I tried to implement it, but I can’t even get the code from the manual to run. Here’s what I have (from pp. 203-204):

data {
  int<lower=0> N;
  real x[N];
  real y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma; // outcome noise
}
model {
  y ~ normal(alpha + beta * x, sigma);
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
}

But when I try to run it with the following R code:

library(rstan)
# model_string holds the Stan program above as a character string
# Just some fake data I had lying around
data_list <- list(N = 250, y = c(rep(0, 250)), x = c(rep(1, 189), rep(0, 61)))
stan_samples <- stan(model_code = model_string, data = data_list)

I get the following error:

No matches for:

real * real[]

Available argument signatures for operator*:

real * real
vector * real
row vector * real
matrix * real
row vector * vector
vector * row vector
matrix * vector
row vector * matrix
matrix * matrix
real * vector
real * row vector
real * matrix

No matches for:

real + ill formed

Available argument signatures for operator+:

int + int
real + real
vector + vector
row vector + row vector
matrix + matrix
vector + real
row vector + real
matrix + real
real + vector
real + row vector
real + matrix
+row vector

expression is ill formed
error in ‘model366311c7c2b_21bda842cc3ad5b9be5383599a2a166c’ at line 13, column 30

11: } 
12: model {
13:   y ~ normal(alpha + beta * x, sigma);
14:   alpha ~ normal(0, 10);

Am I missing something?

Note that this model is a simple linear regression; there are no measurement errors in it. For this specific error, changing the data declaration to:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}

That worked for me. My guess is that, due to a recent update in Stan/rstan, the code for this model no longer works.

You can also skim through the many example models available here:

In addition, if you are new to Stan, you could use the package brms, which translates R model formulas into Stan code. In brms you can also fit some measurement error models.

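You can’t multiply a scalar by an array, only by a vector. In the manual’s code, x is declared as real x[N], which is an array, so beta * x has no matching signature for operator* (the "real * real[]" in the error), and the addition that uses it is then reported as ill formed.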
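As a rough analogy only (Python, not Stan): with x declared as vector[N], Stan evaluates alpha + beta * x element by element, producing one mean per observation. The hypothetical helper `linear_predictor` below spells that out, and a plain Python list, like a Stan real array in the version discussed here, has no scalar multiplication at all.

```python
def linear_predictor(alpha, beta, x):
    """Element-wise alpha + beta * x_i, i.e. what Stan computes for
    `alpha + beta * x` when x is declared as vector[N]."""
    return [alpha + beta * xi for xi in x]


x = [1.0, 0.0, 1.0]
print(linear_predictor(0.5, 2.0, x))  # prints [2.5, 0.5, 2.5]

# A plain list does not support float * list, much like real * real[] in Stan.
try:
    2.0 * x
except TypeError as err:
    print("TypeError:", err)
```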

My guess is that it’s just a typo in the manual. We try very hard not to break old code; the only backward incompatibility we’ve introduced is that we no longer allow direct manipulation of lp__.