I’m trying to understand how to model measurements that are truncated vs. rounded. I read this thread (Mix of rounded and non rounded observation) which discusses rounded measurements, and I thought it would be the same approach except instead of log Pr[z in (y - 0.5, y + 0.5)] it would be log Pr[z in (y , y + 0.5)] for the true measurement.
Here’s the “true” measurements model
data {
int<lower=0> N; // number of data items
vector[N] x; // predictor vector
vector[N] y; // outcome vector
}
parameters {
real alpha; // intercept
real beta; // coefficients for predictor
real<lower=0> sigma; // error scale
}
model {
// Likelihood
y ~ normal(x * beta + alpha, sigma);
//target += normal_lpdf(y | mu, sigma);
// Priors
alpha ~ normal(0, 1);
//target += normal_lpdf(alpha | 0, 1);
beta ~ normal(0, 1);
//target += normal_lpdf(beta | 0, 1);
sigma ~ exponential(1);
//target += exponential_lpdf(sigma | 1);
}
Here’s what I tried for the truncated measurements model
data {
int<lower=0> N; // number of data items
vector[N] x; // predictor vector
vector[N] y; // outcome vector
}
parameters {
real alpha; // intercept
real beta; // coefficients for predictor
real<lower=0> sigma; // error scale
}
model {
vector[N] mu;
mu = x * beta + alpha;
// Likelihood
target += log(Phi((y + 1.0 - mu) / sigma)
- Phi((y - mu) / sigma));
// Priors
//alpha ~ normal(0, 1);
target += normal_lpdf(alpha | 0, 1);
//beta ~ normal(0, 1);
target += normal_lpdf(beta | 0, 1);
//sigma ~ exponential(1);
target += exponential_lpdf(sigma | 1);
}
and here’s the R code where I simulate some data to try this out:
library(rstan)
library(tidyverse)
library(here)
N <- 1000
alpha_true = -1
beta_true = 1
sigma_true = 1
df <- data.frame(x = rnorm(N, 0, 5)) %>%
mutate(y_true = alpha_true + beta_true*x + rnorm(N, 0, sigma_true),
y_round = round(y_true),
y_trunc = trunc(y_true)
)
# Fit true data model
mdl1 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df$y_true,
x = df$x),
model_name="mdl1",
chains=4)
# Fit truncated data to "incorrect" model
mdl2 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df$y_trunc,
x = df$x),
model_name="mdl2",
chains=4)
# Fit truncated data to "correct" model
mdl3 <- stan(file=here("rounding_example/y_trunc.stan"),
data=list(N = N,
y = df$y_trunc,
x = df$x),
model_name="mdl3",
chains=4)
I expected mdl3's posterior estimate for beta would be closer to the true value, but it was unchanged; instead the posterior for the intercept was further from the true value.
I’ve mostly used the rethinking package so I’m still new to writing models directly in rstan. At this point, I’m not sure if my thought process for adapting the rounded measurement model was wrong or if I’ve implemented it wrong in stan.