I’m trying to understand how to model measurements that are truncated vs. rounded. I read this thread (Mix of rounded and non rounded observation) which discusses rounded measurements, and I thought it would be the same approach except instead of log Pr[z in (y - 0.5, y + 0.5)]
it would be log Pr[z in (y , y + 0.5)]
for the true measurement.
Here’s the “true” measurements model
data {
int<lower=0> N; // number of data items
vector[N] x; // predictor vector
vector[N] y; // outcome vector
}
parameters {
real alpha; // intercept
real beta; // coefficients for predictor
real<lower=0> sigma; // error scale
}
model {
// Likelihood
y ~ normal(x * beta + alpha, sigma);
//target += normal_lpdf(y | mu, sigma);
// Priors
alpha ~ normal(0, 1);
//target += normal_lpdf(alpha | 0, 1);
beta ~ normal(0, 1);
//target += normal_lpdf(beta | 0, 1);
sigma ~ exponential(1);
//target += exponential_lpdf(sigma | 1);
}
Here’s what I tried for the truncated measurements model
data {
int<lower=0> N; // number of data items
vector[N] x; // predictor vector
vector[N] y; // outcome vector
}
parameters {
real alpha; // intercept
real beta; // coefficients for predictor
real<lower=0> sigma; // error scale
}
model {
vector[N] mu;
mu = x * beta + alpha;
// Likelihood
target += log(Phi((y + 1.0 - mu) / sigma)
- Phi((y - mu) / sigma));
// Priors
//alpha ~ normal(0, 1);
target += normal_lpdf(alpha | 0, 1);
//beta ~ normal(0, 1);
target += normal_lpdf(beta | 0, 1);
//sigma ~ exponential(1);
target += exponential_lpdf(sigma | 1);
}
and here’s the R code where I simulate some data to try this out:
library(rstan)
library(tidyverse)
library(here)
N <- 1000
alpha_true = -1
beta_true = 1
sigma_true = 1
df <- data.frame(x = rnorm(N, 0, 5)) %>%
mutate(y_true = alpha_true + beta_true*x + rnorm(N, 0, sigma_true),
y_round = round(y_true),
y_trunc = trunc(y_true)
)
# Fit true data model
mdl1 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df$y_true,
x = df$x),
model_name="mdl1",
chains=4)
# Fit truncated data to "incorrect" model
mdl2 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df$y_trunc,
x = df$x),
model_name="mdl2",
chains=4)
# Fit truncated data to "correct" model
mdl3 <- stan(file=here("rounding_example/y_trunc.stan"),
data=list(N = N,
y = df$y_trunc,
x = df$x),
model_name="mdl3",
chains=4)
I expected mdl3
's posterior estimate for beta
would be closer to the true value, but it was unchanged; instead the posterior for the intercept was further from the true value.
I’ve mostly used the rethinking
package so I’m still new to writing models directly in rstan
. At this point, I’m not sure if my thought process for adapting the rounded measurement model was wrong or if I’ve implemented it wrong in stan.