# Modeling truncated vs. rounded observations

I’m trying to understand how to model measurements that are truncated vs. rounded. I read this thread (Mix of rounded and non rounded observation) which discusses rounded measurements, and I thought it would be the same approach except instead of `log Pr[z in (y - 0.5, y + 0.5)]` it would be `log Pr[z in (y , y + 0.5)]` for the true measurement.

Here’s the “true” measurements model

``````data {
int<lower=0> N;   // number of data items
vector[N] x;      // predictor vector
vector[N] y;      // outcome vector
}
parameters {
real alpha;           // intercept
real beta;            // coefficients for predictor
real<lower=0> sigma;  // error scale
}
model {
// Likelihood
y ~ normal(x * beta + alpha, sigma);
//target += normal_lpdf(y | mu, sigma);
// Priors
alpha ~ normal(0, 1);
//target += normal_lpdf(alpha | 0, 1);
beta ~ normal(0, 1);
//target += normal_lpdf(beta | 0, 1);
sigma ~ exponential(1);
//target += exponential_lpdf(sigma | 1);
}
``````

Here’s what I tried for the truncated measurements model

``````data {
int<lower=0> N;   // number of data items
vector[N] x;      // predictor vector
vector[N] y;      // outcome vector
}
parameters {
real alpha;           // intercept
real beta;            // coefficients for predictor
real<lower=0> sigma;  // error scale
}
model {
vector[N] mu;
mu = x * beta + alpha;
// Likelihood
target += log(Phi((y + 1.0 - mu) / sigma)
- Phi((y - mu) / sigma));
// Priors
//alpha ~ normal(0, 1);
target += normal_lpdf(alpha | 0, 1);
//beta ~ normal(0, 1);
target += normal_lpdf(beta | 0, 1);
//sigma ~ exponential(1);
target += exponential_lpdf(sigma | 1);
}
``````

and here’s the R code where I simulate some data to try this out:

``````library(rstan)
library(tidyverse)
library(here)

N <- 1000
alpha_true = -1
beta_true = 1
sigma_true = 1

df <- data.frame(x = rnorm(N, 0, 5)) %>%
mutate(y_true = alpha_true + beta_true*x + rnorm(N, 0, sigma_true),
y_round = round(y_true),
y_trunc = trunc(y_true)
)

# Fit true data model
mdl1 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df\$y_true,
x = df\$x),
model_name="mdl1",
chains=4)

# Fit truncated data to "incorrect" model
mdl2 <- stan(file=here("rounding_example/y_true.stan"),
data=list(N = N,
y = df\$y_trunc,
x = df\$x),
model_name="mdl2",
chains=4)

# Fit truncated data to "correct" model
mdl3 <- stan(file=here("rounding_example/y_trunc.stan"),
data=list(N = N,
y = df\$y_trunc,
x = df\$x),
model_name="mdl3",
chains=4)
``````

I expected `mdl3`'s posterior estimate for `beta` would be closer to the true value, but it was unchanged; instead the posterior for the intercept was further from the true value.

I’ve mostly used the `rethinking` package so I’m still new to writing models directly in `rstan`. At this point, I’m not sure if my thought process for adapting the rounded measurement model was wrong or if I’ve implemented it wrong in stan.

R’s `trunc()` function rounds values towards zero, so it is equivalent to `ceiling()` for negative values and `floor()` for positive values. I don’t think your model reflects this sign-dependence.

1 Like

That was the problem–thank you @mike-lawrence !

Here’s my updated truncated measurement model, and now the posterior distributions make sense.

``````data {
int<lower=0> N;   // number of data items
vector[N] x;      // predictor vector
vector[N] y;      // outcome vector
}
parameters {
real alpha;           // intercept
real beta;            // coefficients for predictor
real<lower=0> sigma;  // error scale
}
model {
vector[N] mu;
mu = x * beta + alpha;
// Likelihood
for(n in 1:N) {
if(y[n] < 0.0) {
target += log(Phi((y[n] - mu[n]) / sigma)
- Phi((y[n] - 1.0 - mu[n]) / sigma));     }
else {
target += log(Phi((y[n] + 1.0 - mu[n]) / sigma)
- Phi((y[n] - mu[n]) / sigma));
}
}
// Priors
//alpha ~ normal(0, 1);
target += normal_lpdf(alpha | 0, 1);
//beta ~ normal(0, 1);
target += normal_lpdf(beta | 0, 1);
//sigma ~ exponential(1);
target += exponential_lpdf(sigma | 1);
}``````
1 Like