Modeling truncated vs. rounded observations

I’m trying to understand how to model measurements that are truncated vs. rounded. I read this thread (Mix of rounded and non rounded observation), which discusses rounded measurements, and I thought the same approach would work, except that instead of log Pr[z in (y - 0.5, y + 0.5)] it would be log Pr[z in (y, y + 1)], where z is the true measurement.
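
Spelling that out, with mu = alpha + beta * x and z ~ Normal(mu, sigma), the per-observation log-likelihood I had in mind is

log Pr[y < z < y + 1] = log( Phi((y + 1 - mu) / sigma) - Phi((y - mu) / sigma) ),

which is what I try to implement in the truncated-measurement model below.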

Here’s the model for the “true” measurements:

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficient for predictor
  real<lower=0> sigma;  // error scale
}
model {
  // Likelihood
  y ~ normal(x * beta + alpha, sigma);
  //target += normal_lpdf(y | mu, sigma);
  // Priors
  alpha ~ normal(0, 1);
  //target += normal_lpdf(alpha | 0, 1);
  beta ~ normal(0, 1);
  //target += normal_lpdf(beta | 0, 1);
  sigma ~ exponential(1);
  //target += exponential_lpdf(sigma | 1);
}

Here’s what I tried for the truncated measurements model:

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficient for predictor
  real<lower=0> sigma;  // error scale
}
model {
  vector[N] mu;
  mu = x * beta + alpha;
  // Likelihood: true value assumed to lie in (y, y + 1)
  target += log(Phi((y + 1.0 - mu) / sigma)
                - Phi((y - mu) / sigma));
  // Priors
  //alpha ~ normal(0, 1);
  target += normal_lpdf(alpha | 0, 1);
  //beta ~ normal(0, 1);
  target += normal_lpdf(beta | 0, 1);
  //sigma ~ exponential(1);
  target += exponential_lpdf(sigma | 1);
}

and here’s the R code where I simulate some data to try this out:

library(rstan)
library(tidyverse)
library(here)

N <- 1000
alpha_true <- -1
beta_true <- 1
sigma_true <- 1

df <- data.frame(x = rnorm(N, 0, 5)) %>%
  mutate(y_true = alpha_true + beta_true*x + rnorm(N, 0, sigma_true),
         y_round = round(y_true),
         y_trunc = trunc(y_true)
  )

# Fit true data model
mdl1 <- stan(file=here("rounding_example/y_true.stan"), 
             data=list(N = N,
                       y = df$y_true,
                       x = df$x), 
             model_name="mdl1",
             chains=4)

# Fit truncated data to "incorrect" model
mdl2 <- stan(file=here("rounding_example/y_true.stan"), 
             data=list(N = N,
                       y = df$y_trunc,
                       x = df$x), 
             model_name="mdl2",
             chains=4)

# Fit truncated data to "correct" model
mdl3 <- stan(file=here("rounding_example/y_trunc.stan"), 
             data=list(N = N,
                       y = df$y_trunc,
                       x = df$x), 
             model_name="mdl3",
             chains=4)

I expected mdl3's posterior estimate for beta to be closer to the true value, but it was unchanged; instead, the posterior for the intercept ended up further from the true value.

I’ve mostly used the rethinking package, so I’m still new to writing models directly in rstan. At this point, I’m not sure whether my thought process for adapting the rounded-measurement model was wrong or whether I implemented it incorrectly in Stan.

R’s trunc() function rounds values towards zero, so it is equivalent to ceiling() for negative values and floor() for positive values. I don’t think your model reflects this sign-dependence.
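
Concretely, y = trunc(z) means the true value z lies in [y, y + 1) when y is positive but in (y - 1, y] when y is negative, so the interval sits above the observed value on one side of zero and below it on the other.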


That was the problem. Thank you, @mike-lawrence!

Here’s my updated truncated measurement model, and now the posterior distributions make sense.

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficient for predictor
  real<lower=0> sigma;  // error scale
}
model {
  vector[N] mu;
  mu = x * beta + alpha;
  // Likelihood
  for (n in 1:N) {
    if (y[n] < 0.0) {
      // true value in (y - 1, y]
      target += log(Phi((y[n] - mu[n]) / sigma)
                    - Phi((y[n] - 1.0 - mu[n]) / sigma));
    } else {
      // true value in [y, y + 1)
      target += log(Phi((y[n] + 1.0 - mu[n]) / sigma)
                    - Phi((y[n] - mu[n]) / sigma));
    }
  }
  // Priors
  //alpha ~ normal(0, 1);
  target += normal_lpdf(alpha | 0, 1);
  //beta ~ normal(0, 1);
  target += normal_lpdf(beta | 0, 1);
  //sigma ~ exponential(1);
  target += exponential_lpdf(sigma | 1);
}
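
As a possible refinement (just a sketch; I haven’t run it on this example): log(Phi(a) - Phi(b)) can underflow to -infinity when an observation sits far out in the tails, because both Phi terms round to zero. Stan’s normal_lcdf and log_diff_exp compute the same interval probability on the log scale. Only the model block changes; the data and parameters blocks stay the same.

model {
  vector[N] mu = x * beta + alpha;
  // Likelihood: same sign-dependent intervals, computed on the log scale
  for (n in 1:N) {
    if (y[n] < 0.0) {
      target += log_diff_exp(normal_lcdf(y[n] | mu[n], sigma),
                             normal_lcdf(y[n] - 1.0 | mu[n], sigma));
    } else {
      target += log_diff_exp(normal_lcdf(y[n] + 1.0 | mu[n], sigma),
                             normal_lcdf(y[n] | mu[n], sigma));
    }
  }
  // Priors as before
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(beta | 0, 1);
  target += exponential_lpdf(sigma | 1);
}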