Modeling truncated vs. rounded observations

mkwong · July 25, 2021, 8:25pm

I’m trying to understand how to model measurements that are truncated vs. rounded. I read this thread (Mix of rounded and non rounded observation) which discusses rounded measurements, and I thought it would be the same approach except instead of log Pr[z in (y - 0.5, y + 0.5)] it would be log Pr[z in (y , y + 0.5)] for the true measurement.

Here’s the “true” measurements model

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficients for predictor
  real<lower=0> sigma;  // error scale
}
model {
  // Likelihood
  y ~ normal(x * beta + alpha, sigma);
  //target += normal_lpdf(y | mu, sigma);
  // Priors
  alpha ~ normal(0, 1);
  //target += normal_lpdf(alpha | 0, 1);
  beta ~ normal(0, 1);
  //target += normal_lpdf(beta | 0, 1);
  sigma ~ exponential(1);
  //target += exponential_lpdf(sigma | 1);
}

Here’s what I tried for the truncated measurements model

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficients for predictor
  real<lower=0> sigma;  // error scale
}
model {
  vector[N] mu;
  mu = x * beta + alpha;
  // Likelihood
  target += log(Phi((y + 1.0 - mu) / sigma) 
    - Phi((y - mu) / sigma));
  // Priors
  //alpha ~ normal(0, 1);
  target += normal_lpdf(alpha | 0, 1);
  //beta ~ normal(0, 1);
  target += normal_lpdf(beta | 0, 1);
  //sigma ~ exponential(1);
  target += exponential_lpdf(sigma | 1);
}

and here’s the R code where I simulate some data to try this out:

library(rstan)
library(tidyverse)
library(here)

N <- 1000
alpha_true = -1
beta_true = 1
sigma_true = 1

df <- data.frame(x = rnorm(N, 0, 5)) %>%
  mutate(y_true = alpha_true + beta_true*x + rnorm(N, 0, sigma_true),
         y_round = round(y_true),
         y_trunc = trunc(y_true)
  )

# Fit true data model
mdl1 <- stan(file=here("rounding_example/y_true.stan"), 
             data=list(N = N,
                       y = df$y_true,
                       x = df$x), 
             model_name="mdl1",
             chains=4)

# Fit truncated data to "incorrect" model
mdl2 <- stan(file=here("rounding_example/y_true.stan"), 
             data=list(N = N,
                       y = df$y_trunc,
                       x = df$x), 
             model_name="mdl2",
             chains=4)

# Fit truncated data to "correct" model
mdl3 <- stan(file=here("rounding_example/y_trunc.stan"), 
             data=list(N = N,
                       y = df$y_trunc,
                       x = df$x), 
             model_name="mdl3",
             chains=4)

I expected mdl3's posterior estimate for beta would be closer to the true value, but it was unchanged; instead the posterior for the intercept was further from the true value.

I’ve mostly used the rethinking package so I’m still new to writing models directly in rstan. At this point, I’m not sure if my thought process for adapting the rounded measurement model was wrong or if I’ve implemented it wrong in stan.

mike-lawrence · July 25, 2021, 11:12pm

R’s trunc() function rounds values towards zero, so it is equivalent to ceiling() for negative values and floor() for positive values. I don’t think your model reflects this sign-dependence.

mkwong · July 26, 2021, 1:58am

That was the problem–thank you @mike-lawrence !

Here’s my updated truncated measurement model, and now the posterior distributions make sense.

data {
  int<lower=0> N;   // number of data items
  vector[N] x;      // predictor vector
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  real beta;            // coefficients for predictor
  real<lower=0> sigma;  // error scale
}
model {
  vector[N] mu;
  mu = x * beta + alpha;
  // Likelihood
  for(n in 1:N) {
    if(y[n] < 0.0) {
      target += log(Phi((y[n] - mu[n]) / sigma) 
          - Phi((y[n] - 1.0 - mu[n]) / sigma));     }
    else {
      target += log(Phi((y[n] + 1.0 - mu[n]) / sigma) 
          - Phi((y[n] - mu[n]) / sigma));    
    }
  }
  // Priors
  //alpha ~ normal(0, 1);
  target += normal_lpdf(alpha | 0, 1);
  //beta ~ normal(0, 1);
  target += normal_lpdf(beta | 0, 1);
  //sigma ~ exponential(1);
  target += exponential_lpdf(sigma | 1);
}

Topic		Replies	Views
Modelling dependent variable with some kind of rounding Modeling specification	7	786	September 6, 2018
Mix of rounded and non rounded observation Modeling	8	1240	October 2, 2017
2D mixture of truncated distributions Modeling	2	41	October 3, 2024
Another measurement error modeling question Modeling	13	581	July 5, 2023
Truncated data, log probability increment Modeling truncated	8	112	August 14, 2024

Modeling truncated vs. rounded observations

Related topics