It’s my impression that truncation with T[L,U]
doesn’t work with a vectorized sampling statement. That’s a bit unfortunate, since vectorized distribution functions are much faster. But would the following work? It relies on a similar approach as is mentioned in the Integrating out Censored Values section of the user’s guide.
data {
int<lower=0> N;
vector<lower=0>[N] y;
int<lower=0,upper=1> lccdf;
}
parameters {
real<lower=0> mu;
real<lower=0> sigma;
}
model {
mu ~ normal(0, 5);
sigma ~ gamma(2, 1);
if(lccdf) {
y ~ normal(mu, sigma);
target += -normal_lccdf(0 | mu, sigma) * N;
}
else{
for (i in 1:N) y[i] ~ normal(mu, sigma) T[0,];
}
}
I think that the flag lccdf
switches between two ways of generating samples from the same truncated normal distribution. One way relies on a for loop, and it samples relatively slowly. I think the other way implements the same truncation, but still gets a speed benefit from a vectorized distribution function. They seem to produce similar samples, though I was surprised that the user’s guide and reference manual only show the loop approach.
Here’s an R
script to run the model, using the cmdstanr
interface.
library(cmdstanr)
m <- cmdstan_model("truncation.stan")
rtnorm <- function(n, mean = 0, sd = 1, lower=-Inf, upper=Inf){
l <- pnorm(lower, mean = mean, sd = sd)
u <- pnorm(upper, mean = mean, sd = sd)
uni <- runif(n, l, u)
qnorm(uni, mean=mean, sd=sd)
}
N <- 1e4
mu <- 0.5
sigma <- 5
y <- rtnorm(N, mean=mu, sd=sigma, lower=0)
d <- list(N = N, y = y, lccdf=1)
fit <- m$sample(
data = d,
seed = 123,
chains = 3,
parallel_chains = 3)
bayesplot::mcmc_pairs(fit$draws())
d2 <- list(N = N, y = y, lccdf=0)
fit2 <- m$sample(
data = d2,
seed = 123,
chains = 3,
parallel_chains = 3)
bayesplot::mcmc_pairs(fit2$draws())