Hi all,
I’m fitting a truncated reciprocal distribution to simulated data. It’s a weird distributions, but it has been argued that it’s useful for reaction times and I anyways want to use it for a tutorial. It requires a truncation and change of variables (Jacobian). They are fairly simple, but maybe I have silly mistake there?
library(cmdstanr)
library(SBC)
library(extrDistr)
## Generate data from a truncated reciprocal normal dist
N <- 40
mu <- .002
sigma <- .0004
rt <- 1/rtnorm(N, mu, sigma, a = 0)
# code to fit the distribution
recnormal_code <- '
data {
int<lower = 1> N;
vector[N] RT;
}
parameters {
real mu_s;
real<lower = 0> sigma_s;
}
transformed parameters {
//aux pars so that it samples from pars that are not that small
real mu = mu_s / 1000;
real sigma = sigma_s / 1000;
}
model {
target += normal_lpdf(mu_s | 2, 2);
target += normal_lpdf(sigma_s | 4, 2)
- normal_lccdf(0 | 4, 2);
target += normal_lpdf(1 ./ RT | mu, sigma)
- N * normal_lccdf(0 | mu, sigma)
- sum(2 * log(RT));
}
'
The thing is that I recover true values quite ok, see below:
cmdstan_model <- cmdstanr::cmdstan_model(stan_file = write_stan_file(recnormal_code) )
fit_rec <- cmdstan_model$sample(
data = list(RT= rt, N= N),
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500
)
fit_rec$summary(variables = c("mu_s", "sigma_s"))
## # A tibble: 2 × 10
## variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 mu_s 2.04 2.04 0.0564 0.0555 1.95 2.13 1.00 3124. 2653.
## 2 sigma_s 0.350 0.346 0.0422 0.0409 0.289 0.427 1.00 2998. 2475.
fit_rec$summary(variables = c("mu", "sigma"))
# A tibble: 2 × 10
## variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 mu 0.00204 0.00204 0.0000564 0.0000555 0.00195 0.00213 1.00 3124. 2653.
## 2 sigma 0.000350 0.000346 0.0000422 0.0000409 0.000289 0.000427 1.00 2998. 2475.
Here I use the new package SBC. And the results are quite bad. I tried to do SBC by hand and I also get horrible results:
# gen function:
recnormal_gen_single <- function(N){ # N is the number of data points we are generating
mu_s <- rnorm(1, 2, 2)
sigma_s <- rtnorm(1, 4, 2, a = 0)
mu <- mu_s / 1000
sigma <- sigma_s / 1000
RT <- rtnorm(N, mu, sigma, a = 0)
list(
variables = list(
mu_s = mu_s,
sigma_s = sigma_s),
generated = list(
N = N,
RT = RT)
)
}
set.seed(54882235)
n_sims <- 100 # Number of SBC iterations to run
recnormal_generator <- SBC_generator_function(recnormal_gen_single, N = 40)
recnormal_dataset <- generate_datasets(
recnormal_generator,
n_sims)
cmdstan_model <- cmdstanr::cmdstan_model(recnormal_rt)
recnormal_backend <- SBC_backend_cmdstan_sample(
cmdstan_model, iter_warmup = 1000, iter_sampling = 1000, chains = 2)
results <- compute_SBC(recnormal_dataset, recnormal_backend)
plot_rank_hist(results)
I attach the plot
Any ideas of what’s going on?
Best,
Bruno