Sum-to-zero and sbc

spinkney · March 1, 2022, 4:19pm

Recently @Bob_Carpenter proposed a sum-to-zero data type new stan data type zero sum vector (and you can see the lively discussion about this type). I wanted to see how the simple sum-to-zero constraint compares to the SVD using the wonderful SBC package by @hyunji.moon and @martinmodrak.

TL;DR

The results are the same but SVD is faster and shows no divergences.

Alpha is estimated fairly well but mu and sigma are not. This is actually expected because we’re not replicating the simulation! The constraint will pull mu toward the K - 1 estimates and the Kth needs to pull the entire sum to 0. The variance of alpha is all contained in the K - 1 estimates, though we see that the reduction in sigma is not too large.

Simple sum-to-zero

Stan: simple_ctr.stan (514 Bytes)
R: simple.R (1.4 KB)

cache_dir <- "./_simple_SBC_cache"
options(mc.cores = parallel::detectCores())
library(SBC)
# Enabling parallel processing via future
library(future)
library(cmdstanr)
plan(multisession)

if(!dir.exists(cache_dir)) {
  dir.create(cache_dir)
}
options(SBC.min_chunk_size = 5)

ilogit <- function(u) 1 / (1 + exp(-u))

simple_generator_single <- function(N){  # N is the number of data points we are generating
  K <- 5
  mu <- 1.3
  sigma <- 2.3
  alpha <- rnorm(K, mu, sigma)
  y <- matrix(NA, K, N)
  for (k in 1:K)
    y[k, ] <- rbinom(N, 1, ilogit(alpha[k]))

  list(
    variables = list(
      mean_alpha = mu,
      sd_alpha = sigma,
      mu = mu,
      sigma = sigma,
      alpha = alpha
    ),
    generated = list(K = K, N = N, y = y)
  )
}

mod_simple <- cmdstan_model("simple_ctr.stan")
set.seed(54882235)
n_sims <- 200  # Number of SBC iterations to run

simple_generator <- SBC_generator_function(simple_generator_single, N = 200)
simple_dataset <- generate_datasets(
  simple_generator, 
  n_sims)

simple_backend <- SBC_backend_cmdstan_sample(
  mod_simple, iter_warmup = 500, iter_sampling = 500, chains = 2, parallel_chains = 2)
results <- compute_SBC(simple_dataset, simple_backend , 
                       cache_mode = "results", 
                       cache_location = file.path(cache_dir, "results"))

 # - 57 (28%) fits had at least one Rhat > 1.01. Largest Rhat was 1.037.
 # - 3 (2%) fits had divergent transitions. Maximum number of divergences was 17.
#  - 6 (3%) fits had some steps rejected. Maximum number of rejections was 2.
# Not all diagnostics are OK.

Using SVD

Stan: hier_ctr_svd.stan (582 Bytes)
R: hier_ctr_svd.R (1.7 KB)

Using the same DGP as in the original post but with fewer groups to speed up computation.

hier_ctr_svd_generator_single <- function(N){  # N is the number of data points we are generating
  K <- 5
  mu <- 1.3
  sigma <- 2.3
  alpha <- rnorm(K, mu, sigma)
  y <- matrix(NA, K, N)
  for (k in 1:K)
    y[k, ] <- rbinom(N, 1, ilogit(alpha[k]))
  Sigma <- contr.sum(K) %*% diag(K - 1) %*% t(contr.sum(K))
  s <- svd((Sigma + t(Sigma))/2) # Guarantee symmetry
  s$d <- abs(zapsmall(s$d))
  
  list(
    variables = list(
      mean_alpha = mu,
      sd_alpha = sigma,
      mu = mu,
      sigma = sigma,
      alpha = alpha
    ),
    generated = list(K = K, N = N, y = y,
                     U = s$u,
                     d_raw = sqrt(s$d))
  )
}

Results

Topic		Replies	Views
Sum to zero constraints and multi-level models - best practise/example code Modeling	9	232	May 2, 2025
Rewriting example models with SBC Developers simulation-based-calibration	8	1336	February 25, 2019
Check your models! SBC 0.1.1 is out! Publicity simulation-based-calibration	1	858	February 15, 2022
Zero sum vector and normal distribution Modeling	20	776	February 8, 2025
CmdStan & Stan 2.36 release candidate General	2	404	December 3, 2024

Sum-to-zero and sbc

TL;DR

Simple sum-to-zero

Using SVD

Related topics