How random variable type can change SBC?

Assuming “random variable as type” concept which Julia is proud of as introduced here is the same as rvar in posterior package, may I ask whether viewing prior as rvar would have any value or use-case? I think this change could bring great benefits as this allows testing with sample-object itself both for prior and outcome. To illustrate, I made two extensions to the current generator in SBC package.

From current generator as below from SBC package vignette, prior or outcome distribution can be substituted: lambda <- rgamma(n = 1, shape = 15, rate = 5)and y <- rpois(n = N, lambda = lambda) are replaced with their sample objects usingrvars type abstraction. Mechanisms are different as will be explained below, but their purpose, robust testing on potential outcomes, are similar.

Original

poisson_generator_single <- function(N){  # N is the number of data points we are generating
  lambda <- rgamma(n = 1, shape = 15, rate = 5)
  y <- rpois(n = N, lambda = lambda)
  list(
    parameters = list(
      lambda = lambda
    ),
    generated = list(
      N = N,
      y = y
    )
  )
}

1. Prior as rvar

This is related to what @bgoodri expressed as “Bayesian inference without prior PDFs because the famous probability distributions were not intended to be used as priors” here.

lambda <- posterior::rvar(rgamma(n = 30, shape = 15, rate = 5))

poisson_generator_rvar <- function(N, lambda_rvar){ 
  lambda <- sample(draws_of(lambda_rvar), 1)
  y <- rpois(n = N, lambda = lambda)
  list(
    parameters = list(
      lambda = lambda
    ),
    generated = list(
      N = N,
      y = y
    )
  )
}

Distributional prior

> rgamma(shape = 15, rate = 5)

becomes rvar<30>[1].

> rgamma(n = 30, shape = 15, rate = 5)
  [1] 2.419039 2.996315 3.157444 2.650880 2.841437 2.660757 3.197353 2.733794 3.602588 3.303370 4.300112
 [12] 2.759736 1.816918 2.357792 2.772481 2.894473 2.578578 2.801332 2.128145 1.575882 2.413564 3.229159
 [23] 2.814670 1.798068 2.380697 3.245296 1.898824 3.017477 3.454599 2.900755 

> posterior::rvar(rgamma(n = 30, shape = 15, rate = 5))
rvar<30>[1] mean ± sd:
[1] 2.8 ± 0.74 

2. Outcome as rvar

Here, with the absence of distribution or full samples on prior, rank-based simulation-based calibration is impossible. However, iteratively updated parameter samples (representation of empirically constructed importance sampling weights) can be useful.

For instance, given a BlackBox generator, whose outcome at each parameter value is only known, the following code can train a generator (poisson_generator and its prior lambda_rvar) with similar outcomes. I wonder whether this might be of value in simulation-based inference where we lack distributional information on prior or likelihood. Related to this recent paper on detecting miscalibration in SBI, I am curious about @paul.buerkner 's opinion on the possibility of connection.

# assume internal structre is unkown
BB_generator_batch <- function(N, lambda_rvar){
  y <- rfun(rpois)(n = N, lambda = lambda_rvar) * 1.1
  gen_rvars <- draws_rvars(N = N, y = y)
  SBC_datasets(
    parameters = as_draws_matrix(lambda_rvar),
    generated = draws_rvars_to_standata(gen_rvars)
  )
}

poisson_generator_batch <- function(N, lambda_rvar){
  y <- rfun(rpois)(n = N, lambda = lambda_rvar) 
  gen_rvars <- draws_rvars(N = N, y = y)
  SBC_datasets(
    parameters = as_draws_matrix(lambda_rvar),
    generated = draws_rvars_to_standata(gen_rvars)
  )
}

# init
lambda_rvar <- rvar(rlnorm(30, meanlog = 2, sdlog = 1))

for(k in 1:10){  
  datasets_sim <- poisson_generator_batch(N = 40, lambda_rvar = lambda_rvar)$generated
  datasets_bb<- BB_generator_batch(N = 40, lambda = lambda_rvar)$generated

  # distance for each parameter sample (length 30 vector)
  dist <- unlist(lapply(seq(1:30), function(m) mean(abs(datasets_sim[[m]]$y - datasets_bb[[m]]$y)))) 

  # resample according to exp^(-dist)
  weight = exp(-1 * dist) / sum(exp(-1 * dist))
  lambda_rvar <- rvar(sample(draws_of(lambda_rvar), 30, prob = weight, replace = TRUE))
}

P.S. If you have a specific use case, could you please kindly introduce it? This would help us (@martinmodrak, @Dashadower, and I) decide whether or not to extend random variable type support in SBC package. Thanks!

Tagging those who might be interested in rvar or SBC. @andrewgelman @betanalpha @avehtari @jonah @bgoodri

1 Like

Hi–I’m not sure, but here’s the paper with Jouni that introduced the idea back in 2007:

http://www.stat.columbia.edu/~gelman/research/published/postsim.pdf

This is how I thought of “probabilistic programming” until that term was used for Bayesian inference engines.