# How random variable type can change SBC?

Assuming “random variable as type” concept which Julia is proud of as introduced here is the same as rvar in posterior package, may I ask whether viewing prior as `rvar` would have any value or use-case? I think this change could bring great benefits as this allows testing with sample-object itself both for prior and outcome. To illustrate, I made two extensions to the current generator in SBC package.

From current generator as below from SBC package vignette, prior or outcome distribution can be substituted: `lambda <- rgamma(n = 1, shape = 15, rate = 5)`and ` y <- rpois(n = N, lambda = lambda)` are replaced with their sample objects using`rvars` type abstraction. Mechanisms are different as will be explained below, but their purpose, robust testing on potential outcomes, are similar.

## Original

``````poisson_generator_single <- function(N){  # N is the number of data points we are generating
lambda <- rgamma(n = 1, shape = 15, rate = 5)
y <- rpois(n = N, lambda = lambda)
list(
parameters = list(
lambda = lambda
),
generated = list(
N = N,
y = y
)
)
}
``````

## 1. Prior as `rvar`

This is related to what @bgoodri expressed as “Bayesian inference without prior PDFs because the famous probability distributions were not intended to be used as priors” here.

``````lambda <- posterior::rvar(rgamma(n = 30, shape = 15, rate = 5))

poisson_generator_rvar <- function(N, lambda_rvar){
lambda <- sample(draws_of(lambda_rvar), 1)
y <- rpois(n = N, lambda = lambda)
list(
parameters = list(
lambda = lambda
),
generated = list(
N = N,
y = y
)
)
}
``````

Distributional prior

``````> rgamma(shape = 15, rate = 5)
``````

becomes `rvar<30>`.

``````> rgamma(n = 30, shape = 15, rate = 5)
 2.419039 2.996315 3.157444 2.650880 2.841437 2.660757 3.197353 2.733794 3.602588 3.303370 4.300112
 2.759736 1.816918 2.357792 2.772481 2.894473 2.578578 2.801332 2.128145 1.575882 2.413564 3.229159
 2.814670 1.798068 2.380697 3.245296 1.898824 3.017477 3.454599 2.900755

> posterior::rvar(rgamma(n = 30, shape = 15, rate = 5))
rvar<30> mean ± sd:
 2.8 ± 0.74
``````

## 2. Outcome as `rvar`

Here, with the absence of distribution or full samples on prior, rank-based simulation-based calibration is impossible. However, iteratively updated parameter samples (representation of empirically constructed importance sampling weights) can be useful.

For instance, given a `BlackBox` generator, whose outcome at each parameter value is only known, the following code can train a generator (`poisson_generator` and its prior `lambda_rvar`) with similar outcomes. I wonder whether this might be of value in simulation-based inference where we lack distributional information on prior or likelihood. Related to this recent paper on detecting miscalibration in SBI, I am curious about @paul.buerkner 's opinion on the possibility of connection.

``````# assume internal structre is unkown
BB_generator_batch <- function(N, lambda_rvar){
y <- rfun(rpois)(n = N, lambda = lambda_rvar) * 1.1
gen_rvars <- draws_rvars(N = N, y = y)
SBC_datasets(
parameters = as_draws_matrix(lambda_rvar),
generated = draws_rvars_to_standata(gen_rvars)
)
}

poisson_generator_batch <- function(N, lambda_rvar){
y <- rfun(rpois)(n = N, lambda = lambda_rvar)
gen_rvars <- draws_rvars(N = N, y = y)
SBC_datasets(
parameters = as_draws_matrix(lambda_rvar),
generated = draws_rvars_to_standata(gen_rvars)
)
}

# init
lambda_rvar <- rvar(rlnorm(30, meanlog = 2, sdlog = 1))

for(k in 1:10){
datasets_sim <- poisson_generator_batch(N = 40, lambda_rvar = lambda_rvar)\$generated
datasets_bb<- BB_generator_batch(N = 40, lambda = lambda_rvar)\$generated

# distance for each parameter sample (length 30 vector)
dist <- unlist(lapply(seq(1:30), function(m) mean(abs(datasets_sim[[m]]\$y - datasets_bb[[m]]\$y))))

# resample according to exp^(-dist)
weight = exp(-1 * dist) / sum(exp(-1 * dist))
lambda_rvar <- rvar(sample(draws_of(lambda_rvar), 30, prob = weight, replace = TRUE))
}
``````

P.S. If you have a specific use case, could you please kindly introduce it? This would help us (@martinmodrak, @Dashadower, and I) decide whether or not to extend random variable type support in SBC package. Thanks!

Tagging those who might be interested in rvar or SBC. @andrewgelman @betanalpha @avehtari @jonah @bgoodri

1 Like

Hi–I’m not sure, but here’s the paper with Jouni that introduced the idea back in 2007:

http://www.stat.columbia.edu/~gelman/research/published/postsim.pdf

This is how I thought of “probabilistic programming” until that term was used for Bayesian inference engines.