I’ve reviewed a few links related to Jacobians and functions of random variables, e.g.,
- Simple model Parse Errors - #10 by Bob_Carpenter (2018)
- 10.1 Changes of variables | Stan Reference Manual
- 22.3 Changes of variables | Stan User’s Guide
but alas I failed to build a mental model of how to approach my problem.
I have data that are generated by two Gamma random variables:
- I start with an initial halflife h \sim H = Gamma(a, b) with
- hyperpriors a \sim Normal(…) and b \sim Normal(…), and
- a “boost” random variable B \sim Gamma(c, d) with known parameters c and d.
I collect n samples of data related to the sequence of random variables H, B H, B^2 H, …, B^{n-1} H.
In the case where my data x_i \sim Bernoulli(e^{-t_i / (B^{i-1} H)}), for known x_i and t_i, I can code this up in Stan just fine, and all is well:
data {
  int<lower=0,upper=1> x1;
  int<lower=0,upper=1> x2;
  int<lower=0,upper=1> x3;
  real t1;
  real t2;
  real t3;
}
parameters {
  real<lower=0> halflifeAlpha;
  real<lower=0> halflifeBeta;
  real<lower=0> initialHalflife;
  real<lower=0> boost;
}
transformed parameters {
  real<lower=0> hl1 = initialHalflife * boost;
  real<lower=0> hl2 = hl1 * boost;
}
model {
  halflifeAlpha ~ normal(10 * 0.25 + 1, 0.5);
  halflifeBeta ~ normal(10.0, 1.0);
  initialHalflife ~ gamma(halflifeAlpha, halflifeBeta);
  boost ~ gamma(10 * 1.4 + 1, 10.0);
  x1 ~ bernoulli(exp(-t1 / initialHalflife));
  x2 ~ bernoulli(exp(-t2 / hl1));
  x3 ~ bernoulli(exp(-t3 / hl2));
}
As mentioned above, this works fabulously: I get posteriors on H and B, hooray.
Runnable Python code:
from cmdstanpy import CmdStanModel
import json

# write Stan code to disk
with open('question.stan', 'w') as fid:
    fid.write("""
data {
  int<lower=0,upper=1> x1;
  int<lower=0,upper=1> x2;
  int<lower=0,upper=1> x3;
  real t1;
  real t2;
  real t3;
}
parameters {
  real<lower=0> halflifeAlpha;
  real<lower=0> halflifeBeta;
  real<lower=0> initialHalflife;
  real<lower=0> boost;
}
transformed parameters {
  real<lower=0> hl1 = initialHalflife * boost;
  real<lower=0> hl2 = hl1 * boost;
}
model {
  halflifeAlpha ~ normal(10 * 0.25 + 1, 0.5);
  halflifeBeta ~ normal(10.0, 1.0);
  initialHalflife ~ gamma(halflifeAlpha, halflifeBeta);
  boost ~ gamma(10 * 1.4 + 1, 10.0);
  x1 ~ bernoulli(exp(-t1 / initialHalflife));
  x2 ~ bernoulli(exp(-t2 / hl1));
  x3 ~ bernoulli(exp(-t3 / hl2));
}
""")

# write data to disk
data = {
    "x1": 1,
    "x2": 1,
    "x3": 0,
    "t1": 0.8,
    "t2": 2.2,
    "t3": 2.9,
}
with open('modeldata.json', 'w') as fid:
    json.dump(data, fid)

model = CmdStanModel(stan_file="question.stan")
fit = model.sample(data="modeldata.json", chains=2, iter_sampling=1_000)
summarydf = fit.summary()
print(summarydf)
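For intuition, here is a sketch (my addition, not from the model above) of how data consistent with this generative process can be simulated, drawing the hyperparameters from the same priors the model uses; the specific values are whatever the seeded draws produce.

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw hyperparameters and parameters from the priors in the model above.
# Note NumPy's gamma takes a scale parameter, i.e. 1/rate relative to Stan.
a = rng.normal(10 * 0.25 + 1, 0.5)           # halflifeAlpha
b = rng.normal(10.0, 1.0)                    # halflifeBeta
h = rng.gamma(a, 1.0 / b)                    # initial halflife H
boost = rng.gamma(10 * 1.4 + 1, 1.0 / 10.0)  # boost B

t = np.array([0.8, 2.2, 2.9])                # elapsed times, as in the data above
halflives = h * boost ** np.arange(len(t))   # H, B H, B^2 H
x = rng.binomial(1, np.exp(-t / halflives))  # x_i ~ Bernoulli(e^{-t_i / (B^{i-1} H)})
print(x)
```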
The issue is: instead of Bernoulli draws (x_i, t_i), suppose I have an experiment that directly yields h_i \sim B^{i-1} H. That is, I have
- h_1 \sim H
- h_2 \sim B H
- h_3 \sim B^2 H.
I’d like to be able to say something like `data { real h1; real h2; real h3; }` and then, in my model,

model {
  // …
  h1 ~ initialHalflife; // 😕
  h2 ~ hl1; // 😕
  h3 ~ hl2; // 😕
}
but I know I can’t do this. Per the links mentioned above, I know I have to make a Jacobian adjustment; 22.3 Changes of variables | Stan User’s Guide describes exactly how to compute the Jacobian. However, my change of variables transforms just two random variables, H and B, into three or more random variables (from which my data are drawn), and I can’t evaluate the determinant of a rectangular matrix.
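As a reference point, the scalar version of that chapter’s recipe can be checked numerically. This is my own sketch of the one-dimensional rule only (with illustrative Gamma parameters), not the rectangular case in question: if h \sim Gamma(a, b) and u = log(h), the adjusted density is p_U(u) = p_H(e^u) \cdot e^u, where e^u is the 1x1 Jacobian determinant.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

a, b = 3.0, 2.0  # illustrative shape and rate

def p_u(u):
    # density of u = log(h) for h ~ Gamma(a, b):
    # p_U(u) = p_H(exp(u)) * |d/du exp(u)| = p_H(exp(u)) * exp(u)
    h = np.exp(u)
    return stats.gamma.pdf(h, a, scale=1.0 / b) * h  # Jacobian factor exp(u) = h

total, _ = quad(p_u, -20, 20)
print(total)  # ≈ 1.0, and only because of the Jacobian factor
```

This square (here 1x1) determinant is exactly what has no analogue when two random variables map to three or more.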
I’m familiar with finding the density of a function of two random variables, e.g., Z = X Y, which also involves a Jacobian, but I want to make sure I understand exactly how to update `target` in my Stan model for my case, where I have many more data random variables than input random variables.
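To make the Z = X Y case concrete, here is a numerical sketch (my addition, with illustrative Gamma parameters standing in for the halflife and boost priors): the map (x, y) -> (x, z = x y) has Jacobian determinant x, giving f_Z(z) = \int_0^\infty f_X(x) f_Y(z/x) / x \, dx.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad, trapezoid

fX = stats.gamma(3.5, scale=1.0 / 10.0)   # stand-in for the initial halflife prior
fY = stats.gamma(15.0, scale=1.0 / 10.0)  # stand-in for the boost prior

def f_Z(z):
    # density of Z = X * Y via the change of variables (x, y) -> (x, z = x y),
    # whose Jacobian determinant is x: f_Z(z) = ∫ f_X(x) f_Y(z / x) / x dx
    val, _ = quad(lambda x: fX.pdf(x) * fY.pdf(z / x) / x, 0, 50)
    return val

# sanity check: the transformed density should integrate to 1
z_grid = np.linspace(1e-6, 8.0, 400)
total = trapezoid([f_Z(z) for z in z_grid], z_grid)
print(total)  # ≈ 1
```

In Stan the analogous move is adding the log absolute Jacobian determinant to `target`; the open question above is what plays that role when the map is rectangular.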