 # Data as a function of random variables: transformations vs change of variables

I’ve reviewed a few links related to Jacobians and functions of random variables, but alas I have failed to build a mental model of how to approach my problem.

I have data that are generated by two Gamma random variables:

• I start with an initial halflife h \sim H = Gamma(a, b), with
• hyperpriors a \sim Normal(…) and b \sim Normal(…), and
• a “boost” random variable B \sim Gamma(c, d) with known parameters c and d.

I collect n samples of data related to the sequence of random variables H, B H, B^2 H, …, B^{n-1} H.
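To make the generative process concrete, here is a minimal simulation sketch in Python (the hyperparameter values `a, b, c, d` below are hypothetical placeholders, not values from the model):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical hyperparameter values, for illustration only
a, b = 3.5, 10.0   # H ~ Gamma(shape=a, rate=b)
c, d = 15.0, 10.0  # B ~ Gamma(shape=c, rate=d)
n = 3

# NumPy's gamma takes (shape, scale), so scale = 1 / rate
H = rng.gamma(a, 1 / b)
B = rng.gamma(c, 1 / d)

# the sequence H, B*H, B^2*H, ..., B^(n-1)*H that the data are tied to
halflives = [B**i * H for i in range(n)]
```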

In the case where my data x_i \sim Bernoulli(e^{-t_i / (B^{i-1} H)}), for known x_i and t_i, I can code this up in Stan just fine, and all is well:

```stan
data {
  int<lower=0, upper=1> x1;
  int<lower=0, upper=1> x2;
  int<lower=0, upper=1> x3;
  real t1;
  real t2;
  real t3;
}
parameters {
  real<lower=0> halflifeAlpha;
  real<lower=0> halflifeBeta;

  real<lower=0> initialHalflife;
  real<lower=0> boost;
}
transformed parameters {
  real<lower=0> hl1 = initialHalflife * boost;
  real<lower=0> hl2 = hl1 * boost;
}
model {
  halflifeAlpha ~ normal(10 * 0.25 + 1, 0.5);
  halflifeBeta ~ normal(10.0, 1.0);
  initialHalflife ~ gamma(halflifeAlpha, halflifeBeta);

  boost ~ gamma(10 * 1.4 + 1, 10.0);

  x1 ~ bernoulli(exp(-t1 / initialHalflife));
  x2 ~ bernoulli(exp(-t2 / hl1));
  x3 ~ bernoulli(exp(-t3 / hl2));
}
```


As mentioned above, this works fabulously: I get posteriors on H and B, hooray.

Runnable Python code:
```python
from cmdstanpy import CmdStanModel
import json
import numpy as np

# write Stan code to disk
with open('question.stan', 'w') as fid:
    fid.write("""
data {
  int<lower=0, upper=1> x1;
  int<lower=0, upper=1> x2;
  int<lower=0, upper=1> x3;
  real t1;
  real t2;
  real t3;
}
parameters {
  real<lower=0> halflifeAlpha;
  real<lower=0> halflifeBeta;

  real<lower=0> initialHalflife;
  real<lower=0> boost;
}
transformed parameters {
  real<lower=0> hl1 = initialHalflife * boost;
  real<lower=0> hl2 = hl1 * boost;
}
model {
  halflifeAlpha ~ normal(10 * 0.25 + 1, 0.5);
  halflifeBeta ~ normal(10.0, 1.0);
  initialHalflife ~ gamma(halflifeAlpha, halflifeBeta);

  boost ~ gamma(10 * 1.4 + 1, 10.0);

  x1 ~ bernoulli(exp(-t1 / initialHalflife));
  x2 ~ bernoulli(exp(-t2 / hl1));
  x3 ~ bernoulli(exp(-t3 / hl2));
}
""")

# write data to disk
data = {
    "x1": 1,
    "x2": 1,
    "x3": 0,
    "t1": 0.8,
    "t2": 2.2,
    "t3": 2.9,
}
with open('modeldata.json', 'w') as fid:
    json.dump(data, fid)

model = CmdStanModel(stan_file="question.stan")
fit = model.sample(data="modeldata.json", chains=2, iter_sampling=1_000)
summarydf = fit.summary()
print(summarydf)
```


The issue is: what if, instead of Bernoulli draws (x_i, t_i), I have an experiment that yields h_i \sim B^{i-1} H directly? That is, I have

• h_1 \sim H
• h_2 \sim B H
• h_3 \sim B^2 H.

I’d like to be able to say something like data {real h1; real h2; real h3;} and, in my model,

```stan
// ...
model {
  // ...
  h1 ~ initialHalflife; // 😕
  h2 ~ hl1; // 😕
  h3 ~ hl2; // 😕
}
```


but I know I can’t do this. Per the links mentioned above, I know I have to make a Jacobian adjustment; e.g., 22.3 Changes of variables | Stan User’s Guide describes exactly how to compute the Jacobian. But my change-of-variables process above transforms just two random variables, H and B, into three or more random variables (from which my data are drawn), and I can’t evaluate the determinant of a rectangular matrix.

I’m familiar with finding the density of a function of two random variables, e.g., Z = X Y, which also involves a Jacobian, but I want to make sure I understand exactly how to update target in my Stan model in my case, where I have many more data random variables than inputs.

I think I found an approximate workaround to this problem via domain knowledge: remembering how I modeled Bernoulli trials x_i \sim Bernoulli(e^{-t_i / (B^{i-1} H)}) in my first set of experiments, I realized I can replace h_i \sim B^{i-1} H with the approximation e^{-t_i / h_i} \approx k_i / 100 and pretend that k_i \sim Binomial(100, e^{-t_i / (B^{i-1} H)}).
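For what it’s worth, the hack amounts to swapping the Bernoulli likelihood for a Binomial one on pseudo-counts. A sketch of the modified likelihood block (the `k1`…`k3` data here are hypothetical pseudo-counts, each precomputed as round(100 * exp(-t_i / h_i)); priors and transformed parameters stay as in the model above):

```stan
data {
  int<lower=0, upper=100> k1;  // round(100 * exp(-t1 / h1)), precomputed
  int<lower=0, upper=100> k2;
  int<lower=0, upper=100> k3;
  real t1;
  real t2;
  real t3;
}
// parameters and transformed parameters as in the working model above
model {
  // priors as before, then:
  k1 ~ binomial(100, exp(-t1 / initialHalflife));
  k2 ~ binomial(100, exp(-t2 / hl1));
  k3 ~ binomial(100, exp(-t3 / hl2));
}
```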

It’s really a grotesque hack but I don’t think it’ll ruin my analysis.

However, I am curious how to handle this properly via Jacobians!

Just a few pointers so you can maybe start off your investigations:

According to William Huber here, B^{i-1} will have a generalised Gamma distribution, with density f_{i} given as in the Wikipedia link. Let f_H be the (Gamma) density of H. Then you’ll have to compute

f_{H_i}(y) = \int_{0}^{\infty} f_i(x)\, f_H(y/x)\, \frac{1}{x} \,\mathrm{d}x = \int_{0}^{\infty} f_i(y/x)\, f_H(x)\, \frac{1}{x} \,\mathrm{d}x,

assuming that B^{i-1} and H are independent. Assuming you can compute at least one of the integrals above in closed-form, you have your density. If not, you might be lucky enough that an implementation using integrate_1d will be (i) numerically stable enough and (ii) not too slow so as to bring the whole thing to a screeching halt.
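Before committing to integrate_1d, one can sanity-check the product-density integral numerically. A sketch in Python (the parameter values a, b, c, d and the index i are hypothetical), computing f_{H_i} by quadrature and verifying it normalizes to 1:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical parameters, for illustration only
a, b = 2.0, 1.0    # H ~ Gamma(shape=a, rate=b)
c, d = 15.0, 10.0  # B ~ Gamma(shape=c, rate=d)
i = 3              # target: the density of H_i = B^(i-1) * H
n = i - 1

def f_H(x):
    return stats.gamma.pdf(x, a, scale=1 / b)

def f_pow(x):
    # density of B**n by change of variables:
    # if Y = B**n, then f_Y(y) = f_B(y**(1/n)) * y**(1/n - 1) / n
    return stats.gamma.pdf(x ** (1 / n), c, scale=1 / d) * x ** (1 / n - 1) / n

def f_Hi(y):
    # product density: f_{H_i}(y) = integral of f_pow(x) * f_H(y/x) / x dx
    return quad(lambda x: f_pow(x) * f_H(y / x) / x, 0, np.inf)[0]

# sanity check: the resulting density should integrate to 1
total = quad(f_Hi, 0, np.inf, limit=200)[0]
```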

Godspeed.

EDIT: I just realised B is normally distributed rather than Gamma. If that is the case, you’ll have to write f_i as in here and carry on.


The difficulty here is that you haven’t actually modeled the measurements

h_{i} \sim B^{i - 1} \, H.

In order to define a proper model you need to specify an observational model
\pi(h_{i} \mid B^{i - 1} \, H). Given that observational model everything is as straightforward to implement as before: instead of

\begin{align*} H &\sim \text{gamma}(a, b) \\ B &\sim \text{gamma}(c, d) \\ p_{i} &= \exp( - t_{i} / B^{i - 1} H ) \\ x_{i} &\sim \text{Bernoulli}(p_{i}) \end{align*}

you’d have

\begin{align*} H &\sim \text{gamma}(a, b) \\ B &\sim \text{gamma}(c, d) \\ h_{i} &\sim \pi(\cdot \mid B^{i - 1} H). \end{align*}

No Jacobians necessary at all.
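For instance, if the measured h_i can be assumed to scatter multiplicatively around B^{i - 1} H, a lognormal observational model is a drop-in replacement for the Bernoulli one. This is a sketch under that assumption (the lognormal choice, the noise scale sigma, and the placeholder hyperparameters are not implied by the question):

```stan
data {
  int<lower=1> n;
  vector<lower=0>[n] h;  // measured halflives
}
parameters {
  real<lower=0> H;
  real<lower=0> B;
  real<lower=0> sigma;   // assumed measurement-noise scale
}
model {
  H ~ gamma(2.0, 1.0);     // placeholder hyperparameters
  B ~ gamma(15.0, 10.0);
  sigma ~ normal(0, 0.5);
  for (i in 1:n) {
    h[i] ~ lognormal(log(pow(B, i - 1) * H), sigma);
  }
}
```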

Now problems arise if you try to force a deterministic observational model,

\pi(h_{i} | B^{i - 1} H) = \delta(h_{i} - B^{i - 1} H),

where \delta is the Dirac delta function. To implement this in Stan you’d have to explicitly integrate out each h_{i}, which would result in awkward deterministic constraints on H and B that you probably won’t be able to implement analytically.

Also, a tangential note: your model B^{i - 1} H implies that the same boost is applied to the original system multiple times. If different boosts drawn from the same distribution are applied, then you’ll need to model each boost with its own parameter.
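Concretely, giving each step its own boost parameter might look like the following sketch (again assuming a lognormal observational model purely for illustration, with placeholder hyperparameters):

```stan
data {
  int<lower=1> n;
  vector<lower=0>[n] h;
}
parameters {
  real<lower=0> H;
  vector<lower=0>[n - 1] boost;  // one boost per step, not a single B
  real<lower=0> sigma;
}
transformed parameters {
  vector<lower=0>[n] hl;
  hl[1] = H;
  for (i in 2:n) {
    hl[i] = hl[i - 1] * boost[i - 1];
  }
}
model {
  H ~ gamma(2.0, 1.0);        // placeholder hyperparameters
  boost ~ gamma(15.0, 10.0);  // each boost shares the same prior
  sigma ~ normal(0, 0.5);
  h ~ lognormal(log(hl), sigma);
}
```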
