Are Jacobian adjustments necessary when the target parameter is a difference between two parameters?

[Note on cross-posting: I asked the same question at CrossValidated and have received the suggestion that I should ask it here.]

I want to model the index called Delta P (e.g., p.144 of this paper), which is basically a difference between two proportions (i.e., \frac{n_1}{N_1} - \frac{n_2}{N_2}), as a function of a predictor. The input data should be the four count variables from which to calculate Delta P (i.e., n_1, N_1, n_2, N_2) and predictor values.

Below is my attempt to do it in Stan. When I run the code, I get a message about Jacobian adjustments since the left-hand side of a sampling statement is deltaP, which is calculated by subtracting one parameter from another (theta1 - theta2, where theta1 is the estimated value of \frac{n_1}{N_1} and theta2 is that of \frac{n_2}{N_2}).

data { 
  int<lower=0> N; // total number of observations
  int<lower=1> denom1[N]; // denominator of the first proportion
  int<lower=1> denom2[N]; // denominator of the second proportion
  int<lower=0> nom1[N]; // numerator of the first proportion
  int<lower=0> nom2[N]; // numerator of the second proportion
  real x[N]; // predictor variable
} 

parameters {
  real<lower=0, upper=1> theta1[N]; // the first proportion
  real<lower=0, upper=1> theta2[N]; // the second proportion
  real alpha; // intercept
  real beta; // slope parameter for x
  real<lower=0> sigma; // SD of the error term
} 

transformed parameters {
  real<lower=-1, upper=1> deltaP[N]; // Delta P
  for (i in 1:N) {
    deltaP[i] = theta1[i] - theta2[i];
  }
}

model {
  // priors
  theta1 ~ beta(1, 1);
  theta2 ~ beta(1, 1);
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  sigma ~ normal(0, 1) T[0, ];
  
  for (i in 1:N) {
    // estimating thetas based on denoms and noms
    nom1[i] ~ binomial(denom1[i], theta1[i]); 
    nom2[i] ~ binomial(denom2[i], theta2[i]);
    // deltaP follows a truncated normal distribution with mean alpha + beta * x[i] and SD sigma
    deltaP[i] ~ normal(alpha + beta * x[i], sigma) T[-1, 1];
  }
}

I run the Stan code above with the following R code.

library("rstan")

### Generate fake data
set.seed(100)
# sample size
N <- 100
# True parameter values
alpha <- -0.2
beta <- 0.5
sigma <- 0.1

# predictor values (x) and Delta P values
while (TRUE) {
  x <- runif(N, -1, 1)
  deltaP <- alpha + beta * x + rnorm(N, sd = sigma)
  if (all(deltaP <= 1) & all(deltaP >= -1)) break
}
# theta values
theta1 <- theta2 <- numeric(N)
for (i in 1:N) {
  if (deltaP[i] > 0) {
    theta1[i] <- runif(1, deltaP[i], 1)
    theta2[i] <- theta1[i] - deltaP[i]
  } else {
    theta2[i] <- runif(1, abs(deltaP[i]), 1)
    theta1[i] <- theta2[i] + deltaP[i]
  }
}

# denominators (each sampled from 1 to N) and numerators
denom1 <- sample(N, replace = TRUE)
denom2 <- sample(N, replace = TRUE)
nom1 <- rbinom(N, denom1, theta1)
nom2 <- rbinom(N, denom2, theta2)

### fit the model
fit <- stan(file = 'xxx.stan', 
            data = list(
              N = N,
              denom1 = denom1,
              denom2 = denom2,
              nom1 = nom1,
              nom2 = nom2,
              x = x
            ))

This runs, but I also get the following message:

DIAGNOSTIC(S) FROM PARSER:
Info:
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    deltaP[i] ~ normal(...)

I only have a vague understanding of Jacobians, but I believe the adjustment is necessary when parameters are transformed nonlinearly, because the transformation alters the shape of the distribution. What I am not sure of is whether the case above (deltaP = theta1 - theta2) counts as a nonlinear transformation, and if it does, what kind of Jacobian adjustment is necessary (or whether there is another way to circumvent the issue).
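For reference, my rough picture of where the adjustment comes from is the change-of-variables formula: if \phi = f(\theta) for a one-to-one, differentiable f, then

p_{\phi}(\phi) = p_{\theta}\!\left(f^{-1}(\phi)\right) \left| \det J_{f^{-1}}(\phi) \right|,

so a distribution placed directly on a transformed quantity only implies the intended distribution on the original parameters once the log absolute Jacobian determinant is added to the target.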

When I repeated the above code 1,000 times with different seeds and examined the distribution of the posterior means of the three focal parameters (i.e., alpha, beta, sigma), 70.5% of the alpha estimates, 20.1% of the beta estimates, and 37.4% of the sigma estimates were above the true values (see the figure below), which makes me suspect the estimates may be biased, and that the bias could be due to the missing Jacobian adjustment.

Well, for the one-to-one map (\theta_1, \theta_2) \mapsto (\theta_1, \theta_1 - \theta_2), the Jacobian matrix is \mathbf{J} = \begin{bmatrix}1 & 0 \\ 1 & -1 \end{bmatrix}, whose determinant is -1, whose absolute value is 1, and whose logarithm is 0. So whether or not you add zero to the target does not affect its value.


Thank you for your prompt response. It would be great news if I do not need any Jacobian adjustment in my case. But then I wonder what caused the apparent bias in the above simulation. I have just run the above model 100 times, and the 75% CIs of the beta parameter included the true value (0.50) in only 43% of the cases, while in 50% of the cases the entire interval fell below the true value.

library("tidyverse")

lower <- upper <- numeric(100)
for (j in 1:100) {
  set.seed(j)
  
  # (above R code without "set.seed(100)")

  sm <- summary(fit)$summary %>% 
    as.data.frame %>%
    rownames_to_column
  lower[j] <- sm[202, 6] # lower limit of the 75% CI of beta
  upper[j] <- sm[202, 8] # upper limit of the 75% CI of beta
}

res <- tibble(lower, upper)
# proportion of the cases where the upper limit is smaller than the true value
mean(res$upper < 0.5) # 0.50 
# proportion of the cases where the lower limit is larger than the true value
mean(res$lower > 0.5) # 0.07
# proportion of the cases where the true value falls within the 75% CIs
mean(res$lower < 0.5 & res$upper > 0.5) # 0.43

Do you think this apparent systematic bias is due to chance, a coding error somewhere on my part (though I have triple-checked my R code…), or some other reason outside of Bayesian modeling with Stan?

A quick warning – Jacobians are defined only for one-to-one functions. There is no well-defined Jacobian for a two-to-one map that takes in two parameters and returns a single parameter. Formally you have to map \theta_{1} and \theta_{2} to two new parameters, say \delta = \theta_{1} - \theta_{2} and \sigma = \theta_{1} + \theta_{2}, compute the Jacobian of that map (which here would be a constant and hence ignorable), and then marginalize out the second variable \sigma.
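Spelled out, that augmented map and its Jacobian would be

(\theta_1, \theta_2) \mapsto (\delta, \sigma) = (\theta_1 - \theta_2, \; \theta_1 + \theta_2), \qquad \mathbf{J} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad \left| \det \mathbf{J} \right| = 2,

so the log absolute determinant is a constant that can be dropped from the target, and after marginalizing out \sigma you are left with a model on \delta alone.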

Regarding the question of bias – in general there is no guarantee that a Bayesian posterior, or any posterior expectation value such as a posterior mean, will be close to the true value. The figures you showed don’t seem to indicate any particularly pathological behavior.


Thank you for your reply. Indeed, I could not find an example where the parameter on the left-hand side of a sampling statement is defined in terms of multiple parameters in the transformed parameters block, and I was wondering whether it is even possible to define a Jacobian in such a case (if one were needed).

in general there is no guarantee that a Bayesian posterior, or any posterior expectation value such as a posterior mean, will be close to the true value

I am not sure if I have understood what you mean here correctly. How could I then confirm (or come to be reasonably confident) that a predictor and Delta P (in the above case) are related at the population level?

Also, if I manually compute Delta P and run a simple regression model (ignoring the differences in the denominators of Delta P across observations), the estimates get much closer to the true value.

library("brms")

d <- tibble(x)
d$deltaP <- nom1/denom1 - nom2/denom2
summary(brm(deltaP ~ x, data = d), prob = 0.75) # 75% CI = [0.45, 0.53]

Am I right in understanding that inferences about the relationship between the predictor and Delta P based on my Stan code in the original post are at least as credible as those based on the R code above, even though the latter yields estimates that are closer to the true value? I thought parameter recovery experiments like this were a typical means of confirming that a statistical model is working as intended.

Just because they are common doesn’t mean that they are mathematically well-posed!

There are multiple failure points when implementing Bayesian inference in practice, even if the model is known to capture the true data generating process. Namely, we have to worry about:

    1. Did I implement the model correctly?
    2. Is my computation accurate?
    3. Are my domain expertise and the information in the data sufficiently informative?

These concerns are sequential: if we’re not sure about (1) then we can’t test (2) or (3) by themselves, and if we’re not sure about (1) or (2) then we can’t test (3) by itself. “Parameter recovery” experiments test (3), but if (1) or (2) fail then we can’t trust that the computed posterior distribution accurately quantifies the exact posterior distribution, and hence we can’t tell whether poor parameter recovery is due to an inaccurate posterior or to fundamental limitations of the model.

Here you are worried about (1) but are testing (3). Failures in (3) could be due to errors in how the model is implemented in the Stan program, or they could be due to limitations of the model itself. To separate these possibilities you have to check the possible failure mechanisms one at a time. Because you’re already generating many simulations, I would recommend starting with simulation-based calibration.
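For concreteness, here is a minimal, self-contained toy illustration of the SBC loop. It uses a conjugate normal-mean model (not your Delta P model), so exact posterior draws stand in for MCMC draws; in practice you would replace the data simulation with draws from your model's own priors and the exact posterior draws with thinned draws from your Stan fit.

set.seed(1)
n_reps <- 1000   # number of SBC replications
L      <- 99     # posterior draws kept per replication, so ranks lie in 0:L
n_obs  <- 20     # observations per simulated data set
sigma  <- 1      # known observation SD in this toy model

ranks <- integer(n_reps)
for (r in 1:n_reps) {
  mu_sim <- rnorm(1, 0, 1)               # draw mu from its prior, normal(0, 1)
  y      <- rnorm(n_obs, mu_sim, sigma)  # simulate data given mu_sim
  # exact conjugate posterior for mu: normal(post_mean, post_sd)
  post_prec <- 1 + n_obs / sigma^2
  post_mean <- (sum(y) / sigma^2) / post_prec
  post_sd   <- sqrt(1 / post_prec)
  mu_draws  <- rnorm(L, post_mean, post_sd)  # stand-in for thinned MCMC draws
  ranks[r]  <- sum(mu_draws < mu_sim)        # SBC rank statistic
}

# If the inference matches the generative model, the ranks are uniform on 0:L;
# systematic shapes (U, inverted U, skew) point to specific mismatches.
hist(ranks, breaks = seq(-0.5, L + 0.5, by = 1), main = "SBC ranks for mu")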

For more details see Section 3.3 of Probabilistic Modeling and Statistical Inference and Sections 1.2 and 1.3 of Towards A Principled Bayesian Workflow.


Thank you for the detailed response. I feel I now have a much more fine-grained view of Bayesian statistical modeling. Thank you also for the pointers; I have started reading about simulation-based calibration, and while I am still trying to digest it, it looks useful.