Clarification regarding Jacobians - BYM2 model

Hello everyone,

I’ve been studying a particular Stan model found in this paper and the associated code example in this post. Specifically, I’m interested in the logit_rho parameter and its transformation to the constrained rho parameter, as described in the Stan model below.

functions {
real icar_normal_lpdf(vector phi, int N, int[] node1, int[] node2) {
return -0.5 * dot_self(phi[node1] - phi[node2])
+ normal_lpdf(sum(phi) | 0, 0.001 * N);
}
}
data {
int<lower=0> N;
int<lower=0> N_edges;
int<lower=1, upper=N> node1[N_edges];  // node1[i], node2[i] neighbors
int<lower=1, upper=N> node2[N_edges];  // node1[i] < node2[i]

int<lower=0> y[N];             // count outcomes
vector<lower=0>[N] E;          // exposure
int<lower=1> K;                // num covariates
matrix[N, K] x;                // design matrix
real<lower=0> scaling_factor;  // scales the variance of the spatial effects
}
transformed data {
vector[N] log_E = log(E);
}
parameters {
real beta0;            // intercept
vector[K] betas;       // covariates
real logit_rho;
vector[N] phi;         // spatial effects
vector[N] theta;       // heterogeneous effects
real<lower=0> sigma;   // overall standard deviation
}
transformed parameters {
real<lower=0, upper=1> rho = inv_logit(logit_rho);
vector[N] convolved_re = sqrt(rho / scaling_factor) * phi
+ sqrt(1 - rho) * theta;
}
model {
y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);

beta0 ~ normal(0, 1);
betas ~ normal(0, 1);
logit_rho ~ normal(0, 1);
sigma ~ normal(0, 1);
theta ~ normal(0, 1);
phi ~ icar_normal_lpdf(N, node1, node2);
}
generated quantities {
vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma;
vector[N] mu = exp(eta);
int y_rep[N];
if (max(eta) > 20) {
// avoid overflow in poisson_log_rng
print("max eta too big: ", max(eta));
for (n in 1:N)
y_rep[n] = -1;
} else {
for (n in 1:N)
y_rep[n] = poisson_log_rng(eta[n]);
}
}


In the model, the logit_rho parameter is used to represent a value in unconstrained space. However, the constrained parameter rho , defined as rho = inv_logit(logit_rho) , plays a crucial role in scaling random effects and contributing to the likelihood function.

My question revolves around whether it’s necessary to adjust the target in the Stan model to account for the transformation from logit_rho to rho . In other words, should we incorporate the Jacobian term to the target to ensure that the parameter transformation is correctly handled?

In particular, let if we call inv_logit theta, we have \theta = {\rm logit}(\rho). Hence,
\rho = \frac{e^\theta}{1 + e^\theta} \implies \frac{{\partial} \rho}{{\partial} \theta} = \frac{e^\theta}{(1 + e^\theta)^2}.

This could be incorporated to the target as follows

target += logit_rho - 2 * log1p(exp(logit_rho));


Is my understanding and approach to incorporating the Jacobian term correct for this particular Stan model?

Thanks

Hi! No, you don’t need a Jacobian here as logit_rho is the parameter that is being sampled by Stan (i.e., it’s defined within the parameter block) and it’s the one for which you define the prior in the model block. rho is simply a function of the logit_rho.

The only time when you need a Jacobian is when

1. you have defined a parameter (saytheta) outside the parameter block
2. theta is a (non-linear) function of at least one parameter phi that is defined within the parameters block
3. there is a prior or likelihood statement (theta ~ somedistribution(...), or target +=f(theta), where theta satisfies condition 1 and 2.

In your case, let’s check the conditions:

1. yes, there is a parameter defined outside the parameter block
2. yes, the parameter is a non-linear function of another parameter defined inside the parameter block
3. no, there is no prior or likelihood statement where the left-hand side is the parameter theta mentioned in condition 1 and 2: the left-hand side of the statement logit_rho ~ normal(0, 1); is a parameter that is defined within the parameter block.

Condition 3 would only be satisfied if it said rho ~ somedistribution(...);, because now the left-hand side is a parameter that is defined outside the parameter block (condition 1) and that is a function of another parameter defined inside the parameter block (condition 2).

3 Likes

I am still confused (or not convinced), though.

I agree with your “conditions” for incorporating the Jacobian to the target.

In my opinion, the actual parameter in this case is rho and not logit_rho.
Because, in the actual model, the posterior is proportional to:

[Y \mid \boldsymbol{\beta}, S][S \mid \rho, \sigma][\rho, \sigma],

where S is a random effect (called convolved_re in the code), which is a weigthed average of two different types of random effects, with \rho being the “weight” of this weighted average.

What is done in the code is:

1. Sampling from logit_rho;
2. Transform it to rho using the inv_logit function;
3. Using rho as a parameter in the distribution of convolved_re (or S).

Thus, I think rho should have been in the “parameters” block. While in the “model” bloack, we should have something like this:

real theta = logit(rho);
theta ~ std_normal();
target += theta - 2 * log1p(exp(theta));


Do you get why I’m still confused?

Yes, this is fine and is indeed what your original model specification does. It is fine for rho to be a function of the parameterlogit_rho and for rho to be used in convolved_re.

This doesn’t make sense to me, though, and doesn’t reflect the three steps you highlight above. Think about it as a process by which the parameters and functions of parameters are generated, like you do in your list above. That way, you certaintly do not need a Jacobian.

[EDIT: and remember, any function of a set of samples from a posterior distribution still represents a posterior distribution]

Firstly, as far as I understand, the only difference between defining a transformed parameter under the transformed parameter block and the model block is that in the latter, the transformed values are not stored in the output.

In this model (which is not mine), there is no interest in the parameter inv_logit.

I had no issues with this model before reading the Section 24 of the Stan user guide (and some other posts like this and this ones).

I also apologize for any confusion in my question. To clarify, let me set up a toy example:

Consider a GLMM model defined as follows:

Y_{ij} \mid \cdot \sim {\rm Poisson} (E_i \lambda_{ij}),

where, i = 1, \ldots, N_j, j = 1, \ldots, M, and

\log(\lambda_{ij}) = \mathbf{x}^{\top}_i \boldsymbol{\beta} + s_{j}
where s_j is a random effect for the group j.

Are the following two models equivalent?

Model 1

data {
int<lower=0> N;
int<lower=0>
int<lower=0> y[N];         // count outcomes
int<lower=0> E[N];         // exposure
int<lower=1> K;            // num covariates
matrix[N, K] X;            // design matrix
int<lower = 0> M;          // num of groups
int<lower=1, upper = M> group_id; // group identifier
}
parameters {
vector[K] betas;  // reg coef
vector[M] s;      // random effects
real log_sigma;   // log standard deviation (for random effect)
}
transformed parameters {
real<lower=0> sigma = exp(log_sigma);
}
model {
vector[N] log_E = log(E);
for (i in 1:N) {
y ~ poisson_log(log_E + x * beta + s[grou_id[i]]);
}
s ~ normal(0, sigma);
log_sigma ~ std_normal();
beta ~ normal(0, 1);
}


Model 2

data {
int<lower=0> N;
int<lower=0>
int<lower=0> y[N];         // count outcomes
int<lower=0> E[N];         // exposure
int<lower=1> K;            // num covariates
matrix[N, K] X;            // design matrix
int<lower = 0> M;          // num of groups
int<lower=1, upper = M> group_id; // group identifier
}
parameters {
vector[K] betas;      // reg coef
vector[M] s;          // random effects
real<lower=0> sigma;  // log standard deviation (for random effect)
}
model {
vector[N] log_E = log(E);
for (i in 1:N) {
y ~ poisson_log(log_E + x * beta + s[grou_id[i]]);
}
real log_sigma = log(sigma);
s ~ normal(0, sigma);
log_sigma ~ std_normal();
beta ~ normal(0, 1);
}


In my understanding, these models are equivalent. However, based on your responses, it appears that the Jacobian should only be incorporated in the second case.

Could you please point out where my rationale is flawed?

Thank you for your assistance and the insightful discussion.

Let’s strip down even more of the complexity, and just ask about the following three models.

Model 1, which clearly requires no Jacobian adjustement, provided that the goal is to end up with normally distributed log_sigma

parameters{
real log_sigma;
}
transformed parameters{
real<lower=0> sigma = exp(log_sigma);
}
model{
log_sigma ~ std_normal();
}


Model 1.5, which is identical to model 1 (just pointing out that it doesn’t matter if you declare sigma in transformed parameters or locally in model.

parameters{
real log_sigma;
}
model{
real sigma = exp(log_sigma);
log_sigma ~ std_normal();
}


Model 2, which parameterizes in terms of sigma instead of in terms of log_sigma, requires a Jacobian adjustment.

parameters{
real<lower = 0> sigma;
}
transformed parameters{
real log_sigma = log(sigma);
}
model{
log_sigma ~ std_normal();
}


Following the language in the post you linked here, this final model’s parameterization starts out as flat over sigma, which is not flat over log(sigma). It then increments the target by normal_lpdf(log_sigma | 0, 1). If the starting point would have been flat over log_sigma, then this increment would yield a target density that is standard normal over log_sigma. But since the starting point is not flat over log_sigma, we now have defined a target density that is not standard normal over log_sigma, and requires a Jacobian adjustment.

Now let’s consider Model 1 again. Here parameterize such that our starting point is flat over log_sigma. We increment the target by normal_lpdf(log_sigma | 0, 1), and we have a final target that is standard normal. No Jacobian adjustment required.

Edit: I’ve read a bit more closely your earlier posts, and maybe I can add one further clarification. The only thing that controls whether rho or logit_rho is a parameter in this model is which one you declare in the parameters block. This model can be parameterized in terms of rho or in terms of logit_rho. Your choice of parameterization, coupled with which quantity you wish to put a prior on, determines whether or not a Jacobian adjustment is required. Whether rho or logit_rho is used as an argument to some downstream function (even if that function is a probability density function whose arguments we call “parameters”) has no bearing on which one of rho or logit_rho is a parameter in the sense of being use to parameterize the model configuration space. That choice depends solely on which one you choose to declare in the parameters block.

This is a pretty deep and fundamental point, and is one of those things that’s really hard to wrap one’s head around at first but eventually becomes intuitive in hindsight. For more I highly, highly recommend @betanalpha on probability theory: Probability Theory (For Scientists and Engineers)

2 Likes