Chains not mixing for hierarchical case

I am having trouble getting the chains to mix properly in this hierarchical model. The model can be described mathematically as:

\lambda_k \sim Normal(\mu, \sigma)
\delta_k \sim Normal(\nu, \rho)
\tau_{k,i} \sim Uniform(0,T_k)
\alpha_{k,i} \sim Poisson(m_i N_{k,i} \lambda_k \tau_{k,i})
\beta_{k,i} \sim Poisson(n_i N_{k,i} \lambda_k (T_k - \delta_k - \tau_{k,i}))

With appropriate priors on \mu, \sigma, \nu, \rho.

Briefly, I am trying to estimate the age (\tau) of a patient at which a copy number aberration (CNA) occurs - e.g. going from 2 (m_i) chromosomes to 4 (n_i) - based on the number of mutations that occurred before (\alpha_{k,i}) and after (\beta_{k,i}) the CNA. I assume that, for each patient, mutations occur at a constant rate per base pair (\lambda_k) on a region of size N_{k,i}. One complication is that we can only detect mutations up until a given time in the past, which I have characterised by including a patient-specific lag-time \delta_k. If I use a non-hierarchical model (i.e. \lambda_k and \delta_k are the same for each patient) and apply it to simulated data, then the chains converge and I can recover the input parameters. However, introducing a hierarchical model causes the chains to fail to converge. Have I misspecified my model, or do I need to use more informative priors?

data {
int<lower=1> N;   // # of chromosomal events
int<lower=0> K;   // # of groups
int<lower=0> alpha[K, N];
int<lower=0> beta[K, N];
vector<lower=0>[K] age;
vector<lower=0>[N] region_size[K];
vector<lower=1>[N] num_strands_before[K];
vector<lower=1>[N] num_strands_after[K];
}

parameters {
vector<lower=0, upper=1>[K] t_offset_raw;
vector<lower=0, upper=1>[N] t_raw[K];
vector<lower=0>[K] rate;
real<lower=0> rate_mu;
real<lower=0> rate_sigma;
real<lower=0> t_offset_mu;
real<lower=0> t_offset_sigma;
}

transformed parameters{
vector[K] t_offset;
vector[N] t[K];
vector[N] lam1[K];
vector[N] lam2[K];

t_offset = age .* t_offset_raw;

for (k in 1:K){

t[k] = (age[k] - t_offset[k]) * t_raw[k];
lam1[k] = rate[k] * num_strands_before[k] .* region_size[k] .* t[k];
lam2[k] = rate[k] * num_strands_after[k] .* region_size[k] .* (age[k] - t[k] - t_offset[k]);  // n_i, per the \beta_{k,i} likelihood

}

}

model {
t_offset_mu ~ normal(5, 3);
t_offset_sigma ~ normal(2, 3);
rate_mu ~ normal(0, 0.1);
rate_sigma ~ normal(0, 0.05);

for (k in 1:K){
alpha[k, :] ~ poisson(lam1[k]);
beta[k, :] ~ poisson(lam2[k]);
}
}

vector<lower=0, upper=1>[K] t_offset_raw;
vector<lower=0, upper=1>[N] t_raw[K];
vector<lower=0>[K] rate;


Are t_offset_raw and t_raw non-centered parameterizations? If so they need the prior normal(0, 1). Check https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html for the hows and whys of non-centering.
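For reference, the non-centred pattern from that case study, sketched with hypothetical names (mu, sigma, lambda_raw) for the per-patient rate, looks like:

```stan
parameters {
  real mu;
  real<lower=0> sigma;
  vector[K] lambda_raw;        // standard-normal innovations
}
transformed parameters {
  // implies lambda[k] ~ normal(mu, sigma) without sampling it directly
  vector[K] lambda = mu + sigma * lambda_raw;
}
model {
  lambda_raw ~ normal(0, 1);   // this prior is what makes the pattern work
}
```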

Put a prior on rate too. It can make things difficult on the sampler to not have priors.

Thanks for your help. t_offset_raw and t_raw are meant to account for the variable upper bound, so I didn't think they needed normal(0, 1) priors? I had missed including the rate[k] ~ normal(rate_mu, rate_sigma) sampling statement; with the centred parameterisation below, the chains seem to converge, but there are a number of divergences which I think are biasing the chains, so I'd like to convert it to a non-centred parameterisation. The problem is I'm unsure how to combine the variable upper bounds with the non-centred parameterisation.

model {
t_offset_mu ~ normal(5, 3);
t_offset_sigma ~ normal(2, 1);
t_offset_raw ~ normal(0, 1);
rate_mu ~ normal(0, 0.1);
rate_sigma ~ normal(0, 0.05);

for (k in 1:K){
rate[k] ~ normal(rate_mu, rate_sigma)T[0,];
t_offset[k] ~ normal(t_offset_mu, t_offset_sigma)T[0,age[k]];

alpha[k, :] ~ poisson(precalculate1[k]);
beta[k, :] ~ poisson(precalculate2[k]);
}

}


I thought the parameterisation below would work: the divergences go away, but I get lots of error messages and the posterior doesn't recover the true values of t_offset_mu or t_offset_sigma.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: poisson_lpmf: Rate parameter[1] is nan, but must not be nan!  (in 'unknown file name' at line 57)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

data {
int<lower=1> N;   // # of chromosomal events
int<lower=0> K;   // # of groups
int<lower=0> alpha[K, N];
int<lower=0> beta[K, N];
vector<lower=0>[K] age;
vector<lower=0>[N] region_size[K];
vector<lower=1>[N] num_strands_before[K];
vector<lower=1>[N] num_strands_after[K];
}

parameters {
vector<lower=0, upper=1>[N] t_raw[K];
real<lower=0> rate_mu;
real<lower=0> rate_sigma;
vector<lower=-rate_mu/rate_sigma>[K] rate_raw;
real<lower=0> t_offset_mu;
real<lower=0> t_offset_sigma;
vector<lower=0,upper=1>[K] theta;
}

transformed parameters{
vector[K] t_offset_raw;
vector[K] t_offset;
vector[N] t[K];
vector[N] precalculate1[K];
vector[N] precalculate2[K];
vector<lower=0>[K] rate;

t_offset_raw = -(t_offset_mu / t_offset_sigma) + (age / t_offset_sigma) .* theta;
rate = rate_mu + rate_sigma * rate_raw;
t_offset = t_offset_mu + t_offset_sigma * t_offset_raw;

for (k in 1:K){

t[k] = (age[k] - t_offset[k]) * t_raw[k];
precalculate1[k] = rate[k] * num_strands_before[k] .* region_size[k] .* t[k];
precalculate2[k] = rate[k] * num_strands_after[k] .* region_size[k] .* (age[k] - t[k] - t_offset[k]);

}

}

model {
t_offset_mu ~ normal(5, 3);
t_offset_sigma ~ normal(2, 1);
rate_mu ~ normal(0, 0.1);
rate_sigma ~ normal(0, 0.05);

for (k in 1:K){

rate_raw[k] ~ normal(0, 1)T[-rate_mu/rate_sigma,];
t_offset_raw[k] ~ normal(0, 1)T[-t_offset_mu/t_offset_sigma, (age[k]-t_offset_mu)/t_offset_sigma];

alpha[k, :] ~ poisson(precalculate1[k]);
beta[k, :] ~ poisson(precalculate2[k]);
}
}


It might be worth trying to use link functions to get the constraints you want. I agree; figuring this out with the centering vs. non-centering doesn't seem easy.

So something like:

\mu_\alpha \sim N(0, 1)
\sigma_\alpha \sim N(0, 1)
\alpha_i \sim N(\mu_\alpha, \sigma_\alpha)
c_i = e^{\alpha_i}

So c_i would be constrained to be greater than zero here.

Harder to interpret the priors here I suppose.
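In Stan, that exp-link sketch (with hypothetical names mu_alpha, sigma_alpha, alpha_raw) might look like:

```stan
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector[K] alpha_raw;
}
transformed parameters {
  // positive by construction; the priors stay on the unconstrained scale
  vector<lower=0>[K] c = exp(mu_alpha + sigma_alpha * alpha_raw);
}
model {
  mu_alpha ~ normal(0, 1);
  sigma_alpha ~ normal(0, 1);
  alpha_raw ~ normal(0, 1);
}
```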

I’m not entirely sure what you mean by using link functions; how would c_i fit into my model? I thought the problems with convergence could be related to the fact that I’ve transformed t_offset_raw but not done the appropriate Jacobian adjustment:

t_offset_raw = -(t_offset_mu / t_offset_sigma) + (age - t_offset_mu) / t_offset_sigma .* theta;


I believe the correct Jacobian adjustment is

target += log(fabs(age[k]-t_offset_mu)) - log(t_offset_sigma);


However, doing this still doesn’t allow the chains to converge and I now get divergences in the posterior.

You can do a link function when you want a constrained parameter of some sort.

So if you have a parameter that needs to be in the range [a, b], you can take an unconstrained variable x and just do (b - a) \text{logit}^{-1}(x) + a.

You’d put your priors or whatnot on the unconstrained variables. That screws up interpretability of the priors, but it’s easy.
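As a sketch (with hypothetical bounds a and b passed as data), the [a, b] link in Stan would be:

```stan
data {
  real a;
  real b;                      // bounds, with a < b
}
parameters {
  real x;                      // unconstrained
}
transformed parameters {
  real y = (b - a) * inv_logit(x) + a;   // maps the real line onto (a, b)
}
model {
  x ~ normal(0, 1);            // prior goes on the unconstrained variable
}
```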

The way to test the Jacobian stuff is doing what you want is to make a one parameter model with the transformation you’re testing and compare that against like a rejection sampler or something you can brute force.
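For instance, a minimal one-parameter test of a Jacobian adjustment (here for the transform y = exp(x), with the prior stated on the transformed y) could look like:

```stan
parameters {
  real x;
}
transformed parameters {
  real<lower=0> y = exp(x);
}
model {
  // prior on the transformed scale...
  y ~ normal(1, 0.5);
  // ...so we need the log-Jacobian: log |dy/dx| = log(exp(x)) = x
  target += x;
}
```

The posterior of y can then be checked against brute-force draws from normal(1, 0.5) truncated to y > 0.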


Okay, so based on your suggestions the model should be something like:

transformed parameters{
vector[K] t_offset;
vector[N] t[K];
vector[K] rate;

t_offset = age .* inv_logit(t_offset_mu + t_offset_sigma * t_offset_raw);
rate = exp(rate_mu + rate_sigma * rate_raw);

for (k in 1:K){
t[k] = (age[k] - t_offset[k]) * t_raw[k];
}

}

model {
rate_mu ~ normal(-3, 0.4);
rate_sigma ~ normal(0, 0.5);
rate_raw ~ normal(0, 1);

t_offset_mu ~ normal(-3, 1);
t_offset_sigma ~ normal(0, 1);
t_offset_raw ~ normal(0, 1);

for (k in 1:K){
alpha[k, :] ~ poisson(rate[k] * num_strands_before[k] .* region_size[k] .* t[k]);
beta[k, :] ~ poisson(rate[k] * num_strands_after[k] .* region_size[k] .* (age[k] - t[k] - t_offset[k]));
}
}


When run on my simulation, convergence appears much better, but I'm still getting quite a few divergences at adapt_delta=0.95; is this something to be concerned about? I'm also slightly concerned that, by putting a prior on t_offset_raw and then scaling t_offset_mu + t_offset_sigma * t_offset_raw by the patient's age, each patient gets a different prior on the lag-time. Ideally, I'd like the prior on each patient's lag-time to be the same (in absolute terms); do you think the parameters can be rescaled in some way to make this work?

Yeah, unfortunately. Divergences mean there are parts of the posterior the sampler isn’t getting to.

It’s entirely possible the fit looks good and the posterior predictives and everything look great for a model with divergences. The issue is that Stan is generating samples from a distribution that isn’t the one mathematically specified by the model. (Edit: So you’d have trouble communicating these results in a paper or something, because the results have some sort of weird sampler bias that isn’t part of the mathematical model.)

Before we give up on this formulation though, could you try the log parameterization of the Poisson (see 13.6 Poisson Distribution, Log Parameterization in the Stan Functions Reference)?

It’s numerically more stable than evaluating the exps. It’s easy for things like rate = exp(rate_mu + rate_sigma * rate_raw) to either go to zero or infinity.
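Applied to this model, that would mean passing the log of the rate to the likelihood directly; a sketch using the earlier variable names (the linear predictor rate_mu + rate_sigma * rate_raw[k] here stands in for log(rate[k])):

```stan
model {
  for (k in 1:K){
    // poisson_log takes the log of the rate, so exp() is never evaluated
    alpha[k, :] ~ poisson_log(rate_mu + rate_sigma * rate_raw[k]
                              + log(num_strands_before[k])
                              + log(region_size[k])
                              + log(t[k]));
  }
}
```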


Thanks for the suggestion of using poisson_log; it allows the chains to reach convergence without any divergences. One final thing: this parameterisation puts the priors on the offset time relative to the patient’s age, but it would be more reasonable to put the same prior, in absolute terms, on \delta_k. Is this possible with the current parameterisation without breaking convergence?

Yeah it puts the priors in a weird kinda not-so-easily-interpreted place.

Dunno without trying. So maybe start inching back in this direction? Maybe it was the lack of poisson_log that was giving you trouble before.

If you want to put the prior on the transformed space instead of the untransformed space, you’ll need to include the appropriate adjustment. The adjustments for a bunch of common constraints are documented here: 10.4 Lower and Upper Bounded Scalar | Stan Reference Manual (this is how the constraints are done, so we’re really just looping back to what you had originally haha, but this time we’re coming from a model that works).
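A sketch of that idea for the lag-time, with a hypothetical unconstrained parameter eta and the prior placed directly on the absolute t_offset (so the log-Jacobian of t_offset = age .* inv_logit(eta) must be added by hand):

```stan
parameters {
  vector[K] eta;               // unconstrained; hypothetical name
  real<lower=0> t_offset_mu;
  real<lower=0> t_offset_sigma;
}
transformed parameters {
  // each t_offset[k] lies in (0, age[k])
  vector[K] t_offset = age .* inv_logit(eta);
}
model {
  // prior stated in absolute terms, the same for every patient
  t_offset ~ normal(t_offset_mu, t_offset_sigma);
  // log |d t_offset[k] / d eta[k]| = log(age[k])
  //   + log inv_logit(eta[k]) + log(1 - inv_logit(eta[k]))
  target += sum(log(age) + log_inv_logit(eta) + log1m_inv_logit(eta));
}
```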