Random effects on a constrained parameter and divergent transitions

For various reasons, I am trying to fit a model with a Gaussian-shaped optimum to binary data. This can be done quite easily and nicely when no random effects are involved, but I am now trying to include random effects on the parameter W_{max}, which constrains the latent model between 0 and 1. This is, of course, difficult, as the random effects are not individually constrained; only their sum with the intercept must lie between 0 and 1. Worse, the aim would be to include more than one such effect.

But let’s focus on a single random effect for now. I figured it would be easier on a log scale, as this allows for only one boundary (namely 0) rather than two (0 and 1). The following model runs, but results in a lot of divergent transitions (roughly 25% with adapt_delta at 0.80; needless to say, increasing this value doesn’t remove such transitions entirely):

functions { 
} 
data {
    int<lower=1> Nobs;                          // Total number of observations
    int<lower=0> W[Nobs];                    // Response variable
    vector[Nobs] z;                                 // Covariate
    
    int<lower=1> Nind;                          // Total number of individuals
    int<lower=1,upper=Nind> J_ind[Nobs];        // Grouping indices for ID
}
parameters {
    real<lower=0> omega;            // Peak Width
    real theta;                                 // Values of the optimum parameter
    real<upper=0> log_Wmax;             // Values of the max fitness parameter
    
    real<lower=0> ind_sigma;        // Sigma of individuals effects
    vector[Nind] ind_extend;        // Individual effects
}
transformed parameters {
    vector<upper=0>[Nind] log_w_ind;         // Effect of individual on log_Wmax
    vector<lower=0,upper=1>[Nobs] prob;
    
    // Wmax with individual effects
    log_w_ind = log_Wmax + ind_extend * ind_sigma;
    
    for (n in 1:Nobs) {
        prob[n] = exp(log_w_ind[J_ind[n]] - square(z[n] - theta) / (2 * omega * omega));
    }
    
} 
model {
    // Likelihood
    W ~ bernoulli(prob);
    
    // Priors
    omega ~ gamma(3.36,0.78);
    theta ~ normal(0,1000);
    log_Wmax  ~ normal(0,1);
    ind_extend ~ normal(0,1);
    ind_sigma ~ normal(0,1);
}
generated quantities { 
}

Any guidance would be appreciated here. Am I being unreasonable in wanting to fit this model? Is there any trick I could use to make the boundary issue less painful?

FYI, an (unconstrained) logit-link model on the same data runs nicely.

Thanks!

First thing to try is the Bernoulli logit parameterization: https://mc-stan.org/docs/2_20/functions-reference/bernoulli-logit-distribution.html
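
That is, instead of building prob yourself and calling bernoulli(prob), you pass the log-odds straight in and Stan handles the inverse logit internally in a numerically stable way. Something like this, with eta standing in for whatever log-odds predictor you build:

W ~ bernoulli_logit(eta);    // equivalent to bernoulli(inv_logit(eta)), but stable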

Doing the exps in the code leads to overflows a lot of the time. Also, is this guaranteed to be less than one?

prob[n] = exp(log_w_ind[J_ind[n]] -((z[n] - theta) .* (z[n] - theta)) ./ (2 * omega * omega));

If not that could cause problems.

The thing is that I’m trying to avoid the logit for this particular model. For various reasons, I’m trying to fit the Gaussian curve directly (which is not possible with a logit link, only approximately).

prob will indeed be less than one, as log_w_ind is less than zero and we subtract something positive (a squared value divided by a positive quantity) from it, so the argument of the exponential should be less than 0, given the constraints provided.
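
To spell the argument out:

exp(log_w_ind - (z[n] - theta)^2 / (2 * omega^2)) <= exp(log_w_ind) <= exp(0) = 1,

since the squared term divided by 2 * omega^2 is non-negative and log_w_ind is constrained to be at most 0.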

Oh okay. Probably worth trying to increment the log density manually to avoid the explicit exp then. Not sure if that’s the problem here, but this sort of thing can definitely lead to divergences.

Thanks, I’ll try that.

OK, I tried to rewrite the code to directly increment the log density, but it actually made the divergent transitions worse and created an initialisation issue: some chains fail to initialise, with this error message:

Chain 2: Initialization between (-2, 2) failed after 1 attempts.                                                                                          
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."

It’s weird that Stan gives up after only one attempt, especially since I manually supply initial values for all parameters in the parameters and transformed parameters sections. So I thought this could be due to some weirdly defined constraint in the code, but I cannot for the life of me find anything like that in my code.

Here’s the code:

functions { 
} 
data {
    int<lower=1> Nobs;                          // Total number of observations
    int<lower=0> W[Nobs];                       // Fitness variable
    vector[Nobs] z;                             // Phenotype variable
    
    int<lower=1> Nind;                          // Total number of individuals
    int<lower=1,upper=Nind> J_ind[Nobs];        // Grouping indices for ID
}
parameters {
    real<lower=0> omega;            // Peak Width
    real theta;                     // Values of the optimum parameter
    real<upper=0> log_Wmax;             // Values of the max fitness parameter
    
    real<lower=0> ind_sigma;        // Sigma of individuals effects
    vector[Nind] ind_extend;        // Individual effects
}
transformed parameters {
    vector<upper=0>[Nind] log_w_ind;         // Effect of individual on log_Wmax    
    // Wmax with individual effects
    log_w_ind = log_Wmax + ind_extend * ind_sigma;
} 
model {
    vector[Nobs] log_prob = -square(z - theta) / (2 * omega * omega);
    
    // Likelihood
    for (n in 1:Nobs) {
        if (W[n] == 1) {
            target += log_prob + log_w_ind[J_ind[n]];
        } else {
            target += log1m_exp(log_prob + log_w_ind[J_ind[n]]);
        }
    }
    
    // Priors
    omega ~ gamma(3.36,0.78);
    theta ~ normal(0,1000);
    log_Wmax  ~ normal(0,1);
    ind_extend ~ normal(0,1);
    ind_sigma ~ normal(0,1);
}
generated quantities { 
}

Any idea what is going on? Thanks!

Strike that, there was a very stupid typo in this code, which took embarrassingly long for me to catch: log_prob should be log_prob[n] everywhere within the loop…

Once corrected, though, I still get many divergent transitions.
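
For reference, the corrected likelihood loop reads:

for (n in 1:Nobs) {
    if (W[n] == 1) {
        target += log_prob[n] + log_w_ind[J_ind[n]];
    } else {
        target += log1m_exp(log_prob[n] + log_w_ind[J_ind[n]]);
    }
}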

Hmm, you could try making plots of the output parameters and see if there’s any correlation between the divergences and parameter values.

(like the plots here: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html – divergences in green)

Is log_w_ind actually guaranteed to be less than zero? log_Wmax is, but there are no constraints on ind_extend, so that could be positive. Perhaps the centered parameterization is worth trying?
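
The centered version would look something like this (a sketch showing only the changed pieces; ind_eff is just a placeholder name):

parameters {
    real<lower=0> ind_sigma;
    vector[Nind] ind_eff;                        // centered individual effects
}
transformed parameters {
    vector[Nind] log_w_ind = log_Wmax + ind_eff;
}
model {
    ind_eff ~ normal(0, ind_sigma);              // scale enters the prior directly
}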

The plots do not show any particular part of the parameter space to be causing the divergent transitions.

Is log_w_ind actually guaranteed to be less than zero? log_Wmax is, but there are no constraints on ind_extend, so that could be positive.

No, log_w_ind is not actually guaranteed to be less than zero; it is only declared with that constraint, so Stan checks it rather than enforces it, and rejects any draw that violates it. That’s actually the source of my problem: creating random effects on a constrained parameter is not straightforward, because the random effects are not individually constrained, only the resulting parameter is. Such a strategy is possible in e.g. JAGS, but creates these divergent transitions in Stan (granted, JAGS doesn’t have that kind of check, so…).

Perhaps the centered parameterization is worth trying?

Maybe, though I’m not very optimistic this will solve anything. I’ll give it a try tomorrow though.

Yeah I doubt the centered thing will help now that I think more about it.

Back to the original question: I don’t understand it fully.

You have binary data and it looks like you expect the untransformed probabilities to come from a function that takes the shape of a Gaussian along the single covariate.

Can you make a plot of the data?

Where does the sum-to-one constraint come from?

Plot_Gauss_binom.pdf (23.0 KB)

The attached figure should help explain what I’m trying to do. We have theoretical reasons to believe a Gaussian model for the latent probabilities (the blue curve) of the binomial response (the black dots) would be sensible (although it has many quirks, statistically speaking), and I wanted to test whether we could implement such a model.
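
Concretely, the latent model for each observation is

Pr(W = 1 | z) = W_{max} * exp(-(z - theta)^2 / (2 * omega^2)),

so the curve peaks at height W_{max} when z = theta.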

The problem is that, then, the W_{max} parameter should be between 0 and 1, or equivalently its logarithm should be negative. Running the model without any random effects works, but when I try to add random effects to W_{max}, I run into problems: the random effects are not bounded individually (they should be able to increase as well as decrease the intercept), but W_{max} is bounded between 0 and 1, and its log is bounded above by 0.

Where does the sum-to-one constraint come from?

Sorry, I have not been very clear. It’s not that they “sum to 1”. Although in my theoretical model W_{max} is between 0 and 1, I’m using its log here, so I should have said that the sum must be negative. The “sum” I’m referring to is the sum of the grand intercept and one of the random effects, i.e. this part of the code:

log_w_ind = log_Wmax + ind_extend * ind_sigma;

Here log_w_ind must be negative, although we must allow ind_extend * ind_sigma to be positive (as random effects should not be biased regarding the direction of the effect).
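
In symbols: we need log(W_{max}) + ind_sigma * ind_extend[i] <= 0 for every individual i, while the ind_extend[i] ~ Normal(0, 1) are unbounded in both directions, so no choice of ind_sigma can guarantee the constraint by construction.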

Does this make more sense?

Thank you very much for your help and patience, it is really appreciated. I must say again that I realise I might be asking too much of Stan here; my goal (for now) is rather to check whether this is possible at all.


You can compute W_{max} on the logit scale and keep the exact Gaussian shape for the latent probabilities. Here’s how it looks for the model in your first post.

parameters {
    real<lower=0> omega;
    real theta;
    real logit_Wmax;
    real<lower=0> ind_sigma;
    vector[Nind] ind_extend;
}
transformed parameters {
    vector<lower=0,upper=1>[Nind] w_ind;    // Wmax with individual effects, on the probability scale
    vector<lower=0,upper=1>[Nobs] prob;
    w_ind = inv_logit(logit_Wmax + ind_extend * ind_sigma);
    for (n in 1:Nobs) {
        prob[n] = w_ind[J_ind[n]] * exp(-square((z[n] - theta) / omega) / 2);    // same Gaussian kernel as in the original model
    }
} 
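
Since inv_logit maps the whole real line into (0, 1), logit_Wmax and the random effects are completely unconstrained, yet w_ind is automatically a valid probability and the Gaussian shape in z is untouched. The model block can then stay essentially as in your first post, with the prior now on logit_Wmax; a sketch (the normal(0, 1.5) below is only a placeholder, pick whatever is sensible on the logit scale):

model {
    // Likelihood
    W ~ bernoulli(prob);

    // Priors
    omega ~ gamma(3.36, 0.78);
    theta ~ normal(0, 1000);
    logit_Wmax ~ normal(0, 1.5);    // placeholder prior on the logit scale
    ind_extend ~ normal(0, 1);
    ind_sigma ~ normal(0, 1);
}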

This… is brilliant! It’s so elegant and simple that I feel stupid for not having thought of it! I’ll implement this right away and see whether it solves my divergent transitions issue.


Happy to report this totally solves my issue: no more divergent transitions and it’s running very smoothly!

Thank you all for your help!
