Random effects on a constrained parameter and divergent transitions

For various reasons, I am trying to fit a model with a Gaussian-shaped optimum to binary data. This can be done quite easily and nicely when no random effects are involved, but I am now trying to include random effects on the parameter W_{max}, which constrains the latent model between 0 and 1. This is, of course, difficult, as the random effects are not individually constrained; only their sum with the intercept must lie between 0 and 1. Worse, the aim would be to include more than one such effect.

But let’s focus on a single random effect for now. I figured it would be easier on a log scale, as this allows for only one boundary (namely 0) rather than two (0 and 1). The following model runs, but results in a lot of divergent transitions (roughly 25% with adapt_delta at 0.80; needless to say, increasing this value doesn’t remove such transitions entirely):

functions { 
} 
data {
    int<lower=1> Nobs;                          // Total number of observations
    int<lower=0> W[Nobs];                    // Response variable
    vector[Nobs] z;                                 // Covariate
    
    int<lower=1> Nind;                          // Total number of individuals
    int<lower=1,upper=Nind> J_ind[Nobs];        // Grouping indices for ID
}
parameters {
    real<lower=0> omega;            // Peak Width
    real theta;                                 // Values of the optimum parameter
    real<upper=0> log_Wmax;             // Values of the max fitness parameter
    
    real<lower=0> ind_sigma;        // Sigma of individuals effects
    vector[Nind] ind_extend;        // Individual effects
}
transformed parameters {
    vector<upper=0>[Nind] log_w_ind;         // Effect of individual on log_Wmax
    vector<lower=0,upper=1>[Nobs] prob;
    
    // Wmax with individual effects
    log_w_ind = log_Wmax + ind_extend * ind_sigma;
    
    for (n in 1:Nobs) {
        prob[n] = exp(log_w_ind[J_ind[n]] - square(z[n] - theta) / (2 * omega * omega));
    }
    
} 
model {
    // Likelihood
    W ~ bernoulli(prob);
    
    // Priors
    omega ~ gamma(3.36,0.78);
    theta ~ normal(0,1000);
    log_Wmax  ~ normal(0,1);
    ind_extend ~ normal(0,1);
    ind_sigma ~ normal(0,1);
}
generated quantities { 
}

Any guidance would be appreciated here. Am I being unreasonable in wanting to fit this model? Is there any trick I could use to make the boundary issue less painful?

FYI, an (unconstrained) logit-link model on the same data runs nicely.

Thanks!

First thing to try is the Bernoulli logit parameterization: https://mc-stan.org/docs/2_20/functions-reference/bernoulli-logit-distribution.html
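
That is, instead of building prob yourself and calling bernoulli(prob), you pass the log-odds straight in and Stan handles the inverse logit internally in a numerically stable way. Something like this, with eta standing in for whatever log-odds predictor you build:

W ~ bernoulli_logit(eta);    // equivalent to bernoulli(inv_logit(eta)), but stable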

Doing the exps in the code leads to overflows a lot of the time. Also, is this guaranteed to be less than one?

prob[n] = exp(log_w_ind[J_ind[n]] -((z[n] - theta) .* (z[n] - theta)) ./ (2 * omega * omega));

If not that could cause problems.

The thing is that I’m trying to avoid the logit for this particular model. For various reasons, I’m trying to fit the Gaussian curve directly (which is not possible with a logit link, only approximately).

prob will indeed be less than one, as log_w_ind is less than zero and we subtract something positive (a squared value divided by a positive quantity) from it, so the argument of the exponential should be less than 0, given the constraints provided.
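
To spell the argument out:

exp(log_w_ind - (z[n] - theta)^2 / (2 * omega^2)) <= exp(log_w_ind) <= exp(0) = 1,

since the squared term divided by 2 * omega^2 is non-negative and log_w_ind is constrained to be at most 0.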

Oh okay. Probably worth trying to increment the log density manually to avoid the explicit exp then. Not sure if that’s the problem here, but this sort of thing can definitely lead to divergences.

Thanks, I’ll try that.

OK, I tried to rewrite the code to directly increment the log density, but it actually made the divergent transitions worse and created an initialisation issue: some chains fail to initialise, with this error message:

Chain 2: Initialization between (-2, 2) failed after 1 attempts.                                                                                          
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."

It’s weird that Stan gives up after only one attempt, especially since I manually supply initial values for all parameters in the parameters and transformed parameters sections. So I thought this could be due to some weirdly defined constraint in the code, but I cannot for the life of me find anything like that in my code.

Here’s the code:

functions { 
} 
data {
    int<lower=1> Nobs;                          // Total number of observations
    int<lower=0> W[Nobs];                       // Fitness variable
    vector[Nobs] z;                             // Phenotype variable
    
    int<lower=1> Nind;                          // Total number of individuals
    int<lower=1,upper=Nind> J_ind[Nobs];        // Grouping indices for ID
}
parameters {
    real<lower=0> omega;            // Peak Width
    real theta;                     // Values of the optimum parameter
    real<upper=0> log_Wmax;             // Values of the max fitness parameter
    
    real<lower=0> ind_sigma;        // Sigma of individuals effects
    vector[Nind] ind_extend;        // Individual effects
}
transformed parameters {
    vector<upper=0>[Nind] log_w_ind;         // Effect of individual on log_Wmax    
    // Wmax with individual effects
    log_w_ind = log_Wmax + ind_extend * ind_sigma;
} 
model {
    vector[Nobs] log_prob = -square(z - theta) / (2 * omega * omega);
    
    // Likelihood
    for (n in 1:Nobs) {
        if (W[n] == 1) {
            target += log_prob + log_w_ind[J_ind[n]];
        } else {
            target += log1m_exp(log_prob + log_w_ind[J_ind[n]]);
        }
    }
    
    // Priors
    omega ~ gamma(3.36,0.78);
    theta ~ normal(0,1000);
    log_Wmax  ~ normal(0,1);
    ind_extend ~ normal(0,1);
    ind_sigma ~ normal(0,1);
}
generated quantities { 
}

Any idea what is going on? Thanks!

Strike that, there was a very stupid typo in this code, which took embarrassingly long for me to catch: log_prob should be log_prob[n] everywhere within the loop…

Once corrected, though, I still get many divergent transitions.
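
For reference, the corrected likelihood loop reads:

for (n in 1:Nobs) {
    if (W[n] == 1) {
        target += log_prob[n] + log_w_ind[J_ind[n]];
    } else {
        target += log1m_exp(log_prob[n] + log_w_ind[J_ind[n]]);
    }
}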

Hmm, you could try making plots of the output parameters and see if there’s any correlation between the divergences and parameter values.

(like the plots here: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html – divergences in green)

Is log_w_ind actually guaranteed to be less than zero? log_Wmax is, but there are no constraints on ind_extend, so that could be positive. Perhaps the centered parameterization is worth trying?
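
The centered version would look something like this (a sketch showing only the changed pieces; ind_eff is just a placeholder name):

parameters {
    real<lower=0> ind_sigma;
    vector[Nind] ind_eff;                        // centered individual effects
}
transformed parameters {
    vector[Nind] log_w_ind = log_Wmax + ind_eff;
}
model {
    ind_eff ~ normal(0, ind_sigma);              // scale enters the prior directly
}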

The plots do not show any particular part of the parameter space to be causing the divergent transitions.

Is log_w_ind actually guaranteed to be less than zero? log_Wmax is, but there are no constraints on ind_extend, so that could be positive.

No, log_w_ind is not actually guaranteed to be less than zero; it is only declared with that constraint, so Stan checks it rather than enforces it, and rejects any draw that violates it. That’s actually the source of my problem: creating random effects on a constrained parameter is not straightforward, because the random effects are not individually constrained, only the resulting parameter is. Such a strategy is possible in e.g. JAGS, but creates these divergent transitions in Stan (granted, JAGS doesn’t have that kind of check, so…).

Perhaps the centered parameterization is worth trying?

Maybe, though I’m not very optimistic this will solve anything. I’ll give it a try tomorrow though.

Yeah I doubt the centered thing will help now that I think more about it.

Back to the original question: I don’t understand it fully.

You have binary data and it looks like you expect the untransformed probabilities to come from a function that takes the shape of a Gaussian along the single covariate.

Can you make a plot of the data?

Where does the sum-to-one constraint come from?

Plot_Gauss_binom.pdf (23.0 KB)

The attached figure should help explain what I’m trying to do. We have theoretical reasons to believe a Gaussian model for the latent probabilities (the blue curve) of the binomial response (the black dots) would be sensible (although it has many quirks, statistically speaking), and I wanted to test whether we could implement such a model.
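
Concretely, the latent model for each observation is

Pr(W = 1 | z) = W_{max} * exp(-(z - theta)^2 / (2 * omega^2)),

so the curve peaks at height W_{max} when z = theta.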

The problem is that, then, the W_{max} parameter should be between 0 and 1, or equivalently its logarithm should be negative. Running the model without any random effects works, but when I try to add random effects to W_{max}, I run into problems: the random effects are not bounded individually (they should be able to increase as well as decrease the intercept), but W_{max} is bounded between 0 and 1, and its log is bounded above by 0.

Where does the sum-to-one constraint come from?

Sorry, I have not been very clear. It’s not that they “sum to 1”. Although in my theoretical model W_{max} is between 0 and 1, I’m using its log here, so I should have said that the sum must be negative. The “sum” I’m referring to is the sum of the grand intercept and one of the random effects, i.e. this part of the code:

log_w_ind = log_Wmax + ind_extend * ind_sigma;

Here log_w_ind must be negative, although we must allow ind_extend * ind_sigma to be positive (as random effects should not be biased regarding the direction of the effect).
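
In symbols: we need log(W_{max}) + ind_sigma * ind_extend[i] <= 0 for every individual i, while the ind_extend[i] ~ Normal(0, 1) are unbounded in both directions, so no choice of ind_sigma can guarantee the constraint by construction.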

Does this make more sense?

Thank you very much for your help and patience, it is really appreciated. I must say again that I realise I might be asking too much of Stan here; my goal (for now) is rather to check whether this is possible at all.


You can compute W_{max} on the logit scale and keep the exact Gaussian shape for the latent probabilities. Here’s how it looks for the model in your first post.

parameters {
    real<lower=0> omega;
    real theta;
    real logit_Wmax;
    real<lower=0> ind_sigma;
    vector[Nind] ind_extend;
}
transformed parameters {
    vector<lower=0,upper=1>[Nind] w_ind;    // Wmax with individual effects, on the probability scale
    vector<lower=0,upper=1>[Nobs] prob;
    w_ind = inv_logit(logit_Wmax + ind_extend * ind_sigma);
    for (n in 1:Nobs) {
        prob[n] = w_ind[J_ind[n]] * exp(-square((z[n] - theta) / omega) / 2);    // same Gaussian kernel as in the original model
    }
} 
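
Since inv_logit maps the whole real line into (0, 1), logit_Wmax and the random effects are completely unconstrained, yet w_ind is automatically a valid probability and the Gaussian shape in z is untouched. The model block can then stay essentially as in your first post, with the prior now on logit_Wmax; a sketch (the normal(0, 1.5) below is only a placeholder, pick whatever is sensible on the logit scale):

model {
    // Likelihood
    W ~ bernoulli(prob);

    // Priors
    omega ~ gamma(3.36, 0.78);
    theta ~ normal(0, 1000);
    logit_Wmax ~ normal(0, 1.5);    // placeholder prior on the logit scale
    ind_extend ~ normal(0, 1);
    ind_sigma ~ normal(0, 1);
}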

This… is brilliant! It’s so elegant and simple that I feel stupid for not having thought of it! I’ll implement this right away and see whether it solves my divergent transitions issue.


Happy to report this totally solves my issue: no more divergent transitions and it’s running very smoothly!

Thank you all for your help!
