ODE non-converging; Rhat beyond 1

irenegfsr · September 19, 2022, 6:44am

Hi,

The issue maybe is that, my times are not really 3 observations per subject, as defined in your data. My data is unbalanced. I have also changed the inits function, so that it would not turn out to inf or nan, but I still get the same error message. How does your response from the generated data look like?

irenegfsr · September 19, 2022, 6:59am

When I run that code, with the following sample function;
slanat ← cmdstan_model(“C:/Users/IGarciaFogeda/Documents/WP4/DRC/code/Mechanistic models/andraudexample.stan”)
fit_slanat ← slanat$sample(
seed = 123,
chains = 1,
iter_warmup = 1, iter_sampling = 1, step_size = 0.05,
refresh = 500 # print update every 500 iters
)

I get the following error message:

Am I missing any element from the sample function?

nhuurre · September 19, 2022, 8:07am

That’s not a fatal error, it’s a warning. If the warnings stop after the first few hundred iterations then it’s not a problem.
You could set init= and an even smaller step_size= and see if that prevents the warning. And don’t put a lower bound on logAB0.

I mentioned earlier that I added this right before the lognormal

if (is_nan(new_mu[i,3])) {
  print("t = ", ts[i], "; inits = ", inits, "; theta = ", theta);
  print("y = ", y[i], "; mu = ", mu[i,3], "; new_mu = ", new_mu[i,3]);
}

(That’s not going to fix anything but it’ll show you what the model was doing right before failing.)

irenegfsr · September 21, 2022, 8:05am

Hi,

I have been running different models since I realised that my data was not sorted per subject.
I have tried the following things:
Sorting times per subject (0, 22, 70, 0, 50, 0, 30…) or sorting the times from the minimum value (several 0) to the maximum. With both of the options I get the following warning, which triggers the model to not converge, thus the Rhat goes beyond 1.

Lognormal_lpdf: location parameter is inf, but must be finite! line 109

The same occurs if I run the analytical model that we have mentioned previously in this post.

functions {
  real[] sldecay(real t, real[] y, real[] theta,  real[] x_r, int[] x_i) {
    real dydt[3];
    dydt[1] = -theta[1]*y[1];
    dydt[2] = -theta[2]*y[2];
    dydt[3] = theta[1]*y[1] + theta[2]*y[2] - theta[3]*y[3];
    
    return dydt;
  }
}
data {
  int<lower=1> N;
  int<lower=1> nsubjects;
  int<lower=1> subj_ids[N];
  real <lower=0> y[N];
  int <lower=1> ntimes;
  real ts[ntimes];
  real t0;
}
transformed data {
    real x_r[0];
    int x_i[0]; 
}
parameters {
  real  logps;
  real  logpl;
  real  logmuab;
  real  sigmalogmuab;
  
  real <lower= 0> logAB0;
  real  sigmalogAB0;
  real  logps0;
  real  logpl0;
  real   sigma;
  
  vector[2] r[nsubjects];
  //real rho_rhobit;
}
model{
  real mu[1, 3];
  real inits[3];
  real theta[3];
  real new_mu[ntimes, 3];
  vector[nsubjects] rAB0;
  vector[nsubjects] rmuab;
  real eval_time[1];
  // transformed parameters 
  real ps = exp(logps);
  real pl = exp(logpl);
  real muab = exp(logmuab);
  real sigma_muab = exp(sigmalogmuab);
  real AB0 = exp(logAB0);
  real sigma_AB0 = exp(sigmalogAB0);
  real ps0 = exp(logps0);
  real pl0 = exp(logpl0);
  real sigmap = exp(sigma);
  // defining the correlation 
  vector[2] mean_vec;
  matrix[2,2] Sigma;
  //real rho = (exp(rho_rhobit) - 1)/(exp(rho_rhobit) + 1);
 
  //prior distributions
  logps ~ normal(0, 1);
  logpl ~ normal(0, 1);
  logmuab ~ normal(0, 10);
  sigmalogmuab ~ normal(0, 1);
  logAB0 ~ normal(3, .5);
  sigmalogAB0 ~ normal(0, 1);
  logps0 ~ normal(0, 1);
  logpl0 ~ normal(0, 1);
  sigma ~ normal(0, 1);
  //rho_rhobit ~ uniform(-10, 10);
  
  //cov matrix
  mean_vec[1] = -pow(sigma_AB0, 2.0)/2;
  mean_vec[2] = -pow(sigma_muab, 2.0)/2;
  
  Sigma[1,1] = pow(sigma_AB0, 2.0);
  Sigma[2,2] = pow(sigma_muab, 2.0);
  Sigma[1,2] = 0;
  Sigma[2,1] = 0;
  
  
  for (j in 1:nsubjects) {
    r[j] ~ multi_normal_cholesky(mean_vec, Sigma);
    rAB0[j] = r[j,1];
    rmuab[j] = r[j,2];
  }
  for (i in 1:N){
    //defining the initial conditions
    inits[3] = AB0*exp(rAB0[subj_ids[i]]);
    inits[1] = inits[3]/ps0;
    inits[2] = inits[3]/pl0;
    //defining the parameters
    theta[1] = ps;
    theta[2] = pl;
    theta[3] = muab*exp(rmuab[subj_ids[i]]);
    // ode solver
    eval_time[1] = ts[i];
    if (eval_time[1] == 0){
      new_mu[i,1] = inits[1];
      new_mu[i,2] = inits[2];
      new_mu[i,3] = log(inits[3]) - pow(sigmap, 2.0)/2;}
    else{
      mu = integrate_ode_rk45(sldecay, inits, t0, eval_time, theta, x_r, x_i);
      new_mu[i,1] = mu[1,1];
      new_mu[i,2] = mu[1,2];
      new_mu[i,3] = log(mu[1,3]) - pow(sigma, 2.0)/2;}
      y[i] ~ lognormal(new_mu[i, 3], sigmap);
    }
}

nhuurre · September 21, 2022, 1:26pm

irenegfsr:

  Sigma[1,1] = pow(sigma_AB0, 2.0);
  Sigma[2,2] = pow(sigma_muab, 2.0);
  Sigma[1,2] = 0;
  Sigma[2,1] = 0;
  
  
  for (j in 1:nsubjects) {
    r[j] ~ multi_normal_cholesky(mean_vec, Sigma);

Cholesky factor is like a square root. Don’t square the sigmas if you use multi_normal_cholesky

  Sigma[1,1] = sigma_AB0;
  Sigma[2,2] = sigma_muab;

Was that supposed to be sigmap like in the other branch?

Constraining the priors for sigma parameters reduces divergencies substantially

sigmalogmuab ~ normal(0, 0.1);
sigmalogAB0 ~ normal(0, 0.1);
sigma ~ normal(0, 0.1);

Instead of log-transforming these you could declare them directly as constrained types

parameters {
  real<lower=0> sigma_muab;
  real<lower=0> sigma_AB0;
}
model {
  sigma_muab ~ exponential(1.0);
  sigma_AB0 ~ exponential(1.0);
}

I don’t think logAB0 needs a <lower=0>. The implicit constraining transform is may cause problems there.

irenegfsr · September 21, 2022, 3:23pm

Hi,
You mean these parameters to be transformed with the exp(1)? But keep the specification of normal(0, 0.01) as a prior?
Sigmap before, it means the exp(logsigma), to restrict this to positive.

nhuurre · September 21, 2022, 3:37pm

I mean, you don’t need to do

parameters {
  real logsigma;
} transformed parameters {
  real sigma = exp(logsigma);

just to restrict to positive values because

parameters {
  real<lower=0> sigma;
}

works just as well.

I think you need to consider more carefully what kind of priors you want these parameters to have. exponential(1.0) was just one example that improves r-hats but isn’t super narrow.

irenegfsr · October 4, 2022, 7:50am

Hi,

I was dealing with issues of convergence as well by trying out this.
I have posted the codes in another post, what could be causing then the non-convergence?

Topic		Replies	Views
Rhat greater than 1 for some parameters Modeling	7	1693	March 31, 2023
Fairly simple hierarchical model with odd Rhat and neff values Modeling fitting-issues	17	1269	June 7, 2019
Rhat of one estimate always close to 1 Modeling fitting-issues	2	189	February 29, 2024
Huge RHATs, Extreme Model Performance Variability Depending on Priors Modeling	10	879	September 23, 2020
Cherry-picking Rhat values Modeling	16	1276	October 6, 2017

ODE non-converging; Rhat beyond 1

Related topics