A matrix of parameters in Stan Integral

The closed form integral is doing (as far as I know) exactly the same thing as integrate_1d, so that shouldn’t influence the results.
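
As a quick sketch of why (writing the two random effects for a unit as w_0 and w_1, and the lower limit of integration as a): the integrand is e^{\beta(\eta + w_0 + w_1 \log x)} = e^{\beta(\eta + w_0)}\, x^{\beta w_1}, so for \beta w_1 \neq -1

\int_a^T e^{\beta(\eta + w_0 + w_1 \log x)} \, dx = \frac{e^{\beta(\eta + w_0)}}{\beta w_1 + 1} \left(T^{\beta w_1 + 1} - a^{\beta w_1 + 1}\right),

which is exactly what integrate_1d approximates numerically.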

Can you post the two models here where the only difference is the replacement of integrate_1d with the closed form equation?

2 Likes

Using integrate_1d:

functions{
  real cumulative_exposure(real x,       // Function argument
                           real xc,      // Complement of function argument
                                         // on the domain (defined later)
                           real[] theta, // parameters
                           real[] x_r,   // data (real)
                           int[] x_i) { // data (integer)
                      
    real Beta = theta[1];
    real eta = theta[2];
    real w2_1 = theta[3];
    real w2_2 = theta[4];
    
    return exp(Beta*(eta + w2_1 + w2_2*log(x)));
    }
}
data {
  int<lower=1> n; // Sample size
  int<lower=1> n_cens; // Right censored sample size
  int<lower=1> n_obs; // Failed unit sample size
  int<lower=1> N; // Total number of times considered (all times for all units)
  real<lower=0> tt[N]; //All times for all units
  real y_tt[N]; //All of the use-rates
  int w_ind[N]; //Indices for w
  real<lower=0> tt_fail[n_obs]; //Failure times for units that failed
  int w_ind_fail[n_obs]; //Indices for w for units that failed
  int w_ind_cens[n_cens];
  real<lower=0> tt_cens[n_cens];
}
transformed data {
  real x_r[0];
}
parameters {
  real <lower = 0> mu0;
  real <lower = 0> sigma0;
  real <lower = 0> sig;
  real <lower = 0> sig1;
  real <lower = 0> sig2;
  real <lower = -1, upper = 1> rho;
  real eta;
  real Beta;
  matrix[2, n] w;
}
model {
  //cumulative exposure for failed units and for censored units
  vector[n_obs] u_obs;
  vector[n_cens] u_cens;
  //Defining covariance matrix, cholesky decomposition and mean and variance of mixed effects model
  matrix[2,2] Sigma;
  real Mu[N];
  vector[n_obs] Mu_DFD;
  matrix[2,2] L;
  matrix[2, n] w2;
  //Covariance matrix
  Sigma[1,1] = sig1^2;
  Sigma[1,2] = rho*sig1*sig2;
  Sigma[2,1] = rho*sig1*sig2;
  Sigma[2,2] = sig2^2;
  //Cholesky decomposition
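  // With to_vector(w) ~ normal(0, 1) below, the columns of w2 = L * w have covariance Sigma
  // (a non-centered parameterization of the unit-level random effects)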
  L = cholesky_decompose(Sigma);
  w2 = L*w;
  //Parameters used in mixed effects model
  for(i in 1:N){
    Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
  }
  //Calculating mu_DFD 
  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = integrate_1d(cumulative_exposure, 0.01, tt_fail[i],
    {Beta, eta, w2[1,w_ind_fail[i]], w2[2,w_ind_fail[i]]},
    x_r, {w_ind_fail[i]}, 1e-8);
  }
  for(i in 1:n_cens){
    u_cens[i] = integrate_1d(cumulative_exposure, 0.01, tt_cens[i],
    {Beta, eta, w2[1,w_ind_cens[i]], w2[2,w_ind_cens[i]]},
    x_r, {w_ind_cens[i]}, 1e-8);
  }
  //Likelihood
    target += weibull_lpdf(u_obs| 1/sigma0, exp(mu0));
    target += Beta*Mu_DFD;
    target += weibull_lccdf(u_cens|1/sigma0, exp(mu0));
    target += normal_lpdf(y_tt|Mu, sig);
    // Prior:
    Beta ~ normal(0, 10);
    mu0 ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    eta ~ normal(0, 10);
    to_vector(w) ~ normal(0, 1);
    sig ~ normal(0, 10);
    sig1 ~ normal(0, 10);
    sig2 ~ normal(0, 10);
    rho ~ normal(0, 10);
}

Using closed form:

data {
  int<lower=1> n; // Sample size
  int<lower=1> n_cens; // Right censored sample size
  int<lower=1> n_obs; // Failed unit sample size
  int<lower=1> N; // Total number of times considered (all times for all units)
  real<lower=0> tt[N]; //All times for all units
  real y_tt[N]; //All of the use-rates
  int w_ind[N]; //Indices for w
  real<lower=0> tt_fail[n_obs]; //Failure times for units that failed
  int w_ind_fail[n_obs]; //Indices for w for units that failed
  int w_ind_cens[n_cens];
  real<lower=0> tt_cens[n_cens];
}
parameters {
  real <lower = 0> mu0;
  real <lower = 0> sigma0;
  real <lower = 0> sig;
  real <lower = 0> sig1;
  real <lower = 0> sig2;
  real <lower = -1, upper = 1> rho;
  real eta;
  real Beta;
  matrix[2, n] w;
}
model {
  //cumulative exposure for failed units and for censored units
  vector[n_obs] u_obs;
  vector[n_cens] u_cens;
  //Defining covariance matrix, cholesky decomposition and mean and variance of mixed effects model
  matrix[2,2] Sigma;
  real Mu[N];
  vector[n_obs] Mu_DFD;
  matrix[2,2] L;
  matrix[2, n] w2;
  //Covariance matrix
  Sigma[1,1] = sig1^2;
  Sigma[1,2] = rho*sig1*sig2;
  Sigma[2,1] = rho*sig1*sig2;
  Sigma[2,2] = sig2^2;
  //Cholesky decomposition
  L = cholesky_decompose(Sigma);
  w2 = L*w;
  //Parameters used in mixed effects model
  for(i in 1:N){
    Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
  }
  //Calculating mu_DFD 
  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = exp(Beta*(eta + w2[1,w_ind_fail[i]]))/(Beta*w2[2,w_ind_fail[i]] + 1)*(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1);
  }
  for(i in 1:n_cens){
    u_cens[i] = exp(Beta*(eta + w2[1,w_ind_cens[i]]))/(Beta*w2[2,w_ind_cens[i]] + 1)*(tt_cens[i]^(Beta*w2[2,w_ind_cens[i]] + 1) - 1);
  }
  //Likelihood
    target += weibull_lpdf(u_obs| 1/sigma0, exp(mu0));
    target += Beta*Mu_DFD;
    target += weibull_lccdf(u_cens|1/sigma0, exp(mu0));
    target += normal_lpdf(y_tt|Mu, sig);
    // Prior:
    Beta ~ normal(0, 10);
    mu0 ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    eta ~ normal(0, 10);
    to_vector(w) ~ normal(0, 1);
    sig ~ normal(0, 10);
    sig1 ~ normal(0, 10);
    sig2 ~ normal(0, 10);
    rho ~ normal(0, 10);
}

Are you sure there should be the extra - 1 at the end of this:

exp(Beta*(eta + w2[1,w_ind_fail[i]]))/(Beta*w2[2,w_ind_fail[i]] + 1)*(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1);

Also, if you’re only doing the integral from 0.01, that makes the closed form integral this:

e^{\beta [\eta + w_0]} \frac{1}{\beta w_1 + 1} t^{\beta w_1 + 1} \bigg\rvert_{t = 0.01}^{t = T} = e^{\beta [\eta + w_0]} \frac{1}{\beta w_1 + 1} \left(T^{\beta w_1 + 1} - 0.01^{\beta w_1 + 1}\right)

So there should be two terms there.
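
As a sketch of how that would look in the Stan code (reusing the variable names from the u_obs loop above; the u_cens loop would change in the same way):

for(i in 1:n_obs){
  // definite integral of exp(Beta*(eta + w2_1 + w2_2*log(x))) from 0.01 to tt_fail[i]:
  // both endpoint terms appear
  u_obs[i] = exp(Beta*(eta + w2[1,w_ind_fail[i]]))
             / (Beta*w2[2,w_ind_fail[i]] + 1)
             * (tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1)
                - 0.01^(Beta*w2[2,w_ind_fail[i]] + 1));
}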

1 Like

It may be worth adding the closed-form equation to the model with the integrate_1d call (the one that runs) and printing out the closed-form value vs. the integrate_1d output. They should be the same.
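
For example, a sketch of the kind of check I mean, dropped into the model block of the integrate_1d version (it only uses variables that are already defined there):

for(i in 1:n_obs){
  real u_quad = integrate_1d(cumulative_exposure, 0.01, tt_fail[i],
                             {Beta, eta, w2[1,w_ind_fail[i]], w2[2,w_ind_fail[i]]},
                             x_r, {w_ind_fail[i]}, 1e-8);
  real u_closed = exp(Beta*(eta + w2[1,w_ind_fail[i]]))
                  / (Beta*w2[2,w_ind_fail[i]] + 1)
                  * (tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1)
                     - 0.01^(Beta*w2[2,w_ind_fail[i]] + 1));
  // print both values so they can be compared in the console output
  print("i = ", i, "  integrate_1d: ", u_quad, "  closed form: ", u_closed);
}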

1 Like

Also to answer your previous post (A matrix of parameters in Stan Integral), Stan is evaluating the integral just like any other function.

Like if you had any function f(\beta, w_1), think of it just like that.

Don’t think of it doing any extra integral over \beta space or w_1 space.
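
Concretely (a sketch using the same w_0, w_1 notation as above), the quantity being computed for each unit is just

u(T \mid \beta, \eta, w_0, w_1) = \int_{0.01}^{T} e^{\beta(\eta + w_0 + w_1 \log x)} \, dx,

a deterministic function of the current parameter values, evaluated like any other expression in the model block each time the log density is computed.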

1 Like

You’re right. It should be +1.

Nono, I think it should just be:

(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1))

instead of:

(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1)

Ah. I thought I had \beta w_1 -1 as the subscript of the integral for a second.

I will change the lower limit and think about the problem a little bit more.

I will also compare the values from the closed-form equation to those from the model with the integrate_1d call.

I think I should also include the xc term for when x gets small. I will attempt this.

Edit: Changing the lower limit allows the code to run. I will be able to see if I get any divergent transitions, or anything else, when it finishes.

The code finished with fewer warning messages:

2: The largest R-hat is 1.24, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I only ran the code for 500 iterations. These warnings do not seem as serious as the ones obtained earlier using integrate_1d. Perhaps running the chains for more iterations is the solution.

Maybe eventually, but the thing to do now is to use fake data generated from this model (or, if you’re already using fake data, a smaller amount of it) and figure out why your R-hat isn’t going to 1.0. That indicates that either multiple chains aren’t sampling the same distribution, or there are mixing problems within each chain.

Okay. Note that I ran the initial code in this post (the very first post) and it worked: there were no warning messages and I recovered the parameters I expected. However, the n_eff values were very low (I recall one of them being 69) from 1000 iterations.

Diagnostic plots showed that the samples (for some parameters) had very high autocorrelation up to lags of 50. I had to run it for much longer and use thinning.

I wonder if this is the case here.

1 Like

No reason to throw away samples here. Historically, thinning was something people used when they were running chains for so long that they were running out of memory.

Oh okay, then just run more iterations!

1 Like

I ran the code for 5000 iterations (with one chain) and received more warnings:

2: There were 1506 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
3: There were 994 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
4: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
5: Examine the pairs() plot to diagnose sampling problems
 
6: The largest R-hat is 2.12, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
7: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
8: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

The code took just over 8 hours to run. The number of divergent transitions is worryingly high. It is interesting, though, that I did not receive any divergent transitions when running the code for 500 iterations.

I would have expected these warning messages to have shown up when I ran the code for 500 iterations.

That’s pretty dramatic for sure, but there’s not much to do other than plot where the divergences are. Try to figure out what parts of the model are blowing up.

Also with a model that’s struggling like this, keep running multiple chains.

Your model has two pieces, right? The first half works by itself, right? Does the second half?

The model does have two parts.

The model works completely when I enter u_obs and u_cens as data.
It now breaks down when I try to calculate u_obs and u_cens inside the model block.

In reality u_obs and u_cens are unobservable and should not be entered as data. I only did so to test a simpler model.

Just to clarify: the model does have two parts. Both parts work when I enter u_obs and u_cens as data, but realistically I want to calculate them inside the Stan code. I can thus narrow the problem down to these lines:

  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = exp(Beta*(eta + w2[1,w_ind_fail[i]]))/(Beta*w2[2,w_ind_fail[i]] + 1)*(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 0.01^(Beta*w2[2,w_ind_fail[i]] + 1));
  }
  for(i in 1:n_cens){
    u_cens[i] = exp(Beta*(eta + w2[1,w_ind_cens[i]]))/(Beta*w2[2,w_ind_cens[i]] + 1)*(tt_cens[i]^(Beta*w2[2,w_ind_cens[i]] + 1) - 0.01^(Beta*w2[2,w_ind_cens[i]] + 1));
  }

Edit: In another post I had an issue with the second part of the model when I was entering u_obs and u_cens as data, but that model works completely now.

Does it work if you only have one of u_obs and u_cens to worry about?

Update: I simulated a small dataset and was able to recover the true parameter values.

I used very strict priors, normal(true value, 1). I am now going to try to change the priors and see how uninformative I can make them before I can no longer recover the true parameter values, and perhaps see whether any parameters are more troublesome than others.

2 Likes

Cool. Make sure to run long chains, since it seems that’s sometimes what it takes for the model’s problems to show up.

I am able to recover the true parameter values with the small dataset and vague priors, normal(0,10) for each parameter.

The true value of sig1 (0.46) is in the 95\% credible interval of the returned sample for sig1, but only just; please see the attached plot. I do not think this is a cause for concern, though.

sig1.pdf (619.5 KB)

I will now attempt on a larger dataset.

You can try different values and see what happens. sig1 was just something you set as a constant when you generated data? Not something you randomly sampled?