A matrix of parameters in Stan Integral

Then I guess we gotta think in terms of that integral. What’s it predicting? Why is it e^something? Why is it log(t) and not t? What are the units on w_0 and w_1, etc.

I tried changing the lower limit of the integral, and that allows the code to run fine. If this integral is a good approximation of the actual integral (with lower limit equal to zero), then this could be a solution.

If I am not mistaken, the approximation error will be proportional to b^{\beta w_1 + 1}, where b is the new lower limit.
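
To spell that out (using the t^{\beta w_1} form of the integrand that comes up below, and ignoring the constant factor in front of it), the piece being dropped is

\int^b_0 t^{\beta w_1}\,dt = \frac{b^{\beta w_1 + 1}}{\beta w_1 + 1}, \qquad \beta w_1 > -1,

which is why the error scales like b^{\beta w_1 + 1}, provided \beta w_1 > -1; otherwise that piece is infinite.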

The model is

X_i(t_{ij}) = \eta + Z_i(t_{ij})w_i + \epsilon_{ij},
where \eta is the mean, Z_i(t_{ij}) = [1, \log(t_{ij})], w_i = (w_{0i}, w_{1i})' \sim N(0,\Sigma_{w}), \epsilon_{ij} \sim N(0, \sigma^2), and

\Sigma_{w} = \begin{pmatrix} \sigma^2_1& \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma^2_2 \end{pmatrix}.

so the w’s are just random effects. The relationship between X_i and t_{ij} is exponential, and so \log(t) is used.

The other part of the model is

u = \int^{T_i}_0 \exp\{\beta (\eta + Z_i(t)w_i)\}\,dt. The larger the value of u, the more likely the unit is to fail. \beta is an acceleration coefficient which controls how quickly u grows.
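
For reference, writing Z_i(t)w_i = w_0 + w_1\log(t) and carrying out the integral gives the closed form

u = e^{\beta(\eta + w_0)}\int^{T_i}_0 t^{\beta w_1}\,dt = \frac{e^{\beta(\eta + w_0)}}{\beta w_1 + 1}\,T_i^{\beta w_1 + 1},

which is finite only when \beta w_1 > -1; otherwise the integral diverges at t = 0.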

No, it won’t be. The issue is that the integral goes to infinity if you include that little bit.

This is some sorta hazard model? I don’t think it’s avoidable. I think in this case we’ve got to figure out what \beta w_1 < -1 means. If the u blows up, does that mean failure is guaranteed? Is that what \beta w_1 < -1 implies?


Yes, it is.

Yes. u is assumed to follow a Weibull distribution. Once u is large enough, the probability of failure is essentially 1.
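
Concretely, with the Weibull shape 1/\sigma_0 and scale e^{\mu_0} that I use in the Stan code further down, the failure probability given a cumulative exposure u is

\Pr(\text{fail}) = 1 - \exp\left\{-\left(u/e^{\mu_0}\right)^{1/\sigma_0}\right\},

which goes to 1 as u \rightarrow \infty.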

I am not so sure. I will think about this. But, on first thought, I would have thought not.

In the problem I am considering, if w_1 = -2 or something, and \beta = 1.5 (the true value of \beta is 1.5), this would be a singularity, since \beta w_1 = -3 < -1. But a negative w_1 means that the unit is not being used as often over time, so this unit should be less likely to fail than a unit with a higher value of w_1.

How does Stan perform integration?

For the integral:

\int^T_0 t^{\beta w_1}dt

If I choose n iterations, will it evaluate the integral for each pair (\beta, w_1) and then return the mean of the n integrals?

Also, how does Stan actually evaluate the integrand?

The reason I ask is that I am curious whether Stan ever evaluates the integrand at t=0.

For the integral

\int^T_0 t^{\beta w_1}dt

I enter the lower and upper limits, but I have n samples of \beta and w_1. Does Stan sample t from \text{U}(0,T), and hence have n vectors (\beta, w_1, t), and then perform Monte Carlo integration?

So Stan (although it says 1D integration) is actually evaluating

\int_{W}\int_B\int^T_0 t^{\beta w_1}dtd\beta dw_1,

where B and W are the parameter spaces for \beta and w_1, respectively.

If so, then the probability that t=0 is drawn should be zero and the singularities should not be an issue. But, I am not sure what Stan is doing behind the scenes.

Update: Stan appears to be able to take transformed parameters (from the transformed parameters block and from the model block) as inputs to integrate_1d. This allowed me to use a non-centered parameterization.

Warning messages:
2: There were 184 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
3: There were 66 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 11.14, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I seem to have received every warning message that you can expect from Stan.

functions{
  real cumulative_exposure(real x,       // Function argument
                           real xc,      // Complement of function argument
                                         // on the domain (defined later)
                           real[] theta, // parameters
                           real[] x_r,   // data (real)
                           int[] x_i) { // data (integer)
                      
    real Beta = theta[1];
    real eta = theta[2];
    real w2_1 = theta[3];
    real w2_2 = theta[4];
    
    return exp(Beta*(eta + w2_1 + w2_2*log(x)));
    }
}
data {
  int<lower=1> n; // Sample size
  int<lower=1> n_cens; // Right censored sample size
  int<lower=1> n_obs; // Failed unit sample size
  int<lower=1> N; // Total number of times considered (all times for all units)
  real<lower=0> tt[N]; //All times for all units
  real y_tt[N]; //All of the use-rates
  int w_ind[N]; //Indices for w
  real<lower=0> tt_fail[n_obs]; //Failure times for units that failed
  int w_ind_fail[n_obs]; //Indices for w for units that failed
  int w_ind_cens[n_cens];
  real<lower=0> tt_cens[n_cens];
}
transformed data {
  real x_r[0];
}
parameters {
  real <lower = 0> mu0;
  real <lower = 0> sigma0;
  real <lower = 0> sig;
  real <lower = 0> sig1;
  real <lower = 0> sig2;
  real <lower = -1, upper = 1>rho;
  real eta;
  real Beta;
  matrix[2, n] w;
}
model {
  //cumulative exposure for failed units and for censored units
  vector[n_obs] u_obs;
  vector[n_cens] u_cens;
  //Defining covariance matrix, cholesky decomposition and mean and variance of mixed effects model
  matrix[2,2] Sigma;
  real Mu[N];
  vector[n_obs] Mu_DFD;
  matrix[2,2] L;
  matrix[2, n] w2;
  //Covariance matrix
  Sigma[1,1] = sig1^2;
  Sigma[1,2] = rho*sig1*sig2;
  Sigma[2,1] = rho*sig1*sig2;
  Sigma[2,2] = sig2^2;
  //Cholesky decomposition
  L = cholesky_decompose(Sigma);
  w2 = L*w;
  //Parameters used in mixed effects model
  for(i in 1:N){
    Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
  }
  //Calculating mu_DFD 
  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = integrate_1d(cumulative_exposure, 0.01, tt_fail[i],
    {Beta, eta, w2[1,w_ind_fail[i]], w2[2,w_ind_fail[i]]},
    x_r, {w_ind_fail[i]}, 1e-8);
  }
  for(i in 1:n_cens){
    u_cens[i] = integrate_1d(cumulative_exposure, 0.01, tt_cens[i],
    {Beta, eta, w2[1,w_ind_cens[i]], w2[2,w_ind_cens[i]]},
    x_r, {w_ind_cens[i]}, 1e-8);
  }
  //Likelihood
    target += weibull_lpdf(u_obs| 1/sigma0, exp(mu0));
    target += Beta*Mu_DFD;
    target += weibull_lccdf(u_cens|1/sigma0, exp(mu0));
    target += normal_lpdf(y_tt|Mu, sig);
    // Prior:
    Beta ~ normal(0, 10);
    mu0 ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    eta ~ normal(0, 10);
    to_vector(w) ~ normal(0, 1);
    sig ~ normal(0, 10);
    sig1 ~ normal(0, 10);
    sig2 ~ normal(0, 10);
    rho ~ normal(0, 10);
}

The code took over 9 hours to run with 500 iterations. I have a feeling that increasing adapt_delta and max_treedepth will still produce the same warnings (but with fewer divergent transitions and fewer transitions exceeding the maximum treedepth).

Changing the lower limit from 0 to 0.01 allows the Stan code to run, and the error in doing this is:

\frac{e^{\beta[\eta + w_0]}}{\beta w_1 + 1}0.01^{\beta w_1 + 1},

which is very small for the true parameter values.

There is something in the Stan user guide about using xc to avoid precision loss in definite integrals. I could perhaps use this when t gets close to the lower limit (0.01).

I could also approximate the integral in Stan by using the trapezium rule (or some other technique). I tried this and had to stop my code because it was taking too long.
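
To give an idea of what I mean (this is only a sketch, not the code I actually ran; K, the number of grid points, would be a tuning choice):

functions {
  // Sketch: trapezium-rule approximation of the cumulative exposure on a
  // fixed grid of K points between a and T, with the same integrand as
  // cumulative_exposure above.
  real cumulative_exposure_trap(real a, real T, int K,
                                real Beta, real eta, real w2_1, real w2_2) {
    real h = (T - a) / (K - 1);   // grid spacing
    real total = 0;
    for (k in 1:(K - 1)) {
      real t0 = a + (k - 1) * h;
      real t1 = a + k * h;
      real f0 = exp(Beta * (eta + w2_1 + w2_2 * log(t0)));
      real f1 = exp(Beta * (eta + w2_1 + w2_2 * log(t1)));
      total += 0.5 * (f0 + f1) * h;   // area of one trapezium
    }
    return total;
  }
}

Each call costs 2(K - 1) evaluations of the exponential per unit per leapfrog step, which is presumably why it was so slow.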

Those are my ideas. But I am curious to know what could be causing all of the warning messages. It seems that Stan is no longer affected by the singularity (there is only one singularity now, since I changed the lower limit); Stan would have stopped itself if it were. For this reason I do not think reformulating the statistical model (so that the integral has no singularities) would help. I cannot reparameterize the Stan code any further, either.

Haha, it seems you pretty much did.

Why use the integrate_1d functionality if we know the integral in closed form?

I am still skeptical that removing the singularity in this way is a good idea. What does it mean to cut off [0, 0.01]? The argument seems to be that the integral on [0.01, limit] is a good approximation to the integral on [0.0, limit], but that is not true when \beta w_1 < -1, because then the integral over [0.0, 0.01] is infinite. In that case [0.0, 0.01] is where most of the mass is, so cutting it off is a bad approximation.

But you’re right that we really don’t want this integral to go to infinity, because that implies there is no uncertainty in our outcome, and (I believe) we don’t really believe that! So somehow there is a problem with the model and we’re using this integral limit as a way to work around it? We’re somehow putting a limit on the hazard function? I’d rather constrain the parameters, but if that isn’t possible then I guess we gotta do this.


I have just attempted to do this. I find the following error:

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

I have tried setting the initial values to the true values, but the same errors are shown.

The issue is that some units that fail very quickly have very high values of u. This means that \text{Pr}(\text{Unit Fails}) \rightarrow 1, but some of these u values have density close to zero, and hence their log density tends to minus infinity.
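
To be concrete: with shape \alpha = 1/\sigma_0 and scale \lambda = e^{\mu_0}, the Weibull log density in the likelihood is

\log f(u) = \log\alpha - \alpha\log\lambda + (\alpha - 1)\log u - (u/\lambda)^{\alpha},

and the -(u/\lambda)^{\alpha} term drives this to -\infty as u grows, which is what produces the log(0) rejection at initialization.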

I can understand why Stan does not like this, but a large u value is possible; it just means that failure is essentially certain.

Is there some technique to get around this?

The closed form integral is doing (as far as I know) exactly the same thing as integrate_1d, so that shouldn’t influence the results.

Can you post the two models here where the only difference is the replacement of integrate_1d with the closed form equation?


Using integrate_1d:

functions{
  real cumulative_exposure(real x,       // Function argument
                           real xc,      // Complement of function argument
                                         // on the domain (defined later)
                           real[] theta, // parameters
                           real[] x_r,   // data (real)
                           int[] x_i) { // data (integer)
                      
    real Beta = theta[1];
    real eta = theta[2];
    real w2_1 = theta[3];
    real w2_2 = theta[4];
    
    return exp(Beta*(eta + w2_1 + w2_2*log(x)));
    }
}
data {
  int<lower=1> n; // Sample size
  int<lower=1> n_cens; // Right censored sample size
  int<lower=1> n_obs; // Failed unit sample size
  int<lower=1> N; // Total number of times considered (all times for all units)
  real<lower=0> tt[N]; //All times for all units
  real y_tt[N]; //All of the use-rates
  int w_ind[N]; //Indices for w
  real<lower=0> tt_fail[n_obs]; //Failure times for units that failed
  int w_ind_fail[n_obs]; //Indices for w for units that failed
  int w_ind_cens[n_cens];
  real<lower=0> tt_cens[n_cens];
}
transformed data {
  real x_r[0];
}
parameters {
  real <lower = 0> mu0;
  real <lower = 0> sigma0;
  real <lower = 0> sig;
  real <lower = 0> sig1;
  real <lower = 0> sig2;
  real <lower = -1, upper = 1>rho;
  real eta;
  real Beta;
  matrix[2, n] w;
}
model {
  //cumulative exposure for failed units and for censored units
  vector[n_obs] u_obs;
  vector[n_cens] u_cens;
  //Defining covariance matrix, cholesky decomposition and mean and variance of mixed effects model
  matrix[2,2] Sigma;
  real Mu[N];
  vector[n_obs] Mu_DFD;
  matrix[2,2] L;
  matrix[2, n] w2;
  //Covariance matrix
  Sigma[1,1] = sig1^2;
  Sigma[1,2] = rho*sig1*sig2;
  Sigma[2,1] = rho*sig1*sig2;
  Sigma[2,2] = sig2^2;
  //Cholesky decomposition
  L = cholesky_decompose(Sigma);
  w2 = L*w;
  //Parameters used in mixed effects model
  for(i in 1:N){
    Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
  }
  //Calculating mu_DFD 
  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = integrate_1d(cumulative_exposure, 0.01, tt_fail[i],
    {Beta, eta, w2[1,w_ind_fail[i]], w2[2,w_ind_fail[i]]},
    x_r, {w_ind_fail[i]}, 1e-8);
  }
  for(i in 1:n_cens){
    u_cens[i] = integrate_1d(cumulative_exposure, 0.01, tt_cens[i],
    {Beta, eta, w2[1,w_ind_cens[i]], w2[2,w_ind_cens[i]]},
    x_r, {w_ind_cens[i]}, 1e-8);
  }
  //Likelihood
    target += weibull_lpdf(u_obs| 1/sigma0, exp(mu0));
    target += Beta*Mu_DFD;
    target += weibull_lccdf(u_cens|1/sigma0, exp(mu0));
    target += normal_lpdf(y_tt|Mu, sig);
    // Prior:
    Beta ~ normal(0, 10);
    mu0 ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    eta ~ normal(0, 10);
    to_vector(w) ~ normal(0, 1);
    sig ~ normal(0, 10);
    sig1 ~ normal(0, 10);
    sig2 ~ normal(0, 10);
    rho ~ normal(0, 10);
}

Using closed form:

data {
  int<lower=1> n; // Sample size
  int<lower=1> n_cens; // Right censored sample size
  int<lower=1> n_obs; // Failed unit sample size
  int<lower=1> N; // Total number of times considered (all times for all units)
  real<lower=0> tt[N]; //All times for all units
  real y_tt[N]; //All of the use-rates
  int w_ind[N]; //Indices for w
  real<lower=0> tt_fail[n_obs]; //Failure times for units that failed
  int w_ind_fail[n_obs]; //Indices for w for units that failed
  int w_ind_cens[n_cens];
  real<lower=0> tt_cens[n_cens];
}
parameters {
  real <lower = 0> mu0;
  real <lower = 0> sigma0;
  real <lower = 0> sig;
  real <lower = 0> sig1;
  real <lower = 0> sig2;
  real <lower = -1, upper = 1>rho;
  real eta;
  real Beta;
  matrix[2, n] w;
}
model {
  //cumulative exposure for failed units and for censored units
  vector[n_obs] u_obs;
  vector[n_cens] u_cens;
  //Defining covariance matrix, cholesky decomposition and mean and variance of mixed effects model
  matrix[2,2] Sigma;
  real Mu[N];
  vector[n_obs] Mu_DFD;
  matrix[2,2] L;
  matrix[2, n] w2;
  //Covariance matrix
  Sigma[1,1] = sig1^2;
  Sigma[1,2] = rho*sig1*sig2;
  Sigma[2,1] = rho*sig1*sig2;
  Sigma[2,2] = sig2^2;
  //Cholesky decomposition
  L = cholesky_decompose(Sigma);
  w2 = L*w;
  //Parameters used in mixed effects model
  for(i in 1:N){
    Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
  }
  //Calculating mu_DFD 
  for(i in 1:n_obs){
    Mu_DFD[i] = eta + w2[1,w_ind_fail[i]] + w2[2,w_ind_fail[i]]*log(tt_fail[i]);
    u_obs[i] = exp(Beta*(eta + w2[1,w_ind_fail[i]]))/(Beta*w2[2,w_ind_fail[i]] + 1)*(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1);
  }
  for(i in 1:n_cens){
    u_cens[i] = exp(Beta*(eta + w2[1,w_ind_cens[i]]))/(Beta*w2[2,w_ind_cens[i]] + 1)*(tt_cens[i]^(Beta*w2[2,w_ind_cens[i]] + 1) - 1);
  }
  //Likelihood
    target += weibull_lpdf(u_obs| 1/sigma0, exp(mu0));
    target += Beta*Mu_DFD;
    target += weibull_lccdf(u_cens|1/sigma0, exp(mu0));
    target += normal_lpdf(y_tt|Mu, sig);
    // Prior:
    Beta ~ normal(0, 10);
    mu0 ~ normal(0, 10);
    sigma0 ~ normal(0, 10);
    eta ~ normal(0, 10);
    to_vector(w) ~ normal(0, 1);
    sig ~ normal(0, 10);
    sig1 ~ normal(0, 10);
    sig2 ~ normal(0, 10);
    rho ~ normal(0, 10);
}

Are you sure there should be the extra - 1 at the end of this:

exp(Beta*(eta + w2[1,w_ind_fail[i]]))/(Beta*w2[2,w_ind_fail[i]] + 1)*(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1);

Also, if you’re only doing the integral from 0.01, that makes the closed form integral this:

e^{\beta [\eta + w_0]} \frac{1}{\beta w_1 + 1} t^{\beta w_1 + 1} \bigg\rvert ^{t = T}_{t = 0.01} = e^{\beta [\eta + w_0]} \frac{1}{\beta w_1 + 1} \left(T^{\beta w_1 + 1} - 0.01^{\beta w_1 + 1}\right)

So there should be two terms there.


It may be worth adding in the closed form equation to the model with the integrate_1d call (that runs) and printing out the closed form value vs. the integrate_1d output. They should be the same.
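
Something like this, right after the u_obs[i] = integrate_1d(...) line (a sketch, using the same expressions as in your two models, with the 0.01 term included as discussed above):

// print both values at each log-density evaluation so they can be compared
print("i = ", i,
      ", quadrature = ", u_obs[i],
      ", closed form = ", exp(Beta * (eta + w2[1, w_ind_fail[i]]))
                          / (Beta * w2[2, w_ind_fail[i]] + 1)
                          * (tt_fail[i]^(Beta * w2[2, w_ind_fail[i]] + 1)
                             - 0.01^(Beta * w2[2, w_ind_fail[i]] + 1)));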


Also to answer your previous post (A matrix of parameters in Stan Integral), Stan is evaluating the integral just like any other function.

Like if you had any function f(\beta, w_1), think of it just like that.

Don’t think of it doing any extra integral over \beta space or w_1 space.
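
So, schematically, each evaluation of the log density computes, for the current (fixed) values of \beta and w_1, something like

u(\beta, w_1) = \int^{T_i}_{0.01} e^{\beta(\eta + w_0 + w_1\log t)}\,dt \approx \sum_k \omega_k\, e^{\beta(\eta + w_0 + w_1\log t_k)},

where the nodes t_k and weights \omega_k are chosen deterministically by the quadrature routine; nothing is sampled, and there is no extra integration over \beta or w_1.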


You’re right. It should be +1.

Nono, I think it should just be:

(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1))

instead of:

(tt_fail[i]^(Beta*w2[2,w_ind_fail[i]] + 1) - 1)

Ah. I thought I had \beta w_1 -1 as the subscript of the integral for a second.

I will change the lower limit and think about the problem a little bit more.

I will also compare the values from the closed form equation to the model with the integrate_1d call.

I think I should also include the xc term for when x gets small. I will attempt this.

Edit: Changing the lower limit allows the code to run. I will be able to see if I get any divergent transitions, or anything else, when it finishes.

The code finished with fewer warning messages:

2: The largest R-hat is 1.24, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I only ran the code for 500 iterations. These warnings do not seem as serious as the ones obtained earlier using integrate_1d. Perhaps running the chains for more iterations is the solution.

Maybe eventually, but the thing to do now is to use fake data generated from this model (or, if you’re already using fake data, use a smaller amount of it) and figure out why your R-hat isn’t going to 1.0. That indicates either that multiple chains aren’t sampling the same distribution, or that there are mixing problems within each chain.

Okay. Note that I ran the initial code in this thread (the very first post) and it worked: there were no warning messages and I recovered the parameters that I expected. However, the n_eff values were very low (I recall one of them being 69) from 1000 iterations.

Diagnostic plots showed that the samples (for some parameters) had very high autocorrelation up to lags of 50. I had to run the chains for much longer and use thinning.

I wonder if this is the case here.
