How to speed up my bivariate hierarchical normal model?

I have a bivariate hierarchical normal model in Stan, and it ran for about 8 hours without sampling anything.
Basically, I model the dynamics of anti-P1 and anti-P2 (two types of antibody) on the log scale.
For each antibody, the mean follows the same piecewise dynamics:
t < 2:        mu = A0 * exp(-beta * t)
2 <= t <= h:  mu = A0 * exp(-2*beta) * exp(gamma * (t - 2))
t > h:        mu = A0 * exp(-2*beta) * exp(gamma * (h - 2)) * exp(-alpha * (t - h))
Hence: log(y_observed) ~ N(log(mu), sigma)

I have repeated measurements and at each time point, both anti-P1 and anti-P2 were measured. So my model is: (y1,y2) ~ MVN[(mu1,mu2), Sigma].
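For concreteness, the piecewise mean above can be sketched in Python (the parameter names A0, beta, gamma, alpha, h mirror the description; the function itself is my illustration, not part of the attached code):

```python
import math

def piecewise_mu(t, A0, beta, gamma, alpha, h):
    """Mean antibody level on the natural scale: decay at rate beta until
    t = 2, regrowth at rate gamma until t = h, then slow decay at rate alpha."""
    if t < 2:
        return A0 * math.exp(-beta * t)
    elif t <= h:
        return A0 * math.exp(-2 * beta) * math.exp(gamma * (t - 2))
    else:
        return (A0 * math.exp(-2 * beta) * math.exp(gamma * (h - 2))
                * math.exp(-alpha * (t - h)))
```

Note that the three pieces join continuously at t = 2 and t = h, which is what makes a shared changepoint h a reasonable modeling choice.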

In attachment is my stan code and the data file.
Thank you for any tips to speed up my model.

P/s: I am new to Stan, so sorry if the code is not as well organized as it should be.
Matab4_Stan_Aug17_JOINTModelTillM15_LogScale_Remove0PRN.R (7.6 KB)


And here is my data if you want to rerun my model.
data2land_TillM15_PRNPT_Aug17.csv (21.3 KB)
Thank you.

I copied the Stan code below. Hopefully that gets more eyeballs on it.

data {
  int<lower=1> N;                         // Total number of data points
  real<lower=0> time[N];                  // Time points
  vector[2]        y[N];                  // antiPT-antiPRN levels on log scale
  int<lower=1>     J;                     // Number of subjects
  int<lower=1, upper=J>    subj[N];       // Subject id
  int<lower=0, upper=1>    Land[N];       // Country indication
  int<lower=0, upper=1>    C2[N];         // Child indication
}

parameters {
  real<lower=0> beta1;                    // Basic parms for antiPT
  real<lower=0> gamma1;
  real<lower=4,upper=8> h;
  real<lower=0> alpha1;
  real<lower=0> A01;

  real<lower=0> beta2;                    // Basic parms for antiPRN
  real<lower=0> gamma2;
  // real<lower=4,upper=8> h;             // Assume the same h
  real<lower=0> alpha2;
  real<lower=0> A02;
  vector[J] b1_raw;                       // To apply non-centered parameterization
  vector[J] bbeta1_raw;                   // Random parts for antiPT
  vector[J] bgamma1_raw;
  vector[J] balpha1_raw;

  vector[J] b2_raw;                       // To apply non-centered parameterization
  vector[J] bbeta2_raw;                   // Random parts for antiPRN
  vector[J] bgamma2_raw;
  vector[J] balpha2_raw;
  real<lower=0> sigma1_A0;                // Variance part for antiPT
  real<lower=0> sigma1_beta;
  real<lower=0> sigma1_gamma;
  real<lower=0> sigma1_alpha;
  //real<lower=0> sigma1;
  real<lower=0> sigma2_A0;                // Variance part for antiPRN
  real<lower=0> sigma2_beta;
  real<lower=0> sigma2_gamma;
  real<lower=0> sigma2_alpha;
  //real<lower=0> sigma2;

  real v1A0PT;   real v2A0PT;             // Covariate part for antiPT
  real v1betaPT; real v2betaPT;
  real<lower=-10,upper=10> v1gammaPT;real<lower=-10,upper=10> v2gammaPT;
  real<lower=-10,upper=10> v1alphaPT;real<lower=-10,upper=10> v2alphaPT;
  real<lower=-10,upper=10> v1h;    real<lower=-10,upper=10> v2h; // h: only v1h, v2h, same for PT and PRN

  real v1A0PRN;   real v2A0PRN;           // Covariate part for antiPRN
  real v1betaPRN; real v2betaPRN;
  real<lower=-10,upper=10> v1gammaPRN;real<lower=-10,upper=10> v2gammaPRN;
  real<lower=-10,upper=10> v1alphaPRN;real<lower=-10,upper=10> v2alphaPRN;

  //real<lower=0> omega;                   // Covariance term
  //cholesky_factor_corr[2] L_corr;
  //vector<lower=0,upper=1>[J] sigma;

  cholesky_factor_corr[2] L_corr;
  vector<lower=0>[2] sigma;
}


transformed parameters{
  // For antiPT
  vector[J] b1;  vector[J] balpha1; vector[J] bgamma1; vector[J] bbeta1; 
  real betapart1[N];
  real gammapart1[N];
  real alphapart1[N];
  real A0part1[N];
  real mu1[N];
  real diffPT[N]; real diff2PT[N];
  real hpart[N];   //hpart the same for both PT and PRN

  // For antiPRN
  vector[J] b2;  vector[J] balpha2; vector[J] bgamma2; vector[J] bbeta2; 
  real betapart2[N];
  real gammapart2[N];
  real alphapart2[N];
  real A0part2[N];
  real mu2[N];
  real diffPRN[N]; real diff2PRN[N];
  vector[2] mu[N]; // vector of the mean
  // Calculate for antiPT & antiPRN
  b1 = sigma1_A0*b1_raw;  bgamma1=sigma1_gamma*bgamma1_raw; bbeta1=sigma1_beta*bbeta1_raw;
  balpha1=sigma1_alpha*balpha1_raw;            // For antiPT

  b2 = sigma2_A0*b2_raw;  bgamma2=sigma2_gamma*bgamma2_raw; bbeta2=sigma2_beta*bbeta2_raw;
  balpha2=sigma2_alpha*balpha2_raw;            // For antiPRN

  for (i in 1:N) {
    A0part1[i]    = b1[subj[i]]                +v1A0PT*C2[i]    + v2A0PT*Land[i]  ; 
    betapart1[i]  = beta1   + bbeta1[subj[i]]  +v1betaPT*C2[i]  + v2betaPT*Land[i] ;
    gammapart1[i] = gamma1  + bgamma1[subj[i]] +v1gammaPT*C2[i] + v2gammaPT*Land[i];
    alphapart1[i] = alpha1  + balpha1[subj[i]] +v1alphaPT*C2[i] + v2alphaPT*Land[i]; // For antiPT
    hpart[i]      = h       + v1h*C2[i]        +v2h*Land[i]                    ;     // hpart the same for both

    A0part2[i]    = b2[subj[i]]                +v1A0PRN*C2[i]    + v2A0PRN*Land[i]  ; 
    betapart2[i]  = beta2   + bbeta2[subj[i]]  +v1betaPRN*C2[i]  + v2betaPRN*Land[i] ;
    gammapart2[i] = gamma2  + bgamma2[subj[i]] +v1gammaPRN*C2[i] + v2gammaPRN*Land[i];
    alphapart2[i] = alpha2  + balpha2[subj[i]] +v1alphaPRN*C2[i] + v2alphaPRN*Land[i]; 

    // The mean
    if ( time[i] <= 2 ) {
      mu1[i] = log(A01*exp(A0part1[i] -betapart1[i]*time[i])) ;
      mu2[i] = log(A02*exp(A0part2[i] -betapart2[i]*time[i])) ; }
    else if ( time[i] >2 && time[i] <=hpart[i] ) {
      mu1[i] = log(A01*exp(A0part1[i] -2*betapart1[i] + gammapart1[i]*(time[i]-2) ) ) ;
      mu2[i] = log(A02*exp(A0part2[i] -2*betapart2[i] + gammapart2[i]*(time[i]-2) ) ) ;}
    else if ( time[i] > hpart[i]) {
      mu1[i] = log(A01*exp(A0part1[i] - 2*betapart1[i] + gammapart1[i]*(hpart[i]-2) -alphapart1[i]*(time[i]-hpart[i])) );
      mu2[i] = log(A02*exp(A0part2[i] - 2*betapart2[i] + gammapart2[i]*(hpart[i]-2) -alphapart2[i]*(time[i]-hpart[i])) );}
    mu[i,1] = mu1[i];
    mu[i,2] = mu2[i];
  }
}

model {
  // Priors for antiPT
  b1_raw ~ normal(0, 1);                    //subj random effects - As we apply non-centered parameterization
  bbeta1_raw ~ normal(0, 1);
  bgamma1_raw~ normal(0, 1);
  balpha1_raw~ normal(0, 1);

  // Priors for antiPRN
  b2_raw ~ normal(0, 1);                    //subj random effects - As we apply non-centered parameterization
  bbeta2_raw ~ normal(0, 1);
  bgamma2_raw~ normal(0, 1);
  balpha2_raw~ normal(0, 1);

  sigma ~ cauchy(0, 5);
  L_corr ~ lkj_corr_cholesky(1);
  // Likelihood
  for (i in 1:N){
      for (k in 1:2) {
         target += multi_normal_cholesky_lpdf(y[N] | mu[k], diag_pre_multiply(sigma, L_corr) );
      } ;
  }
}

To figure out what is going wrong it could be a good idea to start from a simpler model. For instance, a non-hierarchical model and see whether you can make that work. I also typically debug with iter=200. Stan is good at finding the typical set if your model is well specified.

In no particular order, I noticed the following.

  • mu1[i] = log(A01*exp(A0part1[i] -betapart1[i]*time[i])) can be rewritten as mu1[i] = log(A01) + exp(A0part1[i] -betapart1[i]*time[i]) which is going to be numerically more stable and requires fewer calculations. You could even work directly with log_A01 as a parameter if you adjust your prior on A01 accordingly.

  • You don’t need mu1 and mu2, you can directly assign to mu[i,j].

  • I think you could speed up things if you work with matrices and vectors for your parameters. Stan treats matrices in a special way under the hood to calculate the derivatives.

  • The following has some mistakes

for (i in 1:N){
     for (k in 1:2) {
        target += multi_normal_cholesky_lpdf(y[N] | mu[k], diag_pre_multiply(sigma, L_corr) );
     } ;

It should be

for (i in 1:N){
   target += multi_normal_cholesky_lpdf(y[i] | mu[i], diag_pre_multiply(sigma, L_corr) );
}
I think this is why you did not get any results.

You can speed this up by

matrix[2,2] L_cov;
L_cov = diag_pre_multiply(sigma, L_corr); //only calculate this once
target += multi_normal_cholesky_lpdf(y | mu, L_cov);
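As a sanity check on what multi_normal_cholesky_lpdf computes with that L_cov, here is a pure-Python version of the two-dimensional case (my own illustration; the closed-form comparison is the standard bivariate normal density):

```python
import math

def bvn_logpdf_chol(y, mu, sigma, L_corr):
    """Bivariate normal log density from L_cov = diag_pre_multiply(sigma, L_corr),
    i.e. Sigma = L_cov * L_cov'. sigma = [s1, s2]; L_corr is lower-triangular."""
    # Entries of L_cov = diag(sigma) @ L_corr
    l11 = sigma[0] * L_corr[0][0]
    l21 = sigma[1] * L_corr[1][0]
    l22 = sigma[1] * L_corr[1][1]
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    # Forward substitution: solve L_cov * z = (y - mu)
    z1 = d1 / l11
    z2 = (d2 - l21 * z1) / l22
    # log sqrt(det(Sigma)) = log(l11) + log(l22)
    return (-math.log(2 * math.pi) - math.log(l11) - math.log(l22)
            - 0.5 * (z1 * z1 + z2 * z2))

def bvn_logpdf_direct(y, mu, s1, s2, rho):
    """Textbook bivariate normal log density with correlation rho, for comparison."""
    a = (y[0] - mu[0]) / s1
    b = (y[1] - mu[1]) / s2
    return (-math.log(2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))
            - (a * a - 2 * rho * a * b + b * b) / (2 * (1 - rho * rho)))
```

Computing L_cov once per leapfrog step, rather than once per observation inside the loop, is where the speedup comes from.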

Thank you Stijn for your feedback.
I tried all your tips and now my model runs: it took about 40 minutes for 200 iterations (warmup = 100).
But I am happy that at least it runs, and the initial results seem reasonable.
P/s: I did the same analysis in Stan for anti-PT and anti-PRN separately, and it ran quite fast (15 minutes for 3 chains of 2000 iterations each).
I think there is still something to improve so that the model runs faster.
I will first try some reparameterization and vectorize my model, then come back later.


I just noticed a typo in one of my suggestions. I forgot to delete the exp. This should be better.

mu1[i] = log(A01) + (A0part1[i] -betapart1[i]*time[i])
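A quick way to see why this form is preferable: the product-then-log version overflows once the exponent is large, while log(A01) plus the exponent stays finite and skips an exp/log round trip. A toy Python check (the value 800 is just for illustration):

```python
import math

A01, x = 50.0, 800.0  # x stands in for A0part1[i] - betapart1[i]*time[i]

try:
    unstable = math.log(A01 * math.exp(x))  # exp(800) overflows a double
except OverflowError:
    unstable = None

stable = math.log(A01) + x  # finite, exact, and cheaper
```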

Hi Stijn,
No worries, I had already deleted the exp.
Thanks a lot. Now the model runs quite fast even with iterations = 1000 (warmup = 500) and 3 chains: it took only 30 minutes
(well, if I don't include control=list(adapt_delta=0.9, stepsize = 0.01, max_treedepth = 15)).


I have one remaining question.
The above model worked very nicely in Stan. Now I have censored observations in my data and want to use Tobit regression. Since I have two responses (y1,y2) organized into a data matrix, I did the same for the censoring indicator in the data block.
So it is:
data {
  int<lower=1> N;                         // Total number of data points
  real<lower=0> time[N];                  // Time points
  vector[2] y[N];                         // antiPT-antiPRN levels on log scale
  vector<lower=0,upper=1>[2] cens[N];     // Censoring indicator for antiPT-antiPRN
  int<lower=1> J;                         // Number of subjects
  int<lower=1, upper=J> subj[N];          // Subject id
  int<lower=0, upper=1> C2[N];            // Child indication
}

And in the likelihood, I only changed it into:
target += (1-cens)*multi_normal_cholesky_lpdf(y | mu, L_cov) + cens*multi_normal_cholesky_lcdf(y | mu, L_cov);

However, Stan complained:
'No matches for:

int - vector[]
Expression is ill formed'
I guess it is because I put the censoring indicator (0,1) into a vector. How do I program this better?

Thank you.
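For what it's worth, the mixing logic is easiest to see in the univariate case. A minimal Python sketch (my own illustration; cens = 1 marks a value left-censored at the detection limit y):

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of N(mu, sigma) at y."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def normal_lcdf(y, mu, sigma):
    """Log CDF of N(mu, sigma) at y."""
    return math.log(0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0)))))

def tobit_term(y, cens, mu, sigma):
    """Per-observation log-likelihood: the density for observed values,
    the probability mass below the limit for left-censored ones."""
    return (1 - cens) * normal_lpdf(y, mu, sigma) + cens * normal_lcdf(y, mu, sigma)
```

The bivariate version of this needs a multivariate normal CDF for the doubly censored rows, which is where the difficulty discussed below comes in.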

You can use rep_vector on the int to make it look like the right size.

Hi Sakrejda,
Thank you.
I just realized that Stan has no cumulative distribution function for the multivariate normal distribution. Is that right?


That’s right, a general multivariate normal CDF is complicated; there was a thread on here recently where Ben Goodrich laid out some of the reasons for that.

IIRC there’s no generic calculation for it.

Thank you. Glad to know and hope it will be released soon.

We aren’t planning to add a multivariate CDF any time soon—nobody knows how to calculate it. But we may be adding a bivariate case. Until then, you can just implement it as a Stan function. The issue to track it is here:

and here’s @bgoodri’s comment with Stan code:

Those links are super-confusing given that they just summarize the issue, not what’s being linked, so I’ll add the actual code from @bgoodri:

 real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
 }
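To sanity-check the function, here is a Python port (my own, for testing only). phi and owens_t are stand-ins for Stan's built-ins, with Owen's T evaluated by Simpson quadrature; the zero-zero edge case is handled first, and for simplicity the port assumes z1 and z2 are nonzero unless both are zero, to avoid the divisions in a1 and a2:

```python
import math

def phi(z):
    """Standard normal CDF (stand-in for Stan's Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def owens_t(h, a, n=400):
    """Owen's T: (1/2pi) * integral_0^a of exp(-h^2 (1+x^2)/2) / (1+x^2) dx,
    via composite Simpson's rule (stand-in for Stan's owens_t)."""
    if a == 0.0:
        return 0.0
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    step = a / n
    total = f(0.0) + f(a) + sum((4 if i % 2 else 2) * f(i * step) for i in range(1, n))
    return total * step / (3.0 * 2.0 * math.pi)

def binormal_cdf(z1, z2, rho):
    """P(Z1 <= z1, Z2 <= z2) for a standard bivariate normal with correlation rho.
    Assumes z1 and z2 are nonzero unless both are zero."""
    if z1 == 0.0 and z2 == 0.0:
        return 0.25 + math.asin(rho) / (2.0 * math.pi)
    denom = math.sqrt((1.0 + rho) * (1.0 - rho))
    a1 = (z2 / z1 - rho) / denom
    a2 = (z1 / z2 - rho) / denom
    product = z1 * z2
    delta = 1.0 if product < 0 or (product == 0 and z1 + z2 < 0) else 0.0
    return 0.5 * (phi(z1) + phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2)
```

With rho = 0 the CDF must factor as phi(z1) * phi(z2), which gives an independent check on the Owen's T bookkeeping.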

Thank you Bob for the link and the clarification.
A bivariate normal CDF would also be of interest.

You can use the above. Just drop it into the functions block and you can use it as if it were built in. It won’t be as fast as if we had analytic derivatives, and some of those functions like owens_t are rather compute-intensive to evaluate and differentiate.

It’s on the linear scale rather than the log scale, so like the rest of our CDFs, it’s unlikely to fare well in the tails. I’d also have written it to put the edge case of z1 == 0 && z2 == 0 first (just good coding style, as it leads to less nesting and makes it clear what the “usual” branch of execution is).