How to speed up my bivariate hierarchical normal model?

I have a bivariate hierarchical normal model in Stan, and it ran for about 8 hours without sampling anything.
Basically, I model the dynamics of anti-P1 and anti-P2 (two types of antibody) on the log scale.
For each antibody, the mean follows the same piecewise dynamics:
t < 2:        mu = A0 * exp(-beta * t)
2 <= t <= h:  mu = A0 * exp(-2*beta) * exp(gamma * (t - 2))
t > h:        mu = A0 * exp(-2*beta) * exp(gamma * (h - 2)) * exp(-alpha * (t - h))
Hence: log(y_observed) ~ N(log(mu), sigma)

I have repeated measurements and at each time point, both anti-P1 and anti-P2 were measured. So my model is: (y1,y2) ~ MVN[(mu1,mu2), Sigma].
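For concreteness, the piecewise mean above can be sketched in Python (the parameter names A0, beta, gamma, alpha, h mirror the description; the function itself is my illustration, not part of the attached code):

```python
import math

def piecewise_mu(t, A0, beta, gamma, alpha, h):
    """Mean antibody level on the natural scale: decay at rate beta until
    t = 2, regrowth at rate gamma until t = h, then slow decay at rate alpha."""
    if t < 2:
        return A0 * math.exp(-beta * t)
    elif t <= h:
        return A0 * math.exp(-2 * beta) * math.exp(gamma * (t - 2))
    else:
        return (A0 * math.exp(-2 * beta) * math.exp(gamma * (h - 2))
                * math.exp(-alpha * (t - h)))
```

Note that the three pieces join continuously at t = 2 and t = h, which is what makes a shared changepoint h a reasonable modeling choice.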

In attachment is my stan code and the data file.
Thank you for any tips to speed up my model.

P/s: I am new to Stan, so sorry if the code is not as well organized as it should be.
Matab4_Stan_Aug17_JOINTModelTillM15_LogScale_Remove0PRN.R (7.6 KB)


And here is my data if you want to rerun my model.
data2land_TillM15_PRNPT_Aug17.csv (21.3 KB)
Thank you.

I copied the Stan code below. Hopefully that gets more eyeballs on it.

data {
  int<lower=1> N;                         // Total number of data points
  real<lower=0> time[N];                  // Time points
  vector[2]        y[N];                  // antiPT-antiPRN levels on log scale
  int<lower=1>     J;                     // Number of subjects
  int<lower=1, upper=J>    subj[N];       // Subject id
  int<lower=0, upper=1>    Land[N];       // Country indication
  int<lower=0, upper=1>    C2[N];         // Child indication
}

parameters {
  real<lower=0> beta1;                    // Basic parms for antiPT
  real<lower=0> gamma1;
  real<lower=4,upper=8> h;
  real<lower=0> alpha1;
  real<lower=0> A01;

  real<lower=0> beta2;                    // Basic parms for antiPRN
  real<lower=0> gamma2;
  // real<lower=4,upper=8> h;             // Assume the same h
  real<lower=0> alpha2;
  real<lower=0> A02;
  vector[J] b1_raw;                       // To apply non-centered parameterization
  vector[J] bbeta1_raw;                   // Random parts for antiPT
  vector[J] bgamma1_raw;
  vector[J] balpha1_raw;

  vector[J] b2_raw;                       // To apply non-centered parameterization
  vector[J] bbeta2_raw;                   // Random parts for antiPRN
  vector[J] bgamma2_raw;
  vector[J] balpha2_raw;
  real<lower=0> sigma1_A0;                // Variance part for antiPT
  real<lower=0> sigma1_beta;
  real<lower=0> sigma1_gamma;
  real<lower=0> sigma1_alpha;
  //real<lower=0> sigma1;
  real<lower=0> sigma2_A0;                // Variance part for antiPRN
  real<lower=0> sigma2_beta;
  real<lower=0> sigma2_gamma;
  real<lower=0> sigma2_alpha;
  //real<lower=0> sigma2;

  real v1A0PT;   real v2A0PT;             // Covariate part for antiPT
  real v1betaPT; real v2betaPT;
  real<lower=-10,upper=10> v1gammaPT;real<lower=-10,upper=10> v2gammaPT;
  real<lower=-10,upper=10> v1alphaPT;real<lower=-10,upper=10> v2alphaPT;
  real<lower=-10,upper=10> v1h;    real<lower=-10,upper=10> v2h; // h: only v1h, v2h, same for PT and PRN

  real v1A0PRN;   real v2A0PRN;           // Covariate part for antiPRN
  real v1betaPRN; real v2betaPRN;
  real<lower=-10,upper=10> v1gammaPRN;real<lower=-10,upper=10> v2gammaPRN;
  real<lower=-10,upper=10> v1alphaPRN;real<lower=-10,upper=10> v2alphaPRN;

  //real<lower=0> omega;                   // Covariance term
  //cholesky_factor_corr[2] L_corr;
  //vector<lower=0,upper=1>[J] sigma;

  cholesky_factor_corr[2] L_corr;
  vector<lower=0>[2] sigma;
}


transformed parameters{
  // For antiPT
  vector[J] b1;  vector[J] balpha1; vector[J] bgamma1; vector[J] bbeta1; 
  real betapart1[N];
  real gammapart1[N];
  real alphapart1[N];
  real A0part1[N];
  real mu1[N];
  real diffPT[N]; real diff2PT[N];
  real hpart[N];   //hpart the same for both PT and PRN

  // For antiPRN
  vector[J] b2;  vector[J] balpha2; vector[J] bgamma2; vector[J] bbeta2; 
  real betapart2[N];
  real gammapart2[N];
  real alphapart2[N];
  real A0part2[N];
  real mu2[N];
  real diffPRN[N]; real diff2PRN[N];
  vector[2] mu[N]; // vector of the mean
  // Calculate for antiPT & antiPRN
  b1 = sigma1_A0*b1_raw;  bgamma1=sigma1_gamma*bgamma1_raw; bbeta1=sigma1_beta*bbeta1_raw;
  balpha1=sigma1_alpha*balpha1_raw;            // For antiPT

  b2 = sigma2_A0*b2_raw;  bgamma2=sigma2_gamma*bgamma2_raw; bbeta2=sigma2_beta*bbeta2_raw;
  balpha2=sigma2_alpha*balpha2_raw;            // For antiPRN

  for (i in 1:N) {
    A0part1[i]    = b1[subj[i]]                +v1A0PT*C2[i]    + v2A0PT*Land[i]  ; 
    betapart1[i]  = beta1   + bbeta1[subj[i]]  +v1betaPT*C2[i]  + v2betaPT*Land[i] ;
    gammapart1[i] = gamma1  + bgamma1[subj[i]] +v1gammaPT*C2[i] + v2gammaPT*Land[i];
    alphapart1[i] = alpha1  + balpha1[subj[i]] +v1alphaPT*C2[i] + v2alphaPT*Land[i]; // For antiPT
    hpart[i]      = h       + v1h*C2[i]        +v2h*Land[i]                    ;     // hpart the same for both

    A0part2[i]    = b2[subj[i]]                +v1A0PRN*C2[i]    + v2A0PRN*Land[i]  ; 
    betapart2[i]  = beta2   + bbeta2[subj[i]]  +v1betaPRN*C2[i]  + v2betaPRN*Land[i] ;
    gammapart2[i] = gamma2  + bgamma2[subj[i]] +v1gammaPRN*C2[i] + v2gammaPRN*Land[i];
    alphapart2[i] = alpha2  + balpha2[subj[i]] +v1alphaPRN*C2[i] + v2alphaPRN*Land[i]; 

    // The mean
    if ( time[i] <= 2 ) {
      mu1[i] = log(A01*exp(A0part1[i] -betapart1[i]*time[i])) ;
      mu2[i] = log(A02*exp(A0part2[i] -betapart2[i]*time[i])) ; }
    else if ( time[i] >2 && time[i] <=hpart[i] ) {
      mu1[i] = log(A01*exp(A0part1[i] -2*betapart1[i] + gammapart1[i]*(time[i]-2) ) ) ;
      mu2[i] = log(A02*exp(A0part2[i] -2*betapart2[i] + gammapart2[i]*(time[i]-2) ) ) ;}
    else if ( time[i] > hpart[i]) {
      mu1[i] = log(A01*exp(A0part1[i] - 2*betapart1[i] + gammapart1[i]*(hpart[i]-2) -alphapart1[i]*(time[i]-hpart[i])) );
      mu2[i] = log(A02*exp(A0part2[i] - 2*betapart2[i] + gammapart2[i]*(hpart[i]-2) -alphapart2[i]*(time[i]-hpart[i])) );}
    mu[i,1] = mu1[i];
    mu[i,2] = mu2[i];
  }
}

model {
  // Priors for antiPT
  b1_raw ~ normal(0, 1);                    //subj random effects - As we apply non-centered parameterization
  bbeta1_raw ~ normal(0, 1);
  bgamma1_raw~ normal(0, 1);
  balpha1_raw~ normal(0, 1);

  // Priors for antiPRN
  b2_raw ~ normal(0, 1);                    //subj random effects - As we apply non-centered parameterization
  bbeta2_raw ~ normal(0, 1);
  bgamma2_raw~ normal(0, 1);
  balpha2_raw~ normal(0, 1);

  sigma ~ cauchy(0, 5);
  L_corr ~ lkj_corr_cholesky(1);
  // Likelihood
  for (i in 1:N){
      for (k in 1:2) {
         target += multi_normal_cholesky_lpdf(y[N] | mu[k], diag_pre_multiply(sigma, L_corr) );
      } ;
  }
}

To figure out what is going wrong it could be a good idea to start from a simpler model. For instance, a non-hierarchical model and see whether you can make that work. I also typically debug with iter=200. Stan is good at finding the typical set if your model is well specified.

In no particular order, I noticed the following.

  • mu1[i] = log(A01*exp(A0part1[i] -betapart1[i]*time[i])) can be rewritten as mu1[i] = log(A01) + exp(A0part1[i] -betapart1[i]*time[i]) which is going to be numerically more stable and requires fewer calculations. You could even work directly with log_A01 as a parameter if you adjust your prior on A01 accordingly.

  • You don’t need mu1 and mu2, you can directly assign to mu[i,j].

  • I think you could speed up things if you work with matrices and vectors for your parameters. Stan treats matrices in a special way under the hood to calculate the derivatives.

  • The following has some mistakes

for (i in 1:N){
     for (k in 1:2) {
        target += multi_normal_cholesky_lpdf(y[N] | mu[k], diag_pre_multiply(sigma, L_corr) );
     } ;

It should be

for (i in 1:N){
   target += multi_normal_cholesky_lpdf(y[i] | mu[i], diag_pre_multiply(sigma, L_corr) );
}
I think this is why you did not get any results.

You can speed this up by

matrix[2,2] L_cov;
L_cov = diag_pre_multiply(sigma, L_corr); //only calculate this once
target += multi_normal_cholesky_lpdf(y | mu, L_cov);
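As a sanity check on what multi_normal_cholesky_lpdf computes with that L_cov, here is a pure-Python version of the two-dimensional case (my own illustration; the closed-form comparison is the standard bivariate normal density):

```python
import math

def bvn_logpdf_chol(y, mu, sigma, L_corr):
    """Bivariate normal log density from L_cov = diag_pre_multiply(sigma, L_corr),
    i.e. Sigma = L_cov * L_cov'. sigma = [s1, s2]; L_corr is lower-triangular."""
    # Entries of L_cov = diag(sigma) @ L_corr
    l11 = sigma[0] * L_corr[0][0]
    l21 = sigma[1] * L_corr[1][0]
    l22 = sigma[1] * L_corr[1][1]
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    # Forward substitution: solve L_cov * z = (y - mu)
    z1 = d1 / l11
    z2 = (d2 - l21 * z1) / l22
    # log sqrt(det(Sigma)) = log(l11) + log(l22)
    return (-math.log(2 * math.pi) - math.log(l11) - math.log(l22)
            - 0.5 * (z1 * z1 + z2 * z2))

def bvn_logpdf_direct(y, mu, s1, s2, rho):
    """Textbook bivariate normal log density with correlation rho, for comparison."""
    a = (y[0] - mu[0]) / s1
    b = (y[1] - mu[1]) / s2
    return (-math.log(2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))
            - (a * a - 2 * rho * a * b + b * b) / (2 * (1 - rho * rho)))
```

Computing L_cov once per leapfrog step, rather than once per observation inside the loop, is where the speedup comes from.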

Thank you Stijn for your feedback.
I tried all your tips and now my model runs: it took about 40 minutes for 200 iterations (warmup = 100).
But I am happy that at least it runs, and the initial results seem reasonable.
P/s: I did the same analysis in Stan for anti-PT and anti-PRN separately, and it ran quite fast (15 minutes for 3 chains of 2000 iterations each).
I think there is still something to improve so that the model runs faster.
I will first try some reparameterization and vectorize my model, then come back later.


I just noticed a typo in one of my suggestions. I forgot to delete the exp. This should be better.

mu1[i] = log(A01) + (A0part1[i] -betapart1[i]*time[i])
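A quick way to see why this form is preferable: the product-then-log version overflows once the exponent is large, while log(A01) plus the exponent stays finite and skips an exp/log round trip. A toy Python check (the value 800 is just for illustration):

```python
import math

A01, x = 50.0, 800.0  # x stands in for A0part1[i] - betapart1[i]*time[i]

try:
    unstable = math.log(A01 * math.exp(x))  # exp(800) overflows a double
except OverflowError:
    unstable = None

stable = math.log(A01) + x  # finite, exact, and cheaper
```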

Hi Stijn,
No worries, I had already deleted the exp.
Thanks a lot. Now the model runs quite fast even with iterations = 1000 (warmup = 500) and 3 chains: it took only 30 minutes
(well, if I don't include control=list(adapt_delta=0.9, stepsize = 0.01, max_treedepth = 15)).


I have one remaining question.
The above model worked very nicely in Stan. Now I have censored observations in my data and want to use Tobit regression. Since I have two responses (y1,y2) organized into a data matrix, I did the same for the censoring indicator in the data block.
So it is:
data {
  int<lower=1> N;                         // Total number of data points
  real<lower=0> time[N];                  // Time points
  vector[2] y[N];                         // antiPT-antiPRN levels on log scale
  vector<lower=0,upper=1>[2] cens[N];     // Censoring indicator for antiPT-antiPRN
  int<lower=1> J;                         // Number of subjects
  int<lower=1, upper=J> subj[N];          // Subject id
  int<lower=0, upper=1> C2[N];            // Child indication
}

And in the likelihood, I only changed it into:
target += (1-cens)*multi_normal_cholesky_lpdf(y | mu, L_cov) + cens*multi_normal_cholesky_lcdf(y | mu, L_cov);

However, Stan complained:
'No matches for:

int - vector[]
Expression is ill formed'
I guess it is because I put the censoring indicator (0,1) into a vector. How do I program this better?

Thank you.
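For what it's worth, the mixing logic is easiest to see in the univariate case. A minimal Python sketch (my own illustration; cens = 1 marks a value left-censored at the detection limit y):

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of N(mu, sigma) at y."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def normal_lcdf(y, mu, sigma):
    """Log CDF of N(mu, sigma) at y."""
    return math.log(0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0)))))

def tobit_term(y, cens, mu, sigma):
    """Per-observation log-likelihood: the density for observed values,
    the probability mass below the limit for left-censored ones."""
    return (1 - cens) * normal_lpdf(y, mu, sigma) + cens * normal_lcdf(y, mu, sigma)
```

The bivariate version of this needs a multivariate normal CDF for the doubly censored rows, which is where the difficulty discussed below comes in.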

You can use rep_vector on the int to make it look like the right size.

Hi Sakrejda,
Thank you.
I just realized that Stan has no cumulative distribution function for the multivariate normal distribution. Is that right?


That’s right, a general multivariate normal CDF is complicated; there was a thread on here recently where Ben Goodrich laid out some of the reasons for that.

IIRC there’s no generic calculation for it.

Thank you. Glad to know and hope it will be released soon.

We aren’t planning to add a multivariate CDF any time soon—nobody knows how to calculate it. But we may be adding a bivariate case. Until then, you can just implement it as a Stan function. The issue to track it is here:

and here’s @bgoodri’s comment with Stan code:

Those links are super-confusing given that they just summarize the issue, not what’s being linked, so I’ll add the actual code from @bgoodri:

 real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
 }
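To sanity-check the function, here is a Python port (my own, for testing only). phi and owens_t are stand-ins for Stan's built-ins, with Owen's T evaluated by Simpson quadrature; the zero-zero edge case is handled first, and for simplicity the port assumes z1 and z2 are nonzero unless both are zero, to avoid the divisions in a1 and a2:

```python
import math

def phi(z):
    """Standard normal CDF (stand-in for Stan's Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def owens_t(h, a, n=400):
    """Owen's T: (1/2pi) * integral_0^a of exp(-h^2 (1+x^2)/2) / (1+x^2) dx,
    via composite Simpson's rule (stand-in for Stan's owens_t)."""
    if a == 0.0:
        return 0.0
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    step = a / n
    total = f(0.0) + f(a) + sum((4 if i % 2 else 2) * f(i * step) for i in range(1, n))
    return total * step / (3.0 * 2.0 * math.pi)

def binormal_cdf(z1, z2, rho):
    """P(Z1 <= z1, Z2 <= z2) for a standard bivariate normal with correlation rho.
    Assumes z1 and z2 are nonzero unless both are zero."""
    if z1 == 0.0 and z2 == 0.0:
        return 0.25 + math.asin(rho) / (2.0 * math.pi)
    denom = math.sqrt((1.0 + rho) * (1.0 - rho))
    a1 = (z2 / z1 - rho) / denom
    a2 = (z1 / z2 - rho) / denom
    product = z1 * z2
    delta = 1.0 if product < 0 or (product == 0 and z1 + z2 < 0) else 0.0
    return 0.5 * (phi(z1) + phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2)
```

With rho = 0 the CDF must factor as phi(z1) * phi(z2), which gives an independent check on the Owen's T bookkeeping.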

Thank you Bob for the link and the clarification.
A bivariate normal CDF would also be of interest.

You can use the above. Just drop it into the functions block and you can use it as if it were built in. It won’t be as fast as if we had analytic derivatives, and some of those functions like owens_t are rather compute-intensive to evaluate and differentiate.

It’s on the linear scale rather than the log scale, so like the rest of our CDFs, it’s unlikely to fare well in the tails. I’d also have written it to put the edge case of z1 == 0 && z2 == 0 first (just good coding style, as it leads to less nesting and makes it clear what the “usual” branch of execution is).