How to speed up a model?

I am trying to fit a hierarchical model. It runs without errors, but sampling takes days. Is there any way I can speed it up? I am new to Stan, so any and all help is appreciated!

data {
  int<lower=0> N;//Number of observations
  int<lower=1> J;//Number of predictors with random slope
  int<lower=1> K;//Number of predictors with non-random slope
  int<lower=1> L;//Number of customers/groups
  int<lower=0,upper=1> y[N];//Binary response variable
  int<lower=1,upper=L> ll[N];//Group (customer) index of each observation
  matrix[N,K] x1;
  matrix[N,J] x2;
}
parameters {
  vector[J] rbeta_mu; //mean of distribution of beta parameters
  vector<lower=0>[J] rbeta_sigma; //scale (standard deviation) of distribution of beta parameters
  vector[J] beta_raw[L]; //standardized (non-centered) group-specific offsets
  vector[K] beta;
}
transformed parameters {
  vector[J] rbeta[L];
  for (l in 1:L)
    rbeta[l] = rbeta_mu + rbeta_sigma .* beta_raw[l]; // coefficients on x
}
model {
  rbeta_mu ~ normal(0,5);
  rbeta_sigma ~ gamma(1,1);
  beta~normal(0,5);
  for (l in 1:L)
    beta_raw[l] ~ std_normal();

  for(n in 1:N)
    y[n]~bernoulli_logit(x1[n] * beta + x2[n] * rbeta[ll[n]]);
}

These are the gradient times:

Replace the for loops with vectorized statements. That should get you very far.

I have tried it:

model {
  vector[N] p;
  rbeta_mu ~ normal(0,5);
  rbeta_sigma ~ inv_gamma(1,1);
  beta~normal(0,5);
  for (l in 1:L)
    beta_raw[l] ~ std_normal();

  p = x1 * beta + (x2 .* rbeta[ll]) * ones; // Multiplication by vector of ones as a row-wise summation of matrix
  y~bernoulli_logit( p );
}

It didn’t make much of a difference.

What about the for loop over beta_raw?

Based on the suggestion of @andrjohns, I used matrix operations:

transformed parameters {
  matrix[L,J] rbeta;
  for (l in 1:L)
    rbeta[l] = (rbeta_mu + rbeta_sigma .* beta_raw[l])'; // coefficients on x; transposed because a matrix row is a row_vector
}

Is this ok or is there another way?
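
For reference, one way to remove the remaining loops could look like the sketch below. It is not taken from any reply in this thread. It assumes beta_raw is redeclared as an L-by-J matrix (one row per group) instead of an array of vectors, keeps the data block and the priors from the first post, and uses rows_dot_product for the row-wise sum in place of multiplying by a vector of ones.

```stan
// Sketch only: the data block is unchanged from the first post.
parameters {
  vector[J] rbeta_mu;
  vector<lower=0>[J] rbeta_sigma;
  matrix[L, J] beta_raw;   // assumption: replaces vector[J] beta_raw[L]
  vector[K] beta;
}
transformed parameters {
  // Row l equals (rbeta_mu + rbeta_sigma .* beta_raw[l])'
  matrix[L, J] rbeta = rep_matrix(rbeta_mu', L)
                       + diag_post_multiply(beta_raw, rbeta_sigma);
}
model {
  rbeta_mu ~ normal(0, 5);
  rbeta_sigma ~ gamma(1, 1);
  beta ~ normal(0, 5);
  to_vector(beta_raw) ~ std_normal();   // all L * J offsets in one statement

  // rbeta[ll] selects the group's row for every observation (N x J);
  // rows_dot_product then sums x2 .* rbeta[ll] across each row.
  y ~ bernoulli_logit(x1 * beta + rows_dot_product(x2, rbeta[ll]));
}
```

Whether this is noticeably faster depends on N, L and J; it mainly removes per-iteration loop overhead and lets Stan evaluate the likelihood in a single vectorized call.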

Sorry, I read too quickly.

Honestly, just dump this model into the R package brms and use make_stancode and make_standata from brms. This will give you a decent Stan model in no time, and you can tune it further.

Moreover, brms is on the verge of supporting (experimental) within-chain parallelisation, should you really need a lot of power.
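
In plain Stan, within-chain parallelisation is available through reduce_sum. The fragment below is a rough sketch of how the likelihood of this model might be split across threads, not the code brms generates. It assumes rbeta is an L-by-J matrix as in the transformed parameters block posted earlier, and it introduces two hypothetical names, partial_sum and grainsize, that do not appear elsewhere in the thread. It is written in the same array syntax as the code above; newer Stan releases use the array[] int form instead.

```stan
functions {
  // Log-likelihood of observations start..end (hypothetical helper name).
  // y_slice holds the matching slice of y.
  real partial_sum(int[] y_slice, int start, int end,
                   matrix x1, matrix x2, int[] ll,
                   vector beta, matrix rbeta) {
    return bernoulli_logit_lpmf(y_slice |
             x1[start:end] * beta
             + rows_dot_product(x2[start:end], rbeta[ll[start:end]]));
  }
}
// Assumption: the data block gains one extra line,
//   int<lower=1> grainsize;
// and all other blocks stay as they are.
model {
  // priors as before, then the likelihood split into chunks:
  target += reduce_sum(partial_sum, y, grainsize,
                       x1, x2, ll, beta, rbeta);
}
```

This only pays off when the model is compiled with threading enabled and each chain is given more than one thread; with a single thread per chain it should behave like the vectorized likelihood.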

OK, will give that a try. Thanks!

I am facing an unusual situation: 3 out of 4 chains have finished, but the last chain has been running for more than 4 hours since the others completed and still has not finished sampling. Is this normal?