Help with hierarchical model


I am trying to fit the following hierarchical model with assymetric link to some text categorization data in Stan. forumstanmodel.txt (817 Bytes)

I have n= 5485 and k=17388 predictors.
Current runtime is two weeks. Any advice, in order to speed this would be greatly appreciated.


Before running anything for two weeks, you should verify that it works on smaller simulate data. Then I’d start running with fewer than 2000 iterations—you may not need that many.

I’m guessing the bottleneck is here:

for(i in 1:n)
  prob[i] = skew_normal_cdf(beta0 + X[i] * beta, 0, 1, alfa);

because most of our CDFs are not that efficient. I’d also be worried about rounding to 0 or 1 here. This also does very inefficient matrix multiplication. You’re better off coding this as:

  vector[n] mu = beta0 + X * beta;
  for (i in 1:n)
    prob[i] = skew_normal_cdf(mu[i], ...);

This may make a big difference because of the sizes of X and beta.

If X is very sparse (say less than 10% filled), you want to use our csr multiply function.

What is the posterior scale of the parameters? One of the best things you can do is make sure they’re all roughly standard scale (ranging between -3 and 3, or something like normal(0, 1)). I’m not sure if you’re just trying to include vague priors, but if your parameters are scale 100, it’ll help to rescale.

You’ll want a non-centered parameterization here:

beta ~ normal(0, tau * sqrt(lambda));

which looks like:

parameters {
  vector[k] beta_std;
model {
  vector[k] beta =  tau * sqrt(lambda) * beta_std;
  beta_std ~ normal(0, 1);

I didn’t save the actual beta becaus eyou have 17K of them.

With the number of data points so much smaller than the number of predictors, you’ll need strong priors to identify the model.

The biggest problem I see, though is that you have the tau * sqrt(lambda), which provides each beta with its own error scale. This is not going to be well identified as you can divide tau and multiply lambda to get the same result, and it makes a nasty banana-shaped posterior. You’re leaing on the priors on lambda and tau to identify the model, but that’s often not a good idea.

I’d try to back off and just use tau and put a stronger prior than cauchy(0, 1) on it, and see how that fits.


Thaks a lot Bob for your comments.
Exactly, before I verified that the model works well with smaller data and after that I started to test with databases of increasing size and in the same way with the number of iterations arriving to see suitable results with 1150.
I will test your suggestions.