Before running anything for two weeks, you should verify that it works on smaller simulate data. Then I’d start running with fewer than 2000 iterations—you may not need that many.
I’m guessing the bottleneck is here:
for(i in 1:n)
prob[i] = skew_normal_cdf(beta0 + X[i] * beta, 0, 1, alfa);
because most of our CDFs are not that efficient. I’d also be worried about rounding to 0 or 1 here. This also does very inefficient matrix multiplication. You’re better off coding this as:
{
vector[n] mu = beta0 + X * beta;
for (i in 1:n)
prob[i] = skew_normal_cdf(mu[i], ...);
}
This may make a big difference because of the sizes of X and beta.
If X is very sparse (say less than 10% filled), you want to use our csr multiply function.
What is the posterior scale of the parameters? One of the best things you can do is make sure they’re all roughly standard scale (ranging between -3 and 3, or something like normal(0, 1)). I’m not sure if you’re just trying to include vague priors, but if your parameters are scale 100, it’ll help to rescale.
You’ll want a non-centered parameterization here:
beta ~ normal(0, tau * sqrt(lambda));
which looks like:
parameters {
vector[k] beta_std;
...
model {
vector[k] beta = tau * sqrt(lambda) * beta_std;
beta_std ~ normal(0, 1);
...
I didn’t save the actual beta becaus eyou have 17K of them.
With the number of data points so much smaller than the number of predictors, you’ll need strong priors to identify the model.
The biggest problem I see, though is that you have the tau * sqrt(lambda), which provides each beta with its own error scale. This is not going to be well identified as you can divide tau and multiply lambda to get the same result, and it makes a nasty banana-shaped posterior. You’re leaing on the priors on lambda and tau to identify the model, but that’s often not a good idea.
I’d try to back off and just use tau and put a stronger prior than cauchy(0, 1) on it, and see how that fits.