Before running anything for two weeks, you should verify that it works on smaller simulated data. Then I’d start running with fewer than 2000 iterations—you may not need that many.
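One way to get simulated data is to draw it from the model itself with fixed parameters. Here’s a rough sketch (fixing the skew `alfa` at 1 and using standard-normal draws for the coefficients, both of which are just illustrative choices):

```stan
// fake-data generator (sketch): run with fixed_param to draw one dataset
data {
  int<lower=0> n;
  int<lower=0> k;
  matrix[n, k] X;
}
generated quantities {
  real beta0_sim = normal_rng(0, 1);
  vector[k] beta_sim;
  array[n] int y_sim;
  for (j in 1:k)
    beta_sim[j] = normal_rng(0, 1);
  for (i in 1:n)
    y_sim[i] = bernoulli_rng(skew_normal_cdf(beta0_sim + X[i] * beta_sim, 0, 1, 1));
}
```

Fit your model to `y_sim` with a small `n` and check that it recovers `beta0_sim` and `beta_sim` before committing to the full run.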
I’m guessing the bottleneck is here:
```stan
for (i in 1:n)
  prob[i] = skew_normal_cdf(beta0 + X[i] * beta, 0, 1, alfa);
```
because most of our CDFs are not that efficient. I’d also be worried about rounding to 0 or 1 here. This also does very inefficient matrix multiplication. You’re better off coding this as:
```stan
vector[n] mu = beta0 + X * beta;
for (i in 1:n)
  prob[i] = skew_normal_cdf(mu[i], ...);
```
This may make a big difference because of the sizes of the matrices involved. If `X` is very sparse (say less than 10% filled), you want to use our CSR multiply function.
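That would look something like this sketch, where `nnz` is a placeholder for the number of non-zeros (you’d compute it, e.g., as `num_elements(csr_extract_w(X))`):

```stan
transformed data {
  // sparse (CSR) representation of X, computed once
  vector[nnz] w = csr_extract_w(X);       // non-zero values
  array[nnz] int v = csr_extract_v(X);    // column index of each value
  array[n + 1] int u = csr_extract_u(X);  // where each row starts in w
}
model {
  vector[n] mu = beta0 + csr_matrix_times_vector(n, k, w, v, u, beta);
  // ...
}
```

The multiply then only touches the non-zero entries rather than the full `n * k` matrix.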
What is the posterior scale of the parameters? One of the best things you can do is make sure they’re all roughly standard scale (ranging between -3 and 3, or something like normal(0, 1)). I’m not sure if you’re just trying to include vague priors, but if your parameters are scale 100, it’ll help to rescale.
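If rescaling would help, you can standardize the predictors once in `transformed data`—a sketch, assuming `X` has no constant columns (a constant column would divide by zero):

```stan
transformed data {
  matrix[n, k] X_std;
  for (j in 1:k)
    X_std[, j] = (X[, j] - mean(X[, j])) / sd(X[, j]);
}
```

The coefficients on `X_std` are then on roughly unit scale, which makes `normal(0, 1)`-type priors reasonable.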
You’ll want a non-centered parameterization here:
```stan
beta ~ normal(0, tau * sqrt(lambda));
```
which looks like:
```stan
vector[k] beta = tau * sqrt(lambda) .* beta_std;
beta_std ~ normal(0, 1);
```

(Note the elementwise `.*`, since `lambda` is a vector with one entry per coefficient.)
I didn’t save the actual `beta` because you have 17K of them.
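Putting that together, a sketch of the model block with `beta` as a local variable so it isn’t written to the output (names as in your model):

```stan
model {
  // local variable: recomputed each iteration, never saved
  vector[k] beta = tau * sqrt(lambda) .* beta_std;
  vector[n] mu = beta0 + X * beta;
  beta_std ~ std_normal();
  // ... priors on tau and lambda, and the likelihood using mu ...
}
```

If you do want to inspect a few of the `beta`, you can reconstruct them after the fact from the saved `tau`, `lambda`, and `beta_std` draws.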
With the number of data points so much smaller than the number of predictors, you’ll need strong priors to identify the model.
The biggest problem I see, though, is the `tau * sqrt(lambda)` term, which gives each `beta` its own error scale. This is not going to be well identified: you can divide `tau` and multiply `lambda` to get the same result, and it makes a nasty banana-shaped posterior. You’re leaning on the priors on `tau` to identify the model, but that’s often not a good idea.
I’d try backing off to just `tau`, put a stronger prior than `cauchy(0, 1)` on it, and see how that fits.
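As one hedged version of that fallback—dropping `lambda` entirely and using a half-normal on `tau`, which has far lighter tails than `cauchy(0, 1)`:

```stan
parameters {
  real<lower=0> tau;      // single global scale; half-normal via the constraint
  vector[k] beta_std;
}
model {
  vector[k] beta = tau * beta_std;  // non-centered, beta not saved
  tau ~ normal(0, 1);               // much stronger than cauchy(0, 1)
  beta_std ~ std_normal();
  // ... likelihood ...
}
```

If that fits cleanly, you can then decide whether the per-coefficient `lambda` is worth reintroducing.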