Before running anything for two weeks, you should verify that it works on smaller simulate data. Then I’d start running with fewer than 2000 iterations—you may not need that many.
I’m guessing the bottleneck is here:
for(i in 1:n)
prob[i] = skew_normal_cdf(beta0 + X[i] * beta, 0, 1, alfa);
because most of our CDFs are not that efficient. I’d also be worried about rounding to 0 or 1 here. This also does very inefficient matrix multiplication. You’re better off coding this as:
{
vector[n] mu = beta0 + X * beta;
for (i in 1:n)
prob[i] = skew_normal_cdf(mu[i], ...);
}
This may make a big difference because of the sizes of X
and beta
.
If X
is very sparse (say less than 10% filled), you want to use our csr
multiply function.
What is the posterior scale of the parameters? One of the best things you can do is make sure they’re all roughly standard scale (ranging between -3 and 3, or something like normal(0, 1)). I’m not sure if you’re just trying to include vague priors, but if your parameters are scale 100, it’ll help to rescale.
You’ll want a non-centered parameterization here:
beta ~ normal(0, tau * sqrt(lambda));
which looks like:
parameters {
vector[k] beta_std;
...
model {
vector[k] beta = tau * sqrt(lambda) * beta_std;
beta_std ~ normal(0, 1);
...
I didn’t save the actual beta
becaus eyou have 17K of them.
With the number of data points so much smaller than the number of predictors, you’ll need strong priors to identify the model.
The biggest problem I see, though is that you have the tau * sqrt(lambda)
, which provides each beta
with its own error scale. This is not going to be well identified as you can divide tau
and multiply lambda
to get the same result, and it makes a nasty banana-shaped posterior. You’re leaing on the priors on lambda
and tau
to identify the model, but that’s often not a good idea.
I’d try to back off and just use tau
and put a stronger prior than cauchy(0, 1)
on it, and see how that fits.