Just for concreteness, I’m pasting in one of the models from the linked repo:
functions {
real hnormal_lpdf(vector y, real sigma) {
int N = rows(y);
vector[N] prob;
real lprob;
for (i in 1:N) {
prob[i] = log(2*sigma)-log(pi())-(y[i]^2*sigma^2)/pi();
}
lprob = sum(prob);
return lprob;
}
real nhnormal_pdf(real y, row_vector x, real alpha, vector beta, real sigma, real lambda) {
real prob;
real sigs = sqrt(sigma^2 + lambda);
real delt = sqrt(lambda)/sigma;
real epsi = y - alpha - x * beta;
prob = (0.5) * log(2 / pi()) - log(sigs) + normal_lcdf(epsi*delt/sigs|0,1) - (epsi^2)/(2*sigs^2);
return prob;
}
}
data {
int<lower=0> N; // Number of observations
int<lower=0> K; // Number of factors of cost function
matrix[N, K] x; // Matrix of factors of cost function
vector[N] y; // Vector of costs
}
parameters {
real alpha; // Intercept
vector[K] beta; // Coefficients of the cost function
real<lower=0> sigma; // Error standard deviation
real<lower=0> lambda; // Parameter of half-normal inefficiency distribution
vector<lower=0>[N] u; // Inefficiency terms
}
model {
alpha ~ normal(0, 100);
beta ~ normal(0, 100);
sigma ~ inv_gamma(0.001, 0.001);
lambda ~ gamma(1, 1/37.5);
u ~ hnormal(lambda);
y ~ normal(alpha + x * beta + u, sigma);
}
generated quantities {
vector[N] eff;
vector[N] log_lik;
eff = exp(-u);
for (i in 1:N) {
log_lik[i] = nhnormal_pdf(y[i], x[i], alpha, beta, sigma, lambda);
}
}
First, we don’t recommend those diffuse gamma priors—there’s a discussion in the Stan manual in the chapter on regression. We generally recommend at least weakly informative priors that indicate the scale of the parameters.
You can vectorize for efficiency and you don’t need all those intermediates. This,
real hnormal_lpdf(vector y, real sigma) {
int N = rows(y);
vector[N] prob;
real lprob;
for (i in 1:N) {
prob[i] = log(2*sigma)-log(pi())-(y[i]^2*sigma^2)/pi();
}
lprob = sum(prob);
return lprob;
}
can be reduced to a much more efficient one liner
real hnormal_lpdf(vector y, real sigma) {
return rows(y) * (log(2 * sigma) - log(pi()))
- dot_self(y) / pi() * square(sigma);
}
with only N + 3 multiplications, one division, and two subtractions. I took the additional measure of guessing that y
might be data, so that the dot-self and division by pi()
are together so as to minimize operations involving parameters (which is where the real cost is because of the need to store the expressions and compute derivatives).