Dear all,
I am new to Stan and am trying to fit the following model for georeferenced data: Y_i \sim \text{Bin}(n_i, p_i) with logit link \operatorname{logit}(p_i) = \alpha + x_i^\top \beta + w_i, where (w_1, \ldots, w_n) \sim \text{MVN}(0, \Sigma) with \Sigma_{ij} = \sigma_s^2 \exp(-\phi \, d_{ij}) for i \neq j and \Sigma_{ii} = \sigma_{ns}^2 + \sigma_s^2. This gives me the following model:
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  int<lower=0> y[N];
  int<lower=0> n[N];
  matrix[N, N] dist;
}
parameters {
  real alpha;
  vector[K] beta;
  vector[N] w;
  real<lower=0> sigma_ns2;
  real<lower=0> sigma_s2;
  real<lower=0.005, upper=0.1> phi; // bounds must match the uniform prior below
}
transformed parameters {
  vector[N] mu = rep_vector(0, N);
}
model {
  matrix[N, N] Sigma;
  matrix[N, N] L;
  for (i in 1:(N - 1)) {
    for (j in (i + 1):N) {
      Sigma[i, j] = sigma_s2 * exp(-phi * dist[i, j]);
      Sigma[j, i] = Sigma[i, j];
    }
  }
  for (i in 1:N) Sigma[i, i] = sigma_ns2 + sigma_s2;
  L = cholesky_decompose(Sigma);
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100); // vectorized, replaces the loop over K
  sigma_ns2 ~ inv_gamma(0.01, 0.01);
  sigma_s2 ~ inv_gamma(0.01, 0.01);
  phi ~ uniform(0.005, 0.1);
  w ~ multi_normal_cholesky(mu, L);
  y ~ binomial_logit(n, alpha + x * beta + w);
}
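To convince myself the covariance construction is sound, I reproduced it outside Stan with NumPy (everything here is illustrative: made-up locations and parameter values, not part of the model):

```python
# Sanity check of the exponential covariance: off-diagonal sigma_s2 * exp(-phi * d_ij),
# diagonal nugget + partial sill, then verify it admits a Cholesky factorization.
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(5, 2))           # 5 toy site locations
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

sigma_s2, sigma_ns2, phi = 1.0, 0.5, 0.05         # example parameter values
Sigma = sigma_s2 * np.exp(-phi * dist)            # off-diagonal entries
Sigma[np.diag_indices(5)] = sigma_ns2 + sigma_s2  # diagonal entries
L = np.linalg.cholesky(Sigma)                     # raises if Sigma is not positive definite
print(np.allclose(L @ L.T, Sigma))                # True
```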
Now, the Stan User's Guide (1.2 The QR Reparameterization | Stan User's Guide) says one should use a QR factorization for regressions. Does this also apply here? I tried to modify my model to:
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  int<lower=0> y[N];
  int<lower=0> n[N];
  matrix[N, N] dist;
}
parameters {
  real alpha;
  vector[K] theta;
  vector[N] w;
  real<lower=0> sigma_ns2;
  real<lower=0> sigma_s2;
  real<lower=0.005, upper=0.1> phi; // bounds must match the uniform prior below
}
transformed parameters {
  vector[N] mu = rep_vector(0, N);
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  // thin and scale the QR decomposition
  Q_ast = qr_Q(x)[, 1:K] * sqrt(N - 1);
  R_ast = qr_R(x)[1:K, ] / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}
model {
  matrix[N, N] Sigma; // maybe changed to cov_matrix[N]
  matrix[N, N] L;     // maybe changed to cholesky_factor_cov[N]
  for (i in 1:(N - 1)) {
    for (j in (i + 1):N) {
      Sigma[i, j] = sigma_s2 * exp(-phi * dist[i, j]);
      Sigma[j, i] = Sigma[i, j];
    }
  }
  for (i in 1:N) Sigma[i, i] = sigma_ns2 + sigma_s2;
  L = cholesky_decompose(Sigma);
  alpha ~ normal(0, 100);
  theta ~ normal(0, 100); // vectorized, replaces the loop over K
  sigma_ns2 ~ inv_gamma(0.01, 0.01);
  sigma_s2 ~ inv_gamma(0.01, 0.01);
  phi ~ uniform(0.005, 0.1);
  w ~ multi_normal_cholesky(mu, L);
  y ~ binomial_logit(n, alpha + Q_ast * theta + w);
}
generated quantities {
  vector[K] beta;
  beta = R_ast_inverse * theta; // coefficients on x
}
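For reference, the algebra behind the reparameterization can be checked numerically with NumPy (made-up data; the names mirror the Stan quantities above but are purely illustrative): since x = Q_ast R_ast, the linear predictor Q_ast * theta equals x * beta with beta = R_ast_inverse * theta.

```python
# Numeric check that the thin, scaled QR factorization leaves the linear
# predictor unchanged: Q_ast @ theta == x @ beta with beta = inv(R_ast) @ theta.
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
x = rng.normal(size=(N, K))

Q, R = np.linalg.qr(x)           # reduced QR: Q is N x K, R is K x K
Q_ast = Q * np.sqrt(N - 1)       # scale as in the Stan code above
R_ast = R / np.sqrt(N - 1)
R_ast_inverse = np.linalg.inv(R_ast)

theta = rng.normal(size=K)
beta = R_ast_inverse @ theta     # coefficients on x
print(np.allclose(Q_ast @ theta, x @ beta))  # True
```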
When I run both models with low numbers of iterations, the QR model takes longer to run. The effective sample sizes are hard to compare because I have only run the models with few iterations so far, since they take rather long to run anyway. My questions are the following:
- Is it always appropriate to use the QR model, as the documentation suggests?
- Is there a generic covariance function for my model (which would probably speed up the estimation)?
- Are there other obvious optimization steps that I am missing?
- Does it improve anything if I use `cov_matrix` instead of `matrix`?
Thank you in advance for any help. Also, if you know of examples of exactly this case, I would be grateful.
Dan