I am trying to fit the following latent Gaussian process regression. Admittedly, there are a couple of complications: the response data I am working with is presence-absence data, and it is fairly imbalanced. I wrote the model below, pieced together from the Stan manual. It is a multivariate probit model with a latent Gaussian process on the beta coefficients, using a squared exponential kernel.
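To spell out what I intend the code below to do (reading back my own model), the generative structure is roughly:

$$
z_{nj} \sim \mathcal{N}\big((x\beta)_{nj},\ \sigma_{y,j}^2\big), \qquad
y_{nj} = 1 \text{ if } z_{nj} > 0 \text{ and } 0 \text{ otherwise},
$$

$$
\beta_{k\cdot} \sim \mathcal{N}\big(\mu_\beta,\ \operatorname{diag}(\tau)\, K\, \operatorname{diag}(\tau)\big), \qquad
K_{jj'} = \alpha^2 \exp\!\left(-\frac{d_{jj'}^2}{2\rho^2}\right),
$$

where each row of beta gets the same multivariate normal prior, d_{jj'} are the entries of L_dist, sigma_y is y_devs, tau is st_devs, and mu_beta is beta_mu.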
functions
{
  // total number of 1s in a 2-D array of 0/1 integers
  int sum2d(int[,] a) {
    int s = 0;
    for (i in 1:size(a))
      s += sum(a[i]);
    return s;
  }
}
data
{
  int<lower=1> N;                 // number of observations
  int<lower=1> J;                 // number of dependent variables
  int<lower=0> K;                 // number of independent variables
  matrix[N, K] x;                 // independent variables
  int<lower=0, upper=1> y[N, J];  // dependent variables (presence-absence)
  matrix[J, J] L_dist;            // distance matrix between responses for the beta GP
  int<lower=1> U;                 // upper limit for the truncated inverse gamma prior on rho
  real<lower=0> a;                // inverse gamma hyperparameter (shape)
  real<lower=0> b;                // inverse gamma hyperparameter (scale)
}
transformed data
{
  int<lower=0> N_pos;
  int<lower=1, upper=N> n_pos[sum2d(y)];
  int<lower=1, upper=J> d_pos[size(n_pos)];
  int<lower=0> N_neg;
  int<lower=1, upper=N> n_neg[(N * J) - size(n_pos)];
  int<lower=1, upper=J> d_neg[size(n_neg)];
  N_pos = size(n_pos);
  N_neg = size(n_neg);
  // record the (observation, response) index of every 1 and every 0 in y
  {
    int i = 1;
    int j = 1;
    for (n in 1:N) {
      for (d in 1:J) {
        if (y[n, d] == 1) {
          n_pos[i] = n;
          d_pos[i] = d;
          i += 1;
        } else {
          n_neg[j] = n;
          d_neg[j] = d;
          j += 1;
        }
      }
    }
  }
}
parameters
{
  matrix[K, J] beta;
  vector<lower=0>[N_pos] z_pos;   // latent values for the y == 1 entries
  vector<upper=0>[N_neg] z_neg;   // latent values for the y == 0 entries
  vector<lower=0>[J] y_devs;      // SD of the latent z around x * beta
  vector<lower=0>[J] st_devs;     // per-response scale on the beta covariance
  real<lower=0, upper=U> rho;     // GP length-scale (bounded to match the truncated prior)
  real<lower=0> alpha;            // GP marginal standard deviation
  vector[J] beta_mu;              // beta centers
}
transformed parameters
{
  vector[J] z[N];  // latent continuous responses assembled from z_pos / z_neg
  for (n in 1:N_pos)
    z[n_pos[n], d_pos[n]] = z_pos[n];
  for (n in 1:N_neg)
    z[n_neg[n], d_neg[n]] = z_neg[n];
}
model
{
  matrix[J, J] L_K;   // squared exponential kernel over the response distance matrix
  matrix[J, J] cov;   // Cholesky factor of L_K
  matrix[N, J] xbeta = x * beta;
  for (i in 1:(J - 1)) {
    for (j in (i + 1):J) {
      L_K[i, j] = alpha^2 * exp(-0.5 * (L_dist[i, j])^2 / rho^2);
      L_K[j, i] = L_K[i, j];
    }
  }
  for (j in 1:J)
    L_K[j, j] = alpha^2 + 0.0001;  // kernel at distance zero, plus jitter for numerical stability
  cov = cholesky_decompose(L_K);

  rho ~ inv_gamma(a, b) T[, U];
  alpha ~ normal(0, 1);
  for (k in 1:K)
    beta[k] ~ multi_normal_cholesky(beta_mu, diag_pre_multiply(st_devs, cov));
  st_devs ~ cauchy(0, 2.5);
  y_devs ~ cauchy(0, 2.5);
  beta_mu ~ normal(0, 1);
  for (n in 1:N)
    z[n] ~ normal(xbeta[n], y_devs);
}
More on the data: the dimensions of the response data range from [1,000, 4] to [10,000, 14] per chunk. I could consolidate the whole response data and run it all at once instead of in chunks (as I currently do), but it would then be on the order of [20,000,000, 265]. (I am running it in chunks because each chunk already takes a very long time, e.g. up to 3 days on a high-performance cluster using 8 CPUs and up to 100 GB of memory. Ideally it could all be run as one large model, but that is a secondary concern.)

My primary issue is that the chains don't converge: I regularly get R-hats between 1.01 and 2, and bulk and tail ESS below 100. There appears to be very high autocorrelation, and the chains get stuck and don't explore the entire parameter space. Here are my MCMC parameters:
stan_specs:
  iter: 100000        # total number of iterations per chain
  warmup: 40000       # burn-in (warmup) iterations
  chains: 4           # number of chains
  thin: 100           # thinning interval - this should be less than the number of iterations
  delta: 0.9          # adapt_delta (target acceptance probability)
  stepsize: 1         # initial step size - raising it might help stuck chains overcome their block
  stepsize_jitter: 0
  cores: 1            # number of cores to run on (most efficient if cores = chains); keep 1 for HPC
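In case my wrapper format is unclear, these settings just get passed through to the sampler. A rough equivalent using CmdStanPy, shown only for illustration (the model.stan file name, the stan_data dictionary, and my reading of iter as warmup plus sampling iterations are assumptions; my actual pipeline reads these from the YAML above), would be:

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")  # hypothetical file name for the model above
fit = model.sample(
    data=stan_data,        # dict with N, J, K, x, y, L_dist, U, a, b
    chains=4,              # chains
    parallel_chains=1,     # cores
    iter_warmup=40000,     # warmup
    iter_sampling=60000,   # iter (100000) minus warmup, if iter counts total iterations
    thin=100,              # thin
    adapt_delta=0.9,       # delta
    step_size=1,           # stepsize; stepsize_jitter = 0 is already the CmdStan default
)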
Any help would be much appreciated! Thank you in advance.