I am building a model $Y = X\beta + \epsilon$, where both $X$ and $Y$ are matrices, and I want to model the correlation between the columns of $Y$.
As you can see, I have some pretty big parameter matrices, so I want to declare them in the model block so that they are not saved in the output. But I get an error saying "std_normal_lpdf: Random variable[1] is nan, but must not be nan!"
However, if I move the declarations to the transformed parameters block, it runs fine, although very slowly.
Can you advise where it went wrong?
Also, besides the error message, I feel there are many places where the code could be improved; could you advise on that as well?
Thanks a lot!
data {
  int<lower=0> nSamples; // number of samples
  int<lower=0> nBio;     // number of y
  int<lower=0> nX;       // number of x
  matrix[nSamples, nBio] y; // the multivariate outcome matrix
  matrix[nSamples, nX] X;   // predictor matrix
}
parameters {
  matrix[nX, nBio] beta;           // betas from N(0,1)
  vector<lower=0>[nBio] sigma_eps; // sd of the residual
  vector<lower=0>[nBio] tau;       // prior scale
  cholesky_factor_corr[nBio] L;
}
transformed parameters {
}
model {
  matrix[nSamples, nBio] z;
  matrix[nSamples, nBio] mu;
  mu = X * beta + z * (diag_pre_multiply(tau, L))';
  sigma_eps ~ exponential(1);
  tau ~ cauchy(0, 2.5);
  L ~ lkj_corr_cholesky(1);
  for (i in 1:nX) {
    to_vector(beta[i]) ~ student_t(10, 0, 10);
  }
  for (i in 1:nSamples) {
    to_vector(z[i]) ~ std_normal();
  }
  for (i in 1:nSamples) {
    for (j in 1:nBio) {
      y[i][j] ~ normal(mu[i][j], sigma_eps[j]);
    }
  }
}
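For reference, here is a minimal sketch of a version that runs, assuming z is meant to be a matrix of standard-normal latent effects (a non-centered parameterization of the correlated row effects): z has to live in the parameters block so it actually receives values, while mu stays local to the model block so it is not written to the output. The data block and the priors are carried over from the code above.
parameters {
  matrix[nX, nBio] beta;
  vector<lower=0>[nBio] sigma_eps;
  vector<lower=0>[nBio] tau;
  cholesky_factor_corr[nBio] L;
  matrix[nSamples, nBio] z; // must be a parameter to be initialized and sampled
}
model {
  // mu is local to the model block, so it is not stored in the fitted object
  matrix[nSamples, nBio] mu = X * beta + z * diag_pre_multiply(tau, L)';
  sigma_eps ~ exponential(1);
  tau ~ cauchy(0, 2.5);
  L ~ lkj_corr_cholesky(1);
  to_vector(beta) ~ student_t(10, 0, 10);
  to_vector(z) ~ std_normal();
  for (j in 1:nBio) {
    col(y, j) ~ normal(col(mu, j), sigma_eps[j]); // vectorized over samples
  }
}
Note that z itself is still saved in the draws because it is a parameter; only deterministic intermediates such as mu can be kept out of the output by declaring them locally in the model block.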
I see. I didn't specifically test whether z increased my memory use, but keeping mu definitely increased the size of my fitted object, and both significantly slowed down the program.
I worked on another project with many Ys. In that case, keeping all the intermediate values gave me hundreds of GB of fitted object; getting rid of them slimmed it down to a couple of MB.
I hope it's OK to revive a thread like this, but I have a very similar query.
In my case my model block looks like this:
model {
  // latent gp
  vector[n_obs] h;
  // Priors
  sigma ~ normal(0, 1);
  //sigma ~ inv_gamma(1, 1);
  tau ~ cauchy(0, 0.1);
  beta ~ std_normal();
  r ~ cauchy(0, 1);
  // Likelihood
  h ~ multi_normal_cholesky(rep_vector(0, n_obs),
                            cholesky_decompose(cov_Kzr(Z, r, tau, 1e-6)));
  y ~ normal(h + X * beta, sigma);
}
If I run this I get the error: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must not be nan!
However, if I simply cut and paste the line vector[n_obs] h; to the parameters block, it runs without a problem, but I do not understand why. Scoping doesn't seem to explain it, so what else is at play here? (h is not referred to anywhere else in the code.)
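For concreteness, this is the working variant just described, with only the declaration moved and everything else unchanged (cov_Kzr is the poster's own covariance function):
parameters {
  // ... sigma, tau, beta, r declared as before ...
  vector[n_obs] h; // h is now a parameter, so it gets initialized and sampled
}
model {
  sigma ~ normal(0, 1);
  tau ~ cauchy(0, 0.1);
  beta ~ std_normal();
  r ~ cauchy(0, 1);
  h ~ multi_normal_cholesky(rep_vector(0, n_obs),
                            cholesky_decompose(cov_Kzr(Z, r, tau, 1e-6)));
  y ~ normal(h + X * beta, sigma);
}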
Forgive me, but what is a "true parameter proper", and what's the alternative to a true parameter proper?
If you declare h in the model block, you haven’t assigned any values to h. Until you do, h consists of nothing but nan. If you declare h as a parameter, then a whole bunch of things happen. Conceptually, the most important but also the most abstract is that you parameterize the posterior distribution in terms of h. This affects the posterior geometry, the gradients… everything that is important to Stan.
In practical terms, the sampler initializes the parameters somewhere, so h, declared as a parameter, is populated with a set of candidate values from which the HMC algorithm proceeds, simulating the evolution of h along the Hamiltonian trajectory and then updating its value via multinomial sampling along the trajectory.
He meant that you are writing down a model that you intend to be parameterized by h and not a model wherein you can recover h deterministically from the values of the data and the “true parameters”.
Note that ~ in Stan does not mean “draw a random number from this distribution”. It means “increment the target by a log probability density function corresponding to this distribution evaluated at the current values of the parameters and data”.
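A two-line illustration using the likelihood line from the model above; these are interchangeable ways of writing the same contribution to the target, and neither one draws y from anything:
  // sampling-statement form (drops constant terms of the density)
  y ~ normal(h + X * beta, sigma);
  // explicit form (keeps the normalizing constants)
  target += normal_lpdf(y | h + X * beta, sigma);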
Ah, OK, I think I get it. I was comparing to the latent GP example in the Stan manual (10.3 Fitting a Gaussian process | Stan User's Guide), but I see the difference: in that model, even though f is declared in the model block, it is immediately assigned from data and parameters declared in the parameters block, so the situation is different. Thanks.
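For comparison, the pattern in that manual example is roughly the following (a sketch from memory; gp_exp_quad_cov stands in for whichever covariance function the guide uses): f is declared locally but immediately assigned from the parameter eta, so it never stays nan.
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
  vector[N] eta;
}
model {
  vector[N] f;
  {
    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-9, N)); // jitter for numerical stability
    f = cholesky_decompose(K) * eta; // f is assigned here, so it is never nan
  }
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  y ~ normal(f, sigma);
}
The model is parameterized in terms of eta; f is a deterministic function of eta, the data, and the hyperparameters, which is why the local declaration is unproblematic there.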