# Sampling error: Unrecoverable error evaluating the log probability at the initial value

#1

I am trying to use Stan to obtain the high-dimensional posterior distribution of prior x likelihood = (Poisson x Normal distribution) using marginalization. The result should be the posterior distributions of each component of a vector with size = nrow below (nrow is very large, like 1000). But I keep encountering this error message:

SAMPLING FOR MODEL ‘stanmodel2’ NOW (CHAIN 1).
Unrecoverable error evaluating the log probability at the initial value.
Exception: []: accessing element out of range. index 0 out of range; expecting index to be between 1 and 875; index position = 1log_weights (in ‘model1cc1651dcead_stanmodel2’ at line 35)

My Stan model is here:

data {
int<lower=1> ncol;
int<lower=1> nrow;
vector[ncol] yH;
vector[nrow] x;
#int x[nrow];
matrix[nrow, ncol] A;
vector[nrow] sigma_x;
vector[ncol] sigma_y;
vector[nrow] epsilon;
int big_integer;
}

parameters {
real log_lambda;
}

model {
for (j in 0:ncol) {
vector[big_integer] log_weights;
vector[big_integer] log_dens;
for (i in 0:big_integer) {
log_weights[i] = poisson_log_lpmf(i | log_lambda);
log_dens[i] = normal_lpdf(x[i] | i, sigma_x[i]);
}
target += log_sum_exp(log_weights + log_dens);
}


R code to feed in the data and call sampling():

for(i in 1:15){
nrow = nrow(rte_m[[i]]);
ncol = ncol(rte_m[[i]]);
A <- as.matrix(rte_m[[i]]);
sigma_x <- as.vector(sample.int(10, nrow(kf_vect[[i]]), replace=TRUE))
sigma_y <- as.vector(eps_vect[[i]])
# sigma_x <- diag(1, nrow)
# sigma_y <- diag(1, nrow)
yH <- as.vector(dh_vect[[i]]\$X);
yT <- as.array(rpois(ncol,round(yH + as.vector(eps_vect[[i]]))));
epsilon <- sample.int(15, nrow(kf_vect[[i]]), replace=TRUE);
x <- rnorm(nrow, as.vector(as.matrix(rte_m[[i]])%*%yT) + epsilon, sigma_x);
iterations = 100;

#input it into our Stan model file "stamodeling.stan"
stanmodel1 <- stan_model(file = "poisson.stan",
model_name = "stanmodel2");

#NUTS (No U-Turn) sampler to generate posterior distribution
stanfit <- sampling(stanmodel1, cores = parallel::detectCores(), data = list(ncol = ncol,nrow = nrow,
yH = yH,
x=x, epsilon = epsilon,
A = A, sigma_x = sigma_x, sigma_y = sigma_y, big_integer = nrow) ,iter=iterations, chains = 3, control = list(max_treedepth=13));


My question: Can anyone please give me some help on what went wrong with my Stan model? I checked all the dimensions of the poisson_log_lpmf(i | log_lambda) and normal_lpdf(x[i] | i, sigma_x[i]) to make sure they are correct (and they are!), so I could not see why MCMC keeps failing at the initial state. The matrix A is very sparse one (like 90% of the entries are 0), but that should not affect the initial state, isn’t it?

Discrete parameter in Stan
Discrete parameter in Stan
#2

Hello!

Vectors/Matrices/Arrays are indexed from 1 in Stan, not 0. So inside the loop(s) in your model block, you are asking for log_weights[0] and log_dens[0], which are not valid array indices in Stan. Hence you get the index 0 out of range exception.

Modifying the for loops to begin from 1 should alleviate that error. (Also, you seem to be looping from 1 to ncol, using j, without using j anywhere? Are you sure you need to loop over 1:ncol?)

#3

@hhau: Many thanks for your great help( I should have figured it out myself). The reason I looped over 1:ncol is because if I don’t, then each component of one vector of size big_integer x 1 would only be sampled only once (correct?), so we don’t really sample anything. But I guess i am wrong, because I do 100 iterations in sampling() function and 3 chains, so 300 times for each component in total?

I have another question though: since i never tried target += before, assuming I finish sampling, what does target()() extract for us in this case? My guess is the sampling log values of the (Poisson x Normal) above, so should I take the exp() afterwards? Of course, I would get the bunch of values of the parameter log_lambda, but I don’t think it gives me anything related to the value of the posterior though.

MCMC sampling with high-dimensional data
#4

log_lambda is the only parameter declared in the parameters block of the above stan model, and hence it is the only quantity in this model that has a posterior distribution. Specifically log_lambda is declared to be a singular real. The original post indicates that you anticipate log_lambda to be a vector of size nrow, but the type and size declarations would indicate otherwise!

If you are interested in specific value of the log-probability, then rstan::extract(stanfit, "lp__") might be what you are after?

I’m not too familiar with the target += syntax (I tend to avoid it where possible, although it is unavoidable in mixture models), but my understanding is that is an expression that allows for the direct incrementation of the unnormalised, log-probability of your model (what you refer to as your log-likelihood in your original post), I would take careful note of Section 5.2 of the Stan manual for the specifics of target +=.

On the topic of the manual, Section 13.5 implies there is currently no way to vectorise mixture models at the observational level, so the fact that you have target += log_sum_exp(log_weights + log_dens); outside of the loop over 1:big_integer is perhaps concerning?

#5

@hhau: I deleted the loop 1:big_integer. Yes, I think I am more interested in the log-probability in this case. What still puzzles me is whether the distribution of the log-probability is the same as that of the posterior distribution of Poisson x Normal (without having to take an exponential). From what I understand based on the Manual, it said they are the same!

#6

Can I ask why you are interested in the value of the log-probability? Typically when one is modelling with mixture models, three major posterior quantities (or subset / transformation thereof) are of interest:

• The posterior distribution of the parameters, including the mixture weights, i.e where are my components located, and what proportion of them do I expect in my data-generating process
• The posterior probabilities of mixture component membership (What the is the posterior probability of observation Y_i belonging to mixture component k)
• The posterior predictive distribution of a new set of (as yet unobserved) covariates (What is the posterior distribution of Y_{\text{new}} for a covariate vector X_{\text{new}}).

(The above list is totally biased from my own experience / exposure)

I’m still trying to figure out what exactly this Stan model is doing (you have a lot of unused variables in your data block which are confusing me!), but I’m not sure how the specific value of the log-probability pertains to your interest in “the posterior distributions of each component”, although there is a good chance it has flown over my head! Perhaps explicitly writing out your mathematical model, and the expression the posterior is proportional to, might be illuminative for you (and definitely for someone who isn’t familiar with your particular application, like me!).

#7

@hhau: Sure, please let me clarify a bit. I am using Hierarchical Bayesian Model with the following data and variable:
Prior: y_{T} \sim Poisson(y_{H})
Likelihood: x \sim N(A\ y_{T}, \sigma_{y})

Assume x and y_{H} are conditional independent given y_{T}. I want to know P(y_{T} |x, y_{H}), which is \propto P(x|y_{T})P(y_{T}) (Bayesian’s rule here).

The data here is y_{H}, matrix A, \sigma_y , and x.

Is this much clearer now?? This simple setting, apparently, leads to a very difficult implementation in Stan, mainly because the fact that in the parameter block, we cannot declare int() type. So I can only know that marginalization is the only way to go.

#8

The other alternative would be to just use a continuous approximation, which will be quite reasonable if y^T >> 0.

y^T \sim \mathsf{Normal}(y^H, \sqrt{y^H})

x \sim \mathsf{Normal}(A \cdot y^T, \sigma^y)

It will require the constraint y^T > 0 in the parameter declaration.

I made everything a superscript since I like to reserve subscripts for indexing.

Compounding two normals just gives you another normal with wider variance (normal is its own conjugate prior), so this could all just be compounded in a single distribution with parameters given as functions of the ones you have here.