I’m working on a model for sometimes censored data from sometimes truncated distributions. There is a lot of data and because it is collected discretized, it can be easily summarized as a value plus the number of times that value occurs. Even summarized, like this there is a lot of data so I wanted to apply vectorization. The below examples are not the full model, just a description of the kinds of things that can and can’t be done.
Unfortunately, a program like this is incorrect due to normal_lccdf returning the sum of the values rather than the vector of all of the values:
data {
int N;
vector[N] y;
vecot[N] cnt;
}
parameters{
real mu;
real sigma;
}
model {
target += cnt * normal_lccdf(y | mu, sigma);
}
When I correct the model with explicit for loops like this, it runs really slowly:
model {
for (n in 1:N)
target += cnt[n] * normal_lccdf(y[n] | mu, sigma);
}
Could I write a stan function that returns the vector of values in stead of the sum? I’ve seen a few examples of functions that return vectors, but those examples are typically built up with a for loop (which I assume wouldn’t help to vectorize the differentiation):
function{
real my_function(y vector, mu real, sigma real){
vector[nrows(y)] output;
for (n in 1: nrows(y))
output[n] = cnt[n] * normal_lcdf(y[n] | mu, sigma)
return(sum(output))
}
}
I assume that, if this is possible, I’ll need to implement the underlying algebra directly in vectorized form. Any advice for doing this in a numerically robust way for the normal distribution?