Question about vectorized truncated likelihood

Dear Stan forum members,

I’m quite new to Stan and Bayesian statistics in general.

For practice I am trying to make a simple intercept-only model for my data which looks like this :

model{
  ...
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
   Area_s ~ normal(mu, sigma);
}

where N is the number of data points and a_xx are intercepts for random effects and the Area_s the outcome variable.

However, I want to use a truncated normal likelihood because the data can not be lower than 0. Now if I code the model like this:


model{
  ...
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
   // vectorized
   Area_s ~ normal(mu, sigma) T[0,]; // truncated normal
}

the compiler throws an error.

Outcomes in truncated distributions must be univariate.
Found outcome expression: Area_s
with non-univariate type: real[]

However when I place the likelihood in a for loop (unvectorized), the code compiles fine and the model samples well:

model{
  ...
for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
  for (i in 1:N) {
   // unvectorized
    Area_s[i] ~ normal(mu[i], sigma) T[0,];  // truncated normal
  }
}

In the Stan manual I found that the vectorized version of the likelihood is more efficient. Now I want to know if there is a vectorized version for a truncated likelihood available.

Hello!

Did you declare Area_s as an array in the data block? Something like this:

data{
...
real Area_s[N];
...
}

If you did, you should try to declare it as a vector, using

vector[N] Area_s

Thank you for your reply! You were right that I declared Area_s as
real Area_s[N];

However, when I declared the variable as a vector like you suggested, the compiler still throws an error:

Outcomes in truncated distributions must be univariate.
Found outcome expression: Area_s
with non-univariate type: vector

Just in case I’ll give the entire model code

data{
  int<lower=1> N;
  int<lower=1> N_an;
  int<lower=1> N_st;
  int<lower=1> N_gr;
  int idx_an[N];
  int idx_st[N];
  int idx_gr[N];
  vector[N] Area_s;
}
parameters{
  real a;
  vector[N_an] a_an;
  vector[N_st] a_st;
  vector[N_gr] a_gr;
  real<lower=0> sigma;
}
model{
  vector[N] mu;
  sigma ~ exponential( 1 );
  a_an ~ normal( 0 , 1 );
  a_st ~ normal( 0 , 1 );
  a_gr ~ normal( 0 , 1 );
  a ~ normal( 1 , 1 );
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
    Area_s ~ normal(mu, sigma) T[0,]; //this line throws the error
}

Hum quickly, I do not see anything problematic in your code… I used this formulation for the truncated normal yesterday, but I did not tried to vectorized it and used a loop for other purposes…

We will have to wait for more pertinent people to answer!

At the page 595 of the manual (2.17.0), we can read :

Most distributions have been vectorized, but currently the truncated versions may not exist and may not be vectorized.

I think you will have to accept the loop :) Fortunatly, using loops in stan, which depends upon a low level progamming language, is far less painfull than using loops in R!

1 Like

Oh well :) … In any case I really appreciate your replies. Thanks.