# How to use pseudotype for vectorization

1. For speed up, vectorization of for loops (I am aware vectorization may provide limited performance boost Vectorising user-defined multivariate lpdf/lpmf functions - #10 by fergusjchadwick )
2. To implement vectorization, replace singe real with a pseudotype
3. Pseudotype reals seems to be the correct one. Based on the description of Stan Function Reference 10.8 Vectorization: “Argument positions declared as `reals` may be filled with a real, a one-dimensional array, a vector, or a row-vector.
But it hasn’t be implemented in rstan yet? Got error message (Non-void) type expected in function argument declaration if reals used. So real (square brackets) used instead.
4. real works fine with an array of real
5. real not compatible with a single real

Testing by re-creating a simple normal probability density function and generating pointwise likelihoods.

Step 4 implemented as the code below:

``````functions{
real myNormal_lpdf(real[] y, real mu, real sigma){
real LL[dims(y)[1]];
for(i in 1:dims(y)[1]){
LL[i] = -log(sigma) - square(mu-y[i]) / (2 * sigma^2);
}
return sum(LL);
}
}

data{
int N;
real y[N];

// hyperparameters
real prior_mu[2];
real prior_sigma[2];
}

parameters{
real mu;
real<lower=0> sigma;
}

model{
// priors
mu ~ normal(prior_mu[1],prior_mu[2]);
sigma ~ cauchy(prior_sigma[1],prior_sigma[2]);

// likelihood
// for(n in 1:N){                                                            // loop
//   target += myNormal_lpdf(y[n] | mu,sigma);
// }
target += myNormal_lpdf(y | mu,sigma);                   // vectorization
}

``````

The sampling statement works just fine because y was specified as an array of reals real which is the same as defined in the function.

However, the input of a single real is not compatible with real. Problem in step 5 by adding a Generated Quantities block:

``````functions{
real myNormal_lpdf(real[] y, real mu, real sigma){
real LL[dims(y)[1]];
for(i in 1:dims(y)[1]){
LL[i] = -log(sigma) - square(mu-y[i]) / (2 * sigma^2);
}
return sum(LL);
}
}

data{
int N;
real y[N];

// hyperparameters
real prior_mu[2];
real prior_sigma[2];
}

parameters{
real mu;
real<lower=0> sigma;
}

model{
// priors
mu ~ normal(prior_mu[1],prior_mu[2]);
sigma ~ cauchy(prior_sigma[1],prior_sigma[2]);

// likelihood
// for(n in 1:N){                                                            // loop
//   target += myNormal_lpdf(y[n] | mu,sigma);
// }
target += myNormal_lpdf(y | mu,sigma);                   // vectorization
}

generated quantities{
real log_lik[N];
for(n in 1:N){
log_lik[n] = myNormal_lpdf(y[n] | mu,sigma);
}
}

``````

Got an error message “Ill-typed arguments supplied to function ‘myNormal_lpdf’. Available signatures:
(real[], real, real) => real
Instead supplied arguments of incompatible type: real, real, real.
I suppose real should be able to accept a one-dimensional array of reals with any number of elements including one of course. Why it isn’t working?
Am I using the pseudotype real correctly?
What’s the best way to use a pseudotype?

Thank you very much!

The pseudotype `reals` is just a placeholder in the manual so that we don’t have to list all of the `real`, `real[]`, `vector`, etc. combinations. It’s not a type that can be used in Stan.

There are two approaches you can apply if you want your lpdf to work with both scalars and vectors/arrays/row-vectors.

You could take advantage of the new function overload support introduced in 2.29, which lets you define both a scalar and vectorised version of the function:

``````  real myNormal_lpdf(real y, real mu, real sigma){
return -log(sigma) - square(mu-y) / (2 * sigma^2);
}

real myNormal_lpdf(real[] y, real mu, real sigma){
real LL[dims(y)[1]];
for(i in 1:dims(y)[1]){
LL[i] = -log(sigma) - square(mu-y[i]) / (2 * sigma^2);
}
return sum(LL);
}
``````

Now when you call `myNormal_lpdf`, Stan will automatically select the right definition depending on whether `y` is a `real` or a `real[]`.

For this approach you would need to change your code to use `cmdstanr` since `rstan` isn’t quite up to date.

### Vectorised by Default

Alternatively, you can take advantage of Stan’s vectorised functions and have a single definition for vectorised inputs, and then wrap the input variables in arrays when you pass them:

``````  real myNormal_lpdf(real[] y, real mu, real sigma){
return sum(-log(sigma) - square(mu-to_vector(y)) ./ (2 * sigma^2));
}
``````

Then you can use `myNormal_lpdf` in two ways. First, with `y` as the full array of `real`:

``````  y ~ myNormal(mu, sigma);
``````

Or by passing each element as an array of `real`:

``````for(i in 1:N) {
{y[i]} ~ myNormal(mu, sigma);
}
``````

Where the `{}` are a shorthand for `to_array1d()`. You can also use this approach to allow for `mu` and `sigma` to be agnostic between `real` and `real[]` as well, allowing for additional flexibility. You can also use this approach with `rstan`.

However, this will generally be somewhat less efficient than the function overloading approach, so if you have a large/complex model then this may not be the best approach.

Hope that helps!

2 Likes

Thank you very much Andrew!

I have some follow-up questions about the second approach Vectorised by Default.

I mistakenly thought a real is a one-dimensional array of real with only one element. But it did not work out as I expected. I followed your suggestion to array-lize element y[n]. Now the lpdf function works for both real and real types of y. This is amazing.

I also explore if I could define y as a vector in the lpdf function as

``````functions{
real myNormal_lpdf(vector y, real mu, real sigma){
return sum(-log(sigma) - square(mu-to_vector(y)) ./ (2 * sigma^2));
}
}

``````

Next, If to pass one element, i.e. single real, into the lpdf function, the element must be converted to real array then a vector, because to_vector function only accepts arraies of real/int, vectors and matrix.

``````  for(n in 1:N){
log_lik[n] = myNormal_lpdf(to_vector({y[n]}) | mu,sigma);
}
``````

This seems a bit clunky to me. I am not informed enough to see whether this vector specification is better than the real specification. And I haven’t observed additional performance enhancement from the vector specification.
Am I missing anything? What’s your opinion?

There’s no additional benefit here for defining `myNormal` with the vector input - because you then have to the real → array → vector conversion. Is there any reason why you need that? If not I’d recommend staying with the `real[]` specification if it’s working for you

Got it thank you! I think I will stay with the real.

I was exploring the vector specifications for personal reasons. I inherited some functions defined by colleagues using vector input instead of real. I want to modify generated quantities for these functions using loops.

1 Like