Different types of reals in documentation and user defined functions?

I was going to update my geomagnetic storm extreme value analysis example, and wanted to use new _lpdf, _lccdf syntax for my user defined functions. While looking for the information in the Stan manual 2.16.0, which is the latest linked in http://mc-stan.org/users/documentation/index.html, I got confused what is difference between
`reals y`, `real[] y`, and `real y[]` in function signatures

After some searching I found the description of “The pseudo-type reals” p. 495, but that specific section could also have an example of actual function definitions to help someone who wants to write a user defined distribution functions. I think I should have function signatures

``````functions {
real gpareto_lpdf(real y[], real k, real sigma) {
...
real gpareto_lccdf(real y[], real k, real sigma) {
...
``````

in order to be able to have code

``````  y ~ gpareto(k, sigma);
``````

and

``````  for (n in 1:N)
log_lik[n]  = gpareto_lpdf(y[n] | k, sigma);
for (n in 1:Nt)
predccdf[n] = exp(gpareto_lccdf(yt[nt] | k, sigma));
``````

Is the bar `|` notation working also for the user defined functions?

ok, I figured it should be

``````functions {
real gpareto_lpdf(real[] y, real k, real sigma) {
...
``````

or

``````functions {
real gpareto_lpdf(vector y, real k, real sigma) {
...
``````

which is accepted for

``````  y ~ gpareto(k, sigma);
``````

depending whether we have `real[] y` or `vector y`, but then in the generated quantities I need to cast real to array or vector:

``````generated quantities {
real log_lik[N];
/* vector[Nt] predccdf; */
for (n in 1:N)
log_lik[n]  = gpareto_lpdf(rep_array(y[n],1) | k, sigma);
...
``````

It seems it’s not possible to overload user-defined functions to make the rest of the code look same as with built-in distribution functions?

I’m planning to make a case study out of this, so any help and comments for how to make this clear will be appreciated. I know I could define additional user-defined function to be used in generated quantities (even one which would return an array of log_lik’s with one call), but I’m afraid that could also confuse someone.

The whole example code with user-defined generalized Pareto distribution (without argument checks) is

``````functions {
real gpareto_lpdf(real[] y, real k, real sigma) {
// generalised Pareto log pdf with mu=0
// should check and give error if k<0 and max(y)/sigma > -1/k
int N;
N = dims(y)[1];
if (fabs(k) > 1e-15)
return -(1+1/k)*sum(log1p(to_vector(y) * (k/sigma))) -N*log(sigma);
else
return -sum(y)/sigma -N*log(sigma); // limit k->0
}
real gpareto_lccdf(real[] y, real k, real sigma) {
// generalised Pareto log ccdf with mu=0
// should check and give error if k<0 and max(y)/sigma < -1/k
if (fabs(k) > 1e-15)
return (-1/k)*sum(log1p(to_vector(y) * (k/sigma)));
else
return -sum(y)./sigma; // limit k->0
}
}
data {
int<lower=0> N;
real<lower=0> y[N];
int<lower=0> Nt;
real<lower=0> yt[Nt];
}
transformed data {
real ymax;
ymax = max(y);
}
parameters {
real<lower=0> sigma;
real<lower=-sigma/ymax> k;
}
model {
y ~ gpareto(k, sigma);
}
generated quantities {
real log_lik[N];
vector[Nt] predccdf;
for (n in 1:N)
log_lik[n]  = gpareto_lpdf(rep_array(y[n],1) | k, sigma);
for (nt in 1:Nt)
predccdf[nt] = exp(gpareto_lccdf(rep_array(yt[nt],1) | k, sigma));
}
``````

Correct. Also, there is no `real y[]` signature only `real[] y`, `vector y`, and `row_vector y` in the one-dimensional case.

Thanks for verifying this. Do you (or someone else) have opinions, whether to write it as above, or is it better to have a separate functions for generated quantities block?

Then there are errors in the manual. If you can verify that all `real x[]` should be `real[] x`, I can fix the manual.

1 Like
``````functions {
real foo(real x[]) {
return sum(x);
}
}
``````

yields a parser error, so the manual must be wrong.

I personally have pretty much stopped using one-dimensional real arrays in favor of using vectors. I used to use them when there was no linear algebra involving them, but it just got too confusing for students when to use one and when to use the other.

I completely agree about confusion. There’s a problem that some functions accept only real[], like cov_exp_quad.

1 Like

Yeah, `cov_exp_quad` should accept the metatype for all one-dimensional containers of reals.

The errors are in Section 41.1 and Index, but I think it’s possible that instead of `real[] x`, these might be also `reals`, but Ican’t be sure and I didn’t fix these, but added a comment to the next manual issue.

1 Like

That’s right.

• `reals y` is shorthand for any of `real` (scalar), `real[]` (1D array of scalars), `vector` (column vector), and `row_vector` (row vector),

• `real[] y` is the function-argument notation that packs a type without sizes together into a single string, and

• `real y[N] ...` is for declaring the variable `y`.

For use, yes, but the definitions don’t use the bar. We could change that, but it looks odd with the types.

I’m a computer scientist, so I always like things encapsulated in small functions. Here, I think it would make sense to have something like this:

``````real my_normal_lpdf(real y, real mu, real sigma) {
... base definition ...
}

real[] my_normal_vec_lpdf(real[] y, real mu[], real sigma) {
real lpd[size(y)];
for (n in 1:size(y)) lpd[n] = my_normal_lpdf(y[n], mu[n], sigma);
return lpd;
}
``````

Or you could get fancy and code the vectorized version more efficiently by reusing shared expressions.

Then you can call `my_normal_vec_lpdf` in both places, and just do

``````target += my_normal_vec_lpdf( ... );
``````

because `+=` should take containers on the right-hand side and just add them all to the target log density.

`-sum(y) ./ sigma` should be just `-sum(y) / sigma`.

`dims(y)[1]` can be just `size(y)`

`(-1/k)*sum(...)` should probably be `-sum(...) / k`.

1 Like

Me, too. Yes, if the manual has that, it’s wrong. I want to actually move to using

``````real[5] x;
``````

for array declarations. The only confusing thing is

``````matrix[4, 5][3] x;
``````

is then going to be a 3-element array of 4x5 matrices.

I don’t really mind having the array size after `x`; I just don’t use `real x[K]` much anymore.

``````y ~  my_normal_vec( ... );
What bugs me is the inconsistency in function arguments. Those don’t have the problem of having vector or matrix sizes. So it’s just `matrix x` or `matrix[] xs` for an array of matrices.
No, that won’t work, I’m afraid. The return type has to be `real`.