# Custom mixed normal PDF/likelihood

Hi, I am attempting to implement a custom error model based on the model from
Rüger, N., Berger, U., Hubbell, S. P., Vieilledent, G., & Condit, R. (2011). Growth strategies of tropical tree species: disentangling light and size effects. PloS one, 6(9), e25330.

The underlying distribution is
y_i \sim \mathcal{N} (\hat{y}_i, \sigma_i),
where \hat{y} is the model’s predicted size,
\sigma_i = \sigma_{1a} + \sigma_{1b}\hat{y}_i + I_i\sigma_2
is a combination of size-dependent and size-independent error, and I_i\sim Bern(p_e) is the indicator function of seeing a ‘large’ error.

The (hopefully minimal) stan code I have built is

functions{
// Custom likelihood
real ruger_error_lpdf(real y,
real y_hat,
real sigma_a,
real sigma_b,
real sigma_large,
real p_large){
real sigma_small;
real lpdf;

sigma_small = sigma_a + y_hat * sigma_b;
lpdf = normal_lpdf(y | y_hat, sigma_small) * (1.0 - p_large) +
normal_lpdf(y | y_hat, sigma_large) * p_large;

return lpdf;
}
}

// Data structure.
data {
int N;
real y_0_obs;
real y[N];
int index[N];
}

parameters {
//Global level
real<lower=0> global_sigma_a;
real<lower=0> global_sigma_b;
real<lower=0> global_sigma_large;
real<lower=0, upper=1> global_p_large;

real y_0;
real beta_1;
}

// Modeling a single variable that is changing over time with constant rate.
model {
real y_hat[N];

for(i in 1:N-1){
if(index[i]==1){//Fits the first size
y_hat[i] = y_0;
}

y_hat[i+1] = y_hat[i] + beta_1;
}

// Likelihood
y ~ ruger_error(y_hat,
global_sigma_a,
global_sigma_b,
global_sigma_large,
global_p_large);

//Priors
y_0 ~ ruger_error(y_0_obs,
global_sigma_a,
global_sigma_b,
global_sigma_large,
global_p_large);

beta_1 ~lognormal(0, 1);

global_sigma_a ~lognormal(exp(0.0927), 0.03);
global_sigma_b ~lognormal(exp(0.00038), 0.03);
global_sigma_large ~lognormal(exp(2.56), 2.5);
global__p_large ~beta(1/2, 1/2);
}



I can generate data and apply a using the following R code:

library(rstan)

#Generate data and apply Ruger error model
beta_1_true <- 1
y_0_true <- 10
y_true <- seq(from=y_0_true, to=25, by=beta_1_true)

y <- y_true + rnorm(length(y_true), mean=0, sd=0.927) +
rnorm(length(y_true), mean=0, sd=0.0038) * y_true +
rbinom(length(y_true), size=1, prob=0.0276) * rnorm(length(y_true), mean=0, sd=2.56)

rstan_data <- list(
N = length(y),
y_0_obs = y[1],
y = y,
index = c(1:length(y))
)

#Fit model
model <- stan_model(file="MinExample.stan")
fit <- sampling(model, data=rstan_data,
iter=100,
chains=2,
cores=2)


I am getting the following error when I attempt to use this:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:

real[ ] ~ ruger_error(real[ ], real, real, real, real)

Available argument signatures for ruger_error:

real ~ ruger_error(real, real, real, real, real)

Real return type required for probability function.
error in 'model12155b8b7879_MinExample' at line 56, column 34
-------------------------------------------------
54:                   global_sigma_b,
55:                   global_sigma_large,
56:                   global_p_large);
^
57:
-------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
failed to parse Stan model 'MinExample' due to the above error.


and where implemented in a stan file like 56 indicates an error.

I have attempted to match the custom pdf format from 20.5 User-defined probability functions | Stan User’s Guide,
and it is not clear to me how to have a single real value as the first argument rather than a vector for likelihood, when other likelihoods are implemented by passing the argument for a vector-valued data type.

Hello,
there is one thing which hints to the reason for the error. You defined your probability function as:

real ruger_error_lpdf(real y,
real y_hat,
real sigma_a,
real sigma_b,
real sigma_large,
real p_large)


As explained in the error you try to use real[ ] which is an array. However, your function can only be applied to single real values. One solution could be to either run it in a loop to single observations or to vectorise the function.
I am also not exactly sure what you do with estimating y_0 as “priors”. Another thing I am not 100% about and maybe that is just me being wrong or fix into my habits, but I thought the priors (beta_1, global_sigma_a, etc.) should be at the top of the model block. I thought the order was important and could lead to something else when the order is changed e.g. having it after the likelihood. But maybe someone can clarify that for me.