Custom mixed normal PDF/likelihood

Hi, I am attempting to implement a custom error model based on the model from
Rüger, N., Berger, U., Hubbell, S. P., Vieilledent, G., & Condit, R. (2011). Growth strategies of tropical tree species: disentangling light and size effects. PloS one, 6(9), e25330.

The underlying distribution is
y_i \sim \mathcal{N} (\hat{y}_i, \sigma_i),
where \hat{y} is the model’s predicted size,
\sigma_i = \sigma_{1a} + \sigma_{1b}\hat{y}_i + I_i\sigma_2
is a combination of size-dependent and size-independent error, and I_i\sim Bern(p_e) is the indicator function of seeing a ‘large’ error.

The (hopefully minimal) stan code I have built is

  // Custom likelihood
  real ruger_error_lpdf(real y, 
                        real y_hat, 
                        real sigma_a, 
                        real sigma_b, 
                        real sigma_large, 
                        real p_large){
    real sigma_small;
    real lpdf;
    sigma_small = sigma_a + y_hat * sigma_b;
    lpdf = normal_lpdf(y | y_hat, sigma_small) * (1.0 - p_large) + 
              normal_lpdf(y | y_hat, sigma_large) * p_large;
    return lpdf;

// Data structure.
data {
  int N;
  real y_0_obs;
  real y[N];
  int index[N];

parameters {
  //Global level
  real<lower=0> global_sigma_a;
  real<lower=0> global_sigma_b;
  real<lower=0> global_sigma_large;
  real<lower=0, upper=1> global_p_large;
  real y_0;
  real beta_1;

// Modeling a single variable that is changing over time with constant rate.
model {
  real y_hat[N];
  for(i in 1:N-1){
    if(index[i]==1){//Fits the first size
      y_hat[i] = y_0;
    y_hat[i+1] = y_hat[i] + beta_1;
  // Likelihood
  y ~ ruger_error(y_hat, 

  y_0 ~ ruger_error(y_0_obs, 
  beta_1 ~lognormal(0, 1);
  global_sigma_a ~lognormal(exp(0.0927), 0.03);
  global_sigma_b ~lognormal(exp(0.00038), 0.03);
  global_sigma_large ~lognormal(exp(2.56), 2.5);
  global__p_large ~beta(1/2, 1/2);

I can generate data and apply a using the following R code:


#Generate data and apply Ruger error model
beta_1_true <- 1
y_0_true <- 10
y_true <- seq(from=y_0_true, to=25, by=beta_1_true)

#Addint error
y <- y_true + rnorm(length(y_true), mean=0, sd=0.927) + 
              rnorm(length(y_true), mean=0, sd=0.0038) * y_true + 
              rbinom(length(y_true), size=1, prob=0.0276) * rnorm(length(y_true), mean=0, sd=2.56)

rstan_data <- list(
  N = length(y),
  y_0_obs = y[1],
  y = y,
  index = c(1:length(y))

#Fit model
model <- stan_model(file="MinExample.stan")
fit <- sampling(model, data=rstan_data, 

I am getting the following error when I attempt to use this:

No matches for: 

  real[ ] ~ ruger_error(real[ ], real, real, real, real)

Available argument signatures for ruger_error:

  real ~ ruger_error(real, real, real, real, real)

Real return type required for probability function.
 error in 'model12155b8b7879_MinExample' at line 56, column 34
    54:                   global_sigma_b,
    55:                   global_sigma_large,
    56:                   global_p_large);

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'MinExample' due to the above error.

and where implemented in a stan file like 56 indicates an error.

I have attempted to match the custom pdf format from 20.5 User-defined probability functions | Stan User’s Guide,
and it is not clear to me how to have a single real value as the first argument rather than a vector for likelihood, when other likelihoods are implemented by passing the argument for a vector-valued data type.

there is one thing which hints to the reason for the error. You defined your probability function as:

real ruger_error_lpdf(real y, 
                        real y_hat, 
                        real sigma_a, 
                        real sigma_b, 
                        real sigma_large, 
                        real p_large)

As explained in the error you try to use real[ ] which is an array. However, your function can only be applied to single real values. One solution could be to either run it in a loop to single observations or to vectorise the function.
I am also not exactly sure what you do with estimating y_0 as “priors”. Another thing I am not 100% about and maybe that is just me being wrong or fix into my habits, but I thought the priors (beta_1, global_sigma_a, etc.) should be at the top of the model block. I thought the order was important and could lead to something else when the order is changed e.g. having it after the likelihood. But maybe someone can clarify that for me.