Custom mixed normal PDF/likelihood

tesslacoil · September 4, 2023, 3:28am

Hi, I am attempting to implement a custom error model based on the model from
Rüger, N., Berger, U., Hubbell, S. P., Vieilledent, G., & Condit, R. (2011). Growth strategies of tropical tree species: disentangling light and size effects. PloS one, 6(9), e25330.

The underlying distribution is
y_i \sim \mathcal{N} (\hat{y}_i, \sigma_i),
where \hat{y} is the model’s predicted size,
\sigma_i = \sigma_{1a} + \sigma_{1b}\hat{y}_i + I_i\sigma_2
is a combination of size-dependent and size-independent error, and I_i\sim Bern(p_e) is the indicator function of seeing a ‘large’ error.

The (hopefully minimal) stan code I have built is

functions{
  // Custom likelihood
  real ruger_error_lpdf(real y, 
                        real y_hat, 
                        real sigma_a, 
                        real sigma_b, 
                        real sigma_large, 
                        real p_large){
    real sigma_small;
    real lpdf;
    
    sigma_small = sigma_a + y_hat * sigma_b;
    lpdf = normal_lpdf(y | y_hat, sigma_small) * (1.0 - p_large) + 
              normal_lpdf(y | y_hat, sigma_large) * p_large;
              
    return lpdf;
  }
}

// Data structure.
data {
  int N;
  real y_0_obs;
  real y[N];
  int index[N];
}

parameters {
  //Global level
  real<lower=0> global_sigma_a;
  real<lower=0> global_sigma_b;
  real<lower=0> global_sigma_large;
  real<lower=0, upper=1> global_p_large;
  
  real y_0;
  real beta_1;
}

// Modeling a single variable that is changing over time with constant rate.
model {
  real y_hat[N];
  
  for(i in 1:N-1){
    if(index[i]==1){//Fits the first size
      y_hat[i] = y_0;
    }
    
    y_hat[i+1] = y_hat[i] + beta_1;
  }
  
  // Likelihood
  y ~ ruger_error(y_hat, 
                  global_sigma_a, 
                  global_sigma_b,
                  global_sigma_large,
                  global_p_large);

  //Priors
  y_0 ~ ruger_error(y_0_obs, 
                    global_sigma_a, 
                    global_sigma_b,
                    global_sigma_large,
                    global_p_large);
                    
  beta_1 ~lognormal(0, 1);
                    
  global_sigma_a ~lognormal(exp(0.0927), 0.03);
  global_sigma_b ~lognormal(exp(0.00038), 0.03);
  global_sigma_large ~lognormal(exp(2.56), 2.5);
  global__p_large ~beta(1/2, 1/2);
}

I can generate data and apply a using the following R code:

library(rstan)

#Generate data and apply Ruger error model
beta_1_true <- 1
y_0_true <- 10
y_true <- seq(from=y_0_true, to=25, by=beta_1_true)

#Addint error
y <- y_true + rnorm(length(y_true), mean=0, sd=0.927) + 
              rnorm(length(y_true), mean=0, sd=0.0038) * y_true + 
              rbinom(length(y_true), size=1, prob=0.0276) * rnorm(length(y_true), mean=0, sd=2.56)

rstan_data <- list(
  N = length(y),
  y_0_obs = y[1],
  y = y,
  index = c(1:length(y))
)

#Fit model
model <- stan_model(file="MinExample.stan")
fit <- sampling(model, data=rstan_data, 
                  iter=100,
                  chains=2,
                  cores=2)

I am getting the following error when I attempt to use this:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  real[ ] ~ ruger_error(real[ ], real, real, real, real)

Available argument signatures for ruger_error:

  real ~ ruger_error(real, real, real, real, real)

Real return type required for probability function.
 error in 'model12155b8b7879_MinExample' at line 56, column 34
  -------------------------------------------------
    54:                   global_sigma_b,
    55:                   global_sigma_large,
    56:                   global_p_large);
                                         ^
    57: 
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'MinExample' due to the above error.

and where implemented in a stan file like 56 indicates an error.

I have attempted to match the custom pdf format from 20.5 User-defined probability functions | Stan User’s Guide,
and it is not clear to me how to have a single real value as the first argument rather than a vector for likelihood, when other likelihoods are implemented by passing the argument for a vector-valued data type.

danielparthier · September 4, 2023, 7:31am

Hello,
there is one thing which hints to the reason for the error. You defined your probability function as:

real ruger_error_lpdf(real y, 
                        real y_hat, 
                        real sigma_a, 
                        real sigma_b, 
                        real sigma_large, 
                        real p_large)

As explained in the error you try to use real[ ] which is an array. However, your function can only be applied to single real values. One solution could be to either run it in a loop to single observations or to vectorise the function.
I am also not exactly sure what you do with estimating y_0 as “priors”. Another thing I am not 100% about and maybe that is just me being wrong or fix into my habits, but I thought the priors (beta_1, global_sigma_a, etc.) should be at the top of the model block. I thought the order was important and could lead to something else when the order is changed e.g. having it after the likelihood. But maybe someone can clarify that for me.

Topic		Replies	Views
Function bodies error with custom likelihood Modeling specification	2	639	June 8, 2022
Errors while trying to write joint likelihood of a normal_lpdf and a bernoulli_lpmf in Stan Modeling	6	658	March 19, 2018
Custom Likelihood function with hierarchical model Modeling	1	498	July 12, 2019
I am trying draw samples from a userdefined pdf with HMC algorithm, but there is something wrong in my code Modeling fitting-issues	2	318	January 6, 2021
Using a custom likelihood function in pystan Modeling fitting-issues	6	1547	January 24, 2020

Custom mixed normal PDF/likelihood

Related topics