Custom Distribution compiling error message "No Matches for..."

Hi I am trying to create my own custom distribution function in PyStan (version = 2.19.1.1). This is just a toy example for me to learn how to make custom distribution functions, so I didn’t put too much thought into choice of prior etc. I feel like it should be simple, but I keep getting an error when I try to compile. The error I see is as follows:

ValueError: Failed to parse Stan model ‘anon_model_bf95be6829e9dc127f6cab4591949241’. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
*No matches for: *

  • real ~ newexp(real)*

Available argument signatures for newexp:

  • vector ~ newexp(real)*

Real return type required for probability function.

I believe I should be returning a Real at the end of my custom function… but maybe i just don’t understand what is actually happening. Can anyone help?

Code used to generate a toy dataset:

import numpy as np
import scipy.stats as stats

#Hyperparams:
lmbda = [0.01,0.1,1]
num_states = len(lmbda) #total number of states our model will have

#Parameters
toy_scale = [];
for l in lmbda:
  #draw a random value from exponential to determine states' scale parameter
  toy_scale.append(stats.expon.rvs(loc = 0, scale = 1/l)) 
toy_shape = [0.5,0.5,0.5] #all the shapes will be assumed the same (0.5)

#Generate a toy dataset and state indicators
toy_data = []; toy_data_state = []; #data and state indicator (respectively)
len_state = 100 #number of data points belonging to each state
for i in range(num_states):
  #each states parameters
  temp_shape = toy_shape[i] #shape of weibull
  temp_scale = toy_scale[i] #scale of weibull
  
  #generate a sequence of random variables drawn from the weibull
  toy_data.extend(stats.weibull_min.rvs(temp_shape,loc = 0, scale = temp_scale,size = len_state))
  
  #label the states
  toy_data_state.extend(np.repeat(i,len_state))

#Convert from list to array (to put into STAN)
toy_data = np.array(toy_data) 
toy_data_state = np.array(toy_data_state)

Code used for Pystan modeling (gives me error message)

#Pystan modeling
example_code = """
functions{
    real newexp_lpdf(vector x, real lam){
      vector[num_elements(x)] prob;
      real lprob;
      for (i in 1:num_elements(x)){
        prob[i] = lam*exp(-lam * x[i]);
      }
      lprob = sum(log(prob));
      return(lprob);
    }
}
data {
    int<lower=0> N; // number of observations
    int<lower=0> K; // number of keys
    int<lower = 0, upper = K> state_id[N]; // data's group membership based on state
    vector<lower=0>[N] y; // observed data
    vector<lower=0>[K] lam1; //known hyperparam
}
parameters {
    vector<lower=0>[K] scale;
}
model {
    for (k in 1:K){
        scale[k] ~ newexp(lam1[k]);//hyper parameter depends on the group
    }
    for (n in 1:N){
      y[n] ~ weibull(0.5, scale[state_id[n]]); //likelihood depends on the group (state_id)
    }
}
"""
example_dat = {'N': len(toy_data),
               'K': num_states,
               'state_id': toy_data_state + 1, #note: we +1 here because actually Stan indexes beginning at 1, while python indexes beginning at 0
               'y': toy_data, 
                'lam1': lmbda} 

sm = pystan.StanModel(model_code=example_code)

Additional information:

  • Operating System: Windows 10
  • Python Version: 3.7.5
  • PyStan Version: 2.19.1.1
  • Compiler/Toolkit: Databricks

Your function specifies that newexp should take a vector and a real as inputs, but you’re passing two reals. This is a little easier to see if you use the target += notation.

So your current specification:

    for (k in 1:K){
        scale[k] ~ newexp(lam1[k]);//hyper parameter depends on the group
    }

Is equivalent to:

    for (k in 1:K){
      target += newexp_lpdf(scale[k], lam1[k]);//hyper parameter depends on the group
    }

Where you can see that you’re passing two reals as inputs

Ah it is very clear that way. So the correct way for me to write my custom version of the exponential distribution is like this:

functions{
    real newexp_lpdf(real x, real lam){
      real prob;
      real lprob;
      prob = lam*exp(-lam * x);
      lprob = log(prob);
      return(lprob);
    }
}

My original code was me assuming I had to write likelihood of the distribution explicitly assuming I could have 1 or more samples x (hence why I had my input be a vector x). I was thinking returning the sum(log(prob)) would account for this issue.

Note that you can also simplify your distribution a little. Given your current likelihood:

\log\left(\lambda*\exp\left(-\lambda*x \right) \right)

You can rearrange to:

\log\left(\lambda\right) + \log\left(\exp\left(-\lambda*x \right) \right)
\log\left(\lambda\right) -\lambda*x

Which you can specify in Stan as:

functions{
    real newexp_lpdf(real x, real lam){
      return log(lam) - lam * x;
    }
}
1 Like