Custom Distribution compiling error message "No Matches for..."

MattsBusride · November 30, 2021, 9:05pm

Hi, I am trying to create my own custom distribution function in PyStan (version 2.19.1.1). This is just a toy example for me to learn how to write custom distribution functions, so I didn’t put much thought into the choice of prior etc. I feel like it should be simple, but I keep getting an error when I try to compile. The error is as follows:

ValueError: Failed to parse Stan model ‘anon_model_bf95be6829e9dc127f6cab4591949241’. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:

real ~ newexp(real)

Available argument signatures for newexp:

vector ~ newexp(real)

Real return type required for probability function.

I believe I am returning a real at the end of my custom function… but maybe I just don’t understand what is actually happening. Can anyone help?

Code used to generate a toy dataset:

import numpy as np
import scipy.stats as stats

#Hyperparams:
lmbda = [0.01,0.1,1]
num_states = len(lmbda) #total number of states our model will have

#Parameters
toy_scale = [];
for l in lmbda:
  #draw a random value from exponential to determine states' scale parameter
  toy_scale.append(stats.expon.rvs(loc = 0, scale = 1/l)) 
toy_shape = [0.5,0.5,0.5] #all the shapes will be assumed the same (0.5)

#Generate a toy dataset and state indicators
toy_data = []; toy_data_state = []; #data and state indicator (respectively)
len_state = 100 #number of data points belonging to each state
for i in range(num_states):
  #each states parameters
  temp_shape = toy_shape[i] #shape of weibull
  temp_scale = toy_scale[i] #scale of weibull
  
  #generate a sequence of random variables drawn from the weibull
  toy_data.extend(stats.weibull_min.rvs(temp_shape,loc = 0, scale = temp_scale,size = len_state))
  
  #label the states
  toy_data_state.extend(np.repeat(i,len_state))

#Convert from list to array (to put into STAN)
toy_data = np.array(toy_data) 
toy_data_state = np.array(toy_data_state)

Code used for PyStan modeling (this gives me the error message):

#PyStan modeling
import pystan
example_code = """
functions{
    real newexp_lpdf(vector x, real lam){
      vector[num_elements(x)] prob;
      real lprob;
      for (i in 1:num_elements(x)){
        prob[i] = lam*exp(-lam * x[i]);
      }
      lprob = sum(log(prob));
      return(lprob);
    }
}
data {
    int<lower=0> N; // number of observations
    int<lower=0> K; // number of keys
    int<lower = 0, upper = K> state_id[N]; // data's group membership based on state
    vector<lower=0>[N] y; // observed data
    vector<lower=0>[K] lam1; //known hyperparam
}
parameters {
    vector<lower=0>[K] scale;
}
model {
    for (k in 1:K){
        scale[k] ~ newexp(lam1[k]);//hyper parameter depends on the group
    }
    for (n in 1:N){
      y[n] ~ weibull(0.5, scale[state_id[n]]); //likelihood depends on the group (state_id)
    }
}
"""
example_dat = {'N': len(toy_data),
               'K': num_states,
               'state_id': toy_data_state + 1, #note: we +1 here because actually Stan indexes beginning at 1, while python indexes beginning at 0
               'y': toy_data, 
                'lam1': lmbda} 

sm = pystan.StanModel(model_code=example_code)

Additional information:

Operating System: Windows 10
Python Version: 3.7.5
PyStan Version: 2.19.1.1
Compiler/Toolkit: Databricks

andrjohns · December 1, 2021, 12:06am

Your function specifies that newexp should take a vector and a real as inputs, but you’re passing two reals. This is a little easier to see if you use the target += notation.

So your current specification:

    for (k in 1:K){
        scale[k] ~ newexp(lam1[k]);//hyper parameter depends on the group
    }

Is equivalent to:

    for (k in 1:K){
      target += newexp_lpdf(scale[k], lam1[k]);//hyper parameter depends on the group
    }

Here you can see that you’re passing two reals as inputs, which does not match the (vector, real) signature you declared.

MattsBusride · December 1, 2021, 2:29am

Ah it is very clear that way. So the correct way for me to write my custom version of the exponential distribution is like this:

functions{
    real newexp_lpdf(real x, real lam){
      real prob;
      real lprob;
      prob = lam*exp(-lam * x);
      lprob = log(prob);
      return(lprob);
    }
}

In my original code I assumed I had to write the likelihood of the distribution explicitly for one or more samples x (which is why my input was a vector x). I thought returning sum(log(prob)) would account for that.
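To make the scalar-versus-vector point concrete, here is a minimal Python sketch (not Stan, and not part of the original posts). It assumes that each `~` statement in the loop simply adds one scalar log density to the running target, as described above. The function names and the values in `scale` are hypothetical; `lam1` reuses the hyperparameters from the toy example.

```python
import math

def newexp_lpdf_scalar(x, lam):
    """Log density of a single value x under an exponential with rate lam."""
    return math.log(lam * math.exp(-lam * x))

def newexp_lpdf_vector(xs, lam):
    """Summed log density of several values that all share one rate lam."""
    return sum(math.log(lam * math.exp(-lam * x)) for x in xs)

scale = [50.0, 5.0, 0.5]   # hypothetical parameter values, one per group
lam1 = [0.01, 0.1, 1.0]    # per-group hyperparameters from the toy example

# Mimics: for (k in 1:K) target += newexp_lpdf(scale[k], lam1[k]);
target = 0.0
for s, lam in zip(scale, lam1):
    target += newexp_lpdf_scalar(s, lam)

# With one shared rate, the vector version equals the looped scalar version.
shared_rate = 0.1
vector_total = newexp_lpdf_vector(scale, shared_rate)
looped_total = sum(newexp_lpdf_scalar(s, shared_rate) for s in scale)
```

The loop already accumulates the sum over groups, so the scalar signature is enough. A (vector, real) signature only helps when every element shares the same rate, which is not the case here because each group has its own `lam1[k]`.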

andrjohns · December 1, 2021, 2:48am

Note that you can also simplify your distribution a little. Given your current likelihood:

\log\left(\lambda*\exp\left(-\lambda*x \right) \right)

You can rearrange to:

\log\left(\lambda\right) + \log\left(\exp\left(-\lambda*x \right) \right)

\log\left(\lambda\right) -\lambda*x

Which you can specify in Stan as:

functions{
    real newexp_lpdf(real x, real lam){
      return log(lam) - lam * x;
    }
}
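As a quick sanity check of the rearrangement, the following Python sketch (an illustration only; the function names are made up here) compares the original and simplified forms. It also shows a side benefit that is not mentioned in the thread: in double precision, the original form can underflow when lam * x is large, while the simplified form stays finite.

```python
import math

def newexp_lpdf_naive(x, lam):
    """log(lam * exp(-lam * x)), computed literally."""
    density = lam * math.exp(-lam * x)
    # math.log(0.0) raises ValueError, so report -inf explicitly on underflow.
    return math.log(density) if density > 0.0 else -math.inf

def newexp_lpdf_simplified(x, lam):
    """log(lam) - lam * x, the rearranged form."""
    return math.log(lam) - lam * x

# For moderate inputs the two forms agree up to floating-point error.
moderate_naive = newexp_lpdf_naive(2.0, 0.1)
moderate_simplified = newexp_lpdf_simplified(2.0, 0.1)

# For large lam * x, exp() underflows to 0.0 and the naive form loses the value.
extreme_naive = newexp_lpdf_naive(1000.0, 1.0)
extreme_simplified = newexp_lpdf_simplified(1000.0, 1.0)
```

Here `extreme_naive` comes out as negative infinity while `extreme_simplified` is exactly -1000.0, so the simplified form is both shorter and better behaved numerically.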
