Syntax error for custom functions [error:No matches...]

Hi, I’m very new to Stan programming in R and have a question about an error I keep getting. I have three binary variables, MS, W, and Y. I wrote a function for the log joint likelihood of these data, but when I run stan_model I keep getting this error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:

real ~ realprobyw(vector, vector, vector, vector, vector, vector)

Available argument signatures for realprobyw:

vector ~ realprobyw(vector, vector, vector, vector, vector)

Real return type required for probability function.
error in ‘model20fc27c45d5d_2by2’ at line 80, column 42

I’m not quite sure what this means, since I declared “lik” as real both in the model block and in my function. I also don’t see why the number of arguments drops from six to five in the available signatures. Thanks for your help.

Stan Code:

functions {
real realprobyw_log(vector ms, vector w, vector y, vector b, vector e, vector g) {

  real alpha21 = 0.5/(1+exp(-e[1]));
  real alpha22 = 0.5/(1+exp(-e[2]))+0.5;
  real alpha11 = 1-alpha21;
  real alpha12 = 1-alpha22;

  vector[num_elements(ms)] pm2;
  vector[num_elements(ms)] pm1;

  vector[num_elements(ms)] pw21;
  vector[num_elements(ms)] pw22;
  vector[num_elements(ms)] pw11;
  vector[num_elements(ms)] pw12;

  vector[num_elements(ms)] py21;
  vector[num_elements(ms)] py22;
  vector[num_elements(ms)] py11;
  vector[num_elements(ms)] py12;

  vector[num_elements(ms)] prob;
  real jointlik;

  //begin calculations
  //loop for calculating probabilities for m, w, and y
  for (i in 1:num_elements(ms)) {
    pm2[i] = exp(g[1]+g[2]*ms[i])/(1+exp(g[1]+g[2]*ms[i]));
    pm1[i] = 1-pm2[i];
    py21[i] = exp(b[1])/(1+exp(b[1]));
    py22[i] = exp(b[1]+b[2])/(1+exp(b[1]+b[2]));
    py11[i] = 1-py21[i];
    py12[i] = 1-py22[i];
  }

  //loop for calculating joint probability
  for (i in 1:num_elements(ms)) {
    prob[i] = y[i]*(w[i]*log(py11[i]*alpha11*pm1[i]+py12[i]*alpha12*pm2[i])
                    +(1-w[i])*log(py11[i]*alpha21*pm1[i]+py12[i]*alpha22*pm2[i]))
              +(1-y[i])*(w[i]*log(1-(py11[i]*alpha11*pm1[i]+py12[i]*alpha12*pm2[i]))
                         +(1-w[i])*log(1-(py11[i]*alpha21*pm1[i]+py12[i]*alpha22*pm2[i])))
              +normal_lpdf(b[1]|0,10)+normal_lpdf(b[2]|0,10)
              +normal_lpdf(e[1]|0,10)+normal_lpdf(e[2]|0,10)
              +normal_lpdf(g[1]|0,10)+normal_lpdf(g[2]|0,10);
  }
  jointlik = sum(prob);
  return jointlik;
}
}

data {                      // Data block
  int<lower=1> N;           // Sample size
  vector[N] ms;
  vector[N] w;
  vector[N] y;              // Target variable
}

transformed data {          // Transformed data block.
} 

parameters {                // Parameters block
  vector[2] beta;           // Coefficient vector
  vector[2] gamma;
  vector[2] eta;
}

transformed parameters {    // Transformed parameters block.
}

model {         // Model block
  real lik;
   // priors
   for (i in 1:2){
  beta[i] ~ normal(0, 10);
  gamma[i] ~ normal(0, 10);
  eta[i] ~ normal(0,10);
   }

  // likelihood
  //for (i in 1:N){
  lik ~ realprobyw(ms,w,y,beta,eta,gamma);
  //}
}

generated quantities {      // Generated quantities block. 
}

If you changed:

lik ~ realprobyw(ms, w, y, beta, eta, gamma);

to:

ms ~ realprobyw(w, y, beta, eta, gamma);

or

target += realprobyw_log(ms | w, y, beta, eta, gamma);

it would work.

Just keep in mind when you say:

a ~ normal(b, c)

That translates directly to:

target += normal_lpdf(a | b, c);

For any other function that would just be written normal_lpdf(a, b, c). The thing on the left side of the ~ is actually the first argument to the function, not the return type. The return type of an lpdf/lpmf is always a real, and the ~ notation hides what happens to it (it gets added to target).
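One naming detail worth knowing: the _log suffix on user-defined densities is the older convention, and newer Stan versions prefer _lpdf (continuous variate) or _lpmf (discrete variate). A minimal sketch of the mechanics, using a hypothetical toy density unit_normal just to show the suffix rules:

```stan
functions {
  // toy density (hypothetical, purely to show the mechanics):
  // the _lpdf suffix is what lets it be used with ~
  real unit_normal_lpdf(real x, real mu) {
    return normal_lpdf(x | mu, 1);
  }
}
data {
  real y;
}
model {
  // these two statements are equivalent -- use one or the other,
  // not both (each one adds the term to target)
  y ~ unit_normal(0);
  target += unit_normal_lpdf(y | 0);
}
```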

You don’t need to wrap the probability calculation in a function. You can just increment target directly.
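For example, a minimal sketch of incrementing target directly (the linear predictor here is hypothetical, not your actual model, and it assumes y is declared as an array of integers, since bernoulli_lpmf requires integer outcomes):

```stan
model {
  for (i in 1:N) {
    // hypothetical success probability -- substitute your own calculation
    real p = inv_logit(beta[1] + beta[2] * w[i]);
    target += bernoulli_lpmf(y[i] | p);  // direct increment, no wrapper function
  }
}
```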

Another thing I love about Stan is that you can have local variables and control flow, so you can write the code the way it was meant to be read by humans.

Based on your description, this is the model I believe you’re trying to code:

data {                      // Data block
  int<lower=1> N;           // Sample size
  int<lower=0,upper=1> ms[N];
  int<lower=0,upper=1> w[N];
  int<lower=0,upper=1> y[N];// Target variable
}

parameters {                // Parameters block
  vector[2] beta;           // Coefficient vector
  vector[2] gamma;
  vector[2] eta;
}

model {         // Model block
  real alpha21 = 0.5*inv_logit(eta[1]);
  real alpha22 = 0.5*inv_logit(eta[2]) + 0.5;
  real alpha11 = 1-alpha21;
  real alpha12 = 1-alpha22;

  real py21 = inv_logit(beta[1]);
  real py22 = inv_logit(beta[1] + beta[2]);
  real py11 = 1-py21;
  real py12 = 1-py22;

  // likelihood
  for (i in 1:N) {
    real pm2 = inv_logit(gamma[1] + gamma[2]*ms[i]);
    real pm1 = 1-pm2;
    if (w[i] == 1)
      y[i] ~ bernoulli(py11*alpha11*pm1 + py12*alpha12*pm2);
    else
      y[i] ~ bernoulli(py11*alpha21*pm1 + py12*alpha22*pm2);
  }

  // priors
  for (i in 1:2){
    beta[i] ~ normal(0, 10);
    gamma[i] ~ normal(0, 10);
    eta[i] ~ normal(0,10);
  }
}

IMO, much easier on the eyes.
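In case the inv_logit substitution isn’t obvious, it’s the same quantity as the exp ratios in the original code:

inv_logit(x) = 1/(1 + exp(-x)) = exp(x)/(1 + exp(x))

(multiply the numerator and denominator by exp(x)). So, for instance, the original pm2[i] = exp(g[1]+g[2]*ms[i])/(1+exp(g[1]+g[2]*ms[i])) is exactly inv_logit(gamma[1] + gamma[2]*ms[i]), and the built-in is also more numerically stable.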


Thank you both for the clarification. This simplifies things a lot!
