Defining Dirichlet priors

Hello,

I am struggling with defining Dirichlet priors in my model. I am getting an error in parameter section in the code. The error is:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

The following is my stan code.

// Create user define function

functions{

  real model_log(matrix dat, matrix theta, matrix alpha){

  vector[rows(dat)] prob;
  real  t;
  real  x;
  real out;
  real a;
  real f;
  real w;
  real r;
  real m;
  

  for (i in 1:rows(dat)){
    
    t <- dat[i, 1];
    x <- dat[i, 2];
    a <- theta[i, 1];
    f <- theta[i, 2];
    w <- alpha[i, 1];
    r <- alpha[i, 2];
    m <- alpha[i, 1];
    
    
      
    if(t == 1){
      prob[i] <- a^x * f^(1 - x);

    }else if(fmod(t, 2) == 0){
      
      prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x)); 
    
    }else {
       prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
    }
  }

  out <- sum(log(prob));
  return out;

 }
}


// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (rally lengths)
  matrix [N, 2] dat; // touches and indicator 
  int <lower = 2> m; 

}


parameters {
  
     simplex [m] theta_1 [N];
     simplex [m] theta_2 [N];
  
}

// The model to be estimated.
model {
  // Define priors
  // 
  for (n in 1:N){
    theta_1[n] ~ dirichlet(concentration = 1);
  }
  
    for (n in 1:N){
    theta_2[n] ~ dirichlet(concentration = 1);
  }
  

  dat ~ model(theta_1, theta_2);
  
  
  
}

I am new to stan and can I get a help on this to understand how stan works on simplex type data. I think my error comes in defining the parameter types.

Thank you!

Edited by @jsocolar for syntax highlighting.

looks like dat ~ model(... should be dat ~ model_log(.... In that case, you’ll also need to supply the third input to the model_log function.

1 Like

Thanks for the response. I tried that. But still getting the same error. Also, the line number of the error comes from parameter block. That is why I thought the way I defined the simplex is wrong. I am not sure.

Providing the entire error message, including the line number, would be useful.

1 Like

The error message is:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Syntax error in ‘string’, line 65, column 41 to column 42, parsing error:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

In my script line number 65 comes under parameter box.

As you suggested I added the changes to model box as follows:

// The model to be estimated.
model {
// Define priors

for (n in 1:N){
theta_1[n] ~ dirichlet(concentration = 1);
}

for (n in 1:N){
theta_2[n] ~ dirichlet(concentration = 1);

}

dat ~ model_log(dat, theta_1, theta_2);

}

I hope this is what you meant.

Stan does not have named function arguments, you can’t write dirichlet(concentration = 1) (and in the documentation the argument to dirichlet is called alpha, not concentration, not that it matters…)
The correct thing to do is

for (n in 1:N){
  theta_1[n] ~ dirichlet(rep_vector(1, m));
}
for (n in 1:N){
  theta_2[n] ~ dirichlet(rep_vector(1, m));
}

No, tilde statements expect the function name without a suffix.
Omit the suffix unless you’re writing target += ....

Also, the _log suffix is deprecated, you should probably use _lpdf instead.

functions {
  real model_lpdf(matrix dat, matrix theta, matrix alpha) {
  ...
}
model {
  ...
  dat ~ model(theta_1, theta_2);
}
2 Likes

Thanks for the suggesion in the parameter block. Once I changed it the error is not there. However, I changed the model as you suggested.

functions{

real tenismodel_lpdf(matrix dat, matrix theta, matrix alpha){


}

model{

dat ~ tenismodel(theta_1, theta_2);

}

But now I am getting an error:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Semantic error in ‘string’, line 71, column 2 to column 37:

Ill-typed arguments to ‘~’ statement. No distribution ‘tenismodel’ was found with the correct signature.

(I changed the function name as tensimodel)

Oh duh. Don’t mind me y’all! Pay attention to @nhuurre instead! I forgot that _log can replace _lpdf and was careless in my assessment of the problem!

1 Like

No worries. Thanks for the suggestions. I am really new to stan and struggling to fit the model.

Stan is very strict about argument types. You declare the function with matrix arguments but theta_1 and theta_2 are arrays of vectors. Arrays and matrixes are considered different types even though in practice they are very similar.
The model compiles for me after changing the argument types from matrix to vector[] (that [] means array).

functions{
  real tenismodel_lpdf(matrix dat, vector[] theta, vector[] alpha){
  ...
1 Like

Thanks for this explanation. This is something that I wanted to know. I think now my code is working.

How can we get to know what is the type of the variable?. As an example here how do we know theta_1 and theta_2 are arrays?. I searched in stan guide. I could not find the variable types that output from Dirichlet distribution. I thought is is a matrix.

Further to that, I forgot to ask why did you suggest to change tenismodel_lpdf instead tenismodel_log?

Variable types can be seen in the declaration

parameters {
    simplex [m] theta_1 [N];
    simplex [m] theta_2 [N];
}

simplex is a (constrained) vector and that [N] at the end means N-element array.

Dirichlet distribution is described in the functions reference, the expected type is vector.

_log vs _lpdf doesn’t really matter but the latter is clearer that it’s a probability density function, not just any logarithmic function.

1 Like

Thanks for the explanation.

In my function, the output is sum of log probabilities.

functions{

real tenismodel_lpdf(matrix dat, vector theta, vector alpha){

vector[rows(dat)] prob;
real t;
real x;
real out;
real a;
real f;
real w;
real r;
real m;

for (i in 1:rows(dat)){

t <- dat[i, 1];
x <- dat[i, 2];
a <- theta[i, 1];
f <- theta[i, 2];
w <- alpha[i, 1];
r <- alpha[i, 2];
m <- alpha[i, 1];


  
if(t == 1){
  prob[i] <- a^x * f^(1 - x);

}else if(fmod(t, 2) == 0){
  
  prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x)); 

}else {
   prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
}

}

out ← sum(log(prob));
return out;

}
}

Then is not my function is a logarithmic function?

Yes it is. And more specifically it is a logarithmic probability density function. Tilde ~ statements compute probabilities. As I said the distinction between _log and _lpdf doesn’t really matter even though the latter is recommended.

1 Like

Thank you very much all these explanations. I think I have a little idea about how stan works now. I really appreciate your help.