Defining Dirichlet priors

niro · June 8, 2023, 6:08am

Hello,

I am struggling with defining Dirichlet priors in my model. I am getting an error in parameter section in the code. The error is:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

The following is my stan code.

// Create user define function

functions{

  real model_log(matrix dat, matrix theta, matrix alpha){

  vector[rows(dat)] prob;
  real  t;
  real  x;
  real out;
  real a;
  real f;
  real w;
  real r;
  real m;
  

  for (i in 1:rows(dat)){
    
    t <- dat[i, 1];
    x <- dat[i, 2];
    a <- theta[i, 1];
    f <- theta[i, 2];
    w <- alpha[i, 1];
    r <- alpha[i, 2];
    m <- alpha[i, 1];
    
    
      
    if(t == 1){
      prob[i] <- a^x * f^(1 - x);

    }else if(fmod(t, 2) == 0){
      
      prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x)); 
    
    }else {
       prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
    }
  }

  out <- sum(log(prob));
  return out;

 }
}


// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (rally lengths)
  matrix [N, 2] dat; // touches and indicator 
  int <lower = 2> m; 

}


parameters {
  
     simplex [m] theta_1 [N];
     simplex [m] theta_2 [N];
  
}

// The model to be estimated.
model {
  // Define priors
  // 
  for (n in 1:N){
    theta_1[n] ~ dirichlet(concentration = 1);
  }
  
    for (n in 1:N){
    theta_2[n] ~ dirichlet(concentration = 1);
  }
  

  dat ~ model(theta_1, theta_2);
  
  
  
}

I am new to stan and can I get a help on this to understand how stan works on simplex type data. I think my error comes in defining the parameter types.

Thank you!

Edited by @jsocolar for syntax highlighting.

jsocolar · June 8, 2023, 2:03pm

looks like dat ~ model(... should be dat ~ model_log(.... In that case, you’ll also need to supply the third input to the model_log function.

niro · June 8, 2023, 2:40pm

Thanks for the response. I tried that. But still getting the same error. Also, the line number of the error comes from parameter block. That is why I thought the way I defined the simplex is wrong. I am not sure.

jsocolar · June 8, 2023, 3:03pm

Providing the entire error message, including the line number, would be useful.

niro · June 8, 2023, 3:04pm

The error message is:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Syntax error in ‘string’, line 65, column 41 to column 42, parsing error:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

In my script line number 65 comes under parameter box.

As you suggested I added the changes to model box as follows:

// The model to be estimated.
model {
// Define priors

for (n in 1:N){
theta_1[n] ~ dirichlet(concentration = 1);
}

for (n in 1:N){
theta_2[n] ~ dirichlet(concentration = 1);

}

dat ~ model_log(dat, theta_1, theta_2);

}

I hope this is what you meant.

nhuurre · June 8, 2023, 5:38pm

Stan does not have named function arguments, you can’t write dirichlet(concentration = 1) (and in the documentation the argument to dirichlet is called alpha, not concentration, not that it matters…)
The correct thing to do is

for (n in 1:N){
  theta_1[n] ~ dirichlet(rep_vector(1, m));
}
for (n in 1:N){
  theta_2[n] ~ dirichlet(rep_vector(1, m));
}

No, tilde statements expect the function name without a suffix.
Omit the suffix unless you’re writing target += ....

Also, the _log suffix is deprecated, you should probably use _lpdf instead.

functions {
  real model_lpdf(matrix dat, matrix theta, matrix alpha) {
  ...
}
model {
  ...
  dat ~ model(theta_1, theta_2);
}

niro · June 8, 2023, 5:47pm

Thanks for the suggesion in the parameter block. Once I changed it the error is not there. However, I changed the model as you suggested.

functions{

real tenismodel_lpdf(matrix dat, matrix theta, matrix alpha){

…
}

model{

…

dat ~ tenismodel(theta_1, theta_2);

}

But now I am getting an error:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Semantic error in ‘string’, line 71, column 2 to column 37:

Ill-typed arguments to ‘~’ statement. No distribution ‘tenismodel’ was found with the correct signature.

(I changed the function name as tensimodel)

jsocolar · June 8, 2023, 5:49pm

Oh duh. Don’t mind me y’all! Pay attention to @nhuurre instead! I forgot that _log can replace _lpdf and was careless in my assessment of the problem!

niro · June 8, 2023, 5:50pm

No worries. Thanks for the suggestions. I am really new to stan and struggling to fit the model.

nhuurre · June 8, 2023, 6:04pm

Stan is very strict about argument types. You declare the function with matrix arguments but theta_1 and theta_2 are arrays of vectors. Arrays and matrixes are considered different types even though in practice they are very similar.
The model compiles for me after changing the argument types from matrix to vector[] (that [] means array).

functions{
  real tenismodel_lpdf(matrix dat, vector[] theta, vector[] alpha){
  ...

niro · June 8, 2023, 6:15pm

Thanks for this explanation. This is something that I wanted to know. I think now my code is working.

How can we get to know what is the type of the variable?. As an example here how do we know theta_1 and theta_2 are arrays?. I searched in stan guide. I could not find the variable types that output from Dirichlet distribution. I thought is is a matrix.

niro · June 8, 2023, 6:19pm

Further to that, I forgot to ask why did you suggest to change tenismodel_lpdf instead tenismodel_log?

nhuurre · June 8, 2023, 6:30pm

Variable types can be seen in the declaration

parameters {
    simplex [m] theta_1 [N];
    simplex [m] theta_2 [N];
}

simplex is a (constrained) vector and that [N] at the end means N-element array.

Dirichlet distribution is described in the functions reference, the expected type is vector.

_log vs _lpdf doesn’t really matter but the latter is clearer that it’s a probability density function, not just any logarithmic function.

niro · June 8, 2023, 6:36pm

Thanks for the explanation.

In my function, the output is sum of log probabilities.

functions{

real tenismodel_lpdf(matrix dat, vector theta, vector alpha){

vector[rows(dat)] prob;
real t;
real x;
real out;
real a;
real f;
real w;
real r;
real m;

for (i in 1:rows(dat)){

t <- dat[i, 1];
x <- dat[i, 2];
a <- theta[i, 1];
f <- theta[i, 2];
w <- alpha[i, 1];
r <- alpha[i, 2];
m <- alpha[i, 1];


  
if(t == 1){
  prob[i] <- a^x * f^(1 - x);

}else if(fmod(t, 2) == 0){
  
  prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x)); 

}else {
   prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
}

}

out ← sum(log(prob));
return out;

}
}

Then is not my function is a logarithmic function?

nhuurre · June 8, 2023, 6:46pm

Yes it is. And more specifically it is a logarithmic probability density function. Tilde ~ statements compute probabilities. As I said the distinction between _log and _lpdf doesn’t really matter even though the latter is recommended.

niro · June 8, 2023, 7:13pm

Thank you very much all these explanations. I think I have a little idea about how stan works now. I really appreciate your help.

Topic		Replies	Views
Define Dirichlet parameters in advance settings Modeling dirichlet-multinomial	33	948	August 17, 2023
Dirichlet distribution example not working Modeling	3	695	June 28, 2022
Implementing Logit-Normal priors on Dirichlet-Multinomial model Modeling	1	782	January 3, 2022
Dirichlet prior possible with stanvars? brms	5	983	May 25, 2021
Simplex Regression Modeling	11	2622	December 14, 2017

Defining Dirichlet priors

Related topics