# Defining Dirichlet priors

Hello,

I am struggling with defining Dirichlet priors in my model. I am getting an error in parameter section in the code. The error is:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

The following is my stan code.

``````// Create user define function

functions{

real model_log(matrix dat, matrix theta, matrix alpha){

vector[rows(dat)] prob;
real  t;
real  x;
real out;
real a;
real f;
real w;
real r;
real m;

for (i in 1:rows(dat)){

t <- dat[i, 1];
x <- dat[i, 2];
a <- theta[i, 1];
f <- theta[i, 2];
w <- alpha[i, 1];
r <- alpha[i, 2];
m <- alpha[i, 1];

if(t == 1){
prob[i] <- a^x * f^(1 - x);

}else if(fmod(t, 2) == 0){

prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x));

}else {
prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
}
}

out <- sum(log(prob));
return out;

}
}

// The input data is a vector 'y' of length 'N'.
data {
int N; // number of observations (rally lengths)
matrix [N, 2] dat; // touches and indicator
int <lower = 2> m;

}

parameters {

simplex [m] theta_1 [N];
simplex [m] theta_2 [N];

}

// The model to be estimated.
model {
// Define priors
//
for (n in 1:N){
theta_1[n] ~ dirichlet(concentration = 1);
}

for (n in 1:N){
theta_2[n] ~ dirichlet(concentration = 1);
}

dat ~ model(theta_1, theta_2);

}
``````

I am new to stan and can I get a help on this to understand how stan works on simplex type data. I think my error comes in defining the parameter types.

Thank you!

Edited by @jsocolar for syntax highlighting.

looks like `dat ~ model(...` should be `dat ~ model_log(...`. In that case, you’ll also need to supply the third input to the `model_log` function.

1 Like

Thanks for the response. I tried that. But still getting the same error. Also, the line number of the error comes from parameter block. That is why I thought the way I defined the simplex is wrong. I am not sure.

Providing the entire error message, including the line number, would be useful.

1 Like

The error message is:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Syntax error in ‘string’, line 65, column 41 to column 42, parsing error:

Ill-formed expression. Found identifier. There are many ways to complete this to a well-formed expression.

In my script line number 65 comes under parameter box.

As you suggested I added the changes to model box as follows:

// The model to be estimated.
model {
// Define priors

for (n in 1:N){
theta_1[n] ~ dirichlet(concentration = 1);
}

``````for (n in 1:N){
theta_2[n] ~ dirichlet(concentration = 1);
``````

}

dat ~ model_log(dat, theta_1, theta_2);

}

I hope this is what you meant.

Stan does not have named function arguments, you can’t write `dirichlet(concentration = 1)` (and in the documentation the argument to `dirichlet` is called `alpha`, not `concentration`, not that it matters…)
The correct thing to do is

``````for (n in 1:N){
theta_1[n] ~ dirichlet(rep_vector(1, m));
}
for (n in 1:N){
theta_2[n] ~ dirichlet(rep_vector(1, m));
}
``````

No, tilde statements expect the function name without a suffix.
Omit the suffix unless you’re writing `target += ...`.

Also, the `_log` suffix is deprecated, you should probably use `_lpdf` instead.

``````functions {
real model_lpdf(matrix dat, matrix theta, matrix alpha) {
...
}
model {
...
dat ~ model(theta_1, theta_2);
}
``````
2 Likes

Thanks for the suggesion in the parameter block. Once I changed it the error is not there. However, I changed the model as you suggested.

functions{

real tenismodel_lpdf(matrix dat, matrix theta, matrix alpha){

}

model{

dat ~ tenismodel(theta_1, theta_2);

}

But now I am getting an error:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Semantic error in ‘string’, line 71, column 2 to column 37:

Ill-typed arguments to ‘~’ statement. No distribution ‘tenismodel’ was found with the correct signature.

(I changed the function name as tensimodel)

Oh duh. Don’t mind me y’all! Pay attention to @nhuurre instead! I forgot that `_log` can replace `_lpdf` and was careless in my assessment of the problem!

1 Like

No worries. Thanks for the suggestions. I am really new to stan and struggling to fit the model.

Stan is very strict about argument types. You declare the function with `matrix` arguments but `theta_1` and `theta_2` are arrays of vectors. Arrays and matrixes are considered different types even though in practice they are very similar.
The model compiles for me after changing the argument types from `matrix` to `vector[]` (that `[]` means array).

``````functions{
real tenismodel_lpdf(matrix dat, vector[] theta, vector[] alpha){
...
``````
1 Like

Thanks for this explanation. This is something that I wanted to know. I think now my code is working.

How can we get to know what is the type of the variable?. As an example here how do we know theta_1 and theta_2 are arrays?. I searched in stan guide. I could not find the variable types that output from Dirichlet distribution. I thought is is a matrix.

Further to that, I forgot to ask why did you suggest to change tenismodel_lpdf instead tenismodel_log?

Variable types can be seen in the declaration

``````parameters {
simplex [m] theta_1 [N];
simplex [m] theta_2 [N];
}
``````

simplex is a (constrained) vector and that `[N]` at the end means N-element array.

Dirichlet distribution is described in the functions reference, the expected type is `vector`.

`_log` vs `_lpdf` doesn’t really matter but the latter is clearer that it’s a probability density function, not just any logarithmic function.

1 Like

Thanks for the explanation.

In my function, the output is sum of log probabilities.

functions{

real tenismodel_lpdf(matrix dat, vector theta, vector alpha){

vector[rows(dat)] prob;
real t;
real x;
real out;
real a;
real f;
real w;
real r;
real m;

for (i in 1:rows(dat)){

``````t <- dat[i, 1];
x <- dat[i, 2];
a <- theta[i, 1];
f <- theta[i, 2];
w <- alpha[i, 1];
r <- alpha[i, 2];
m <- alpha[i, 1];

if(t == 1){
prob[i] <- a^x * f^(1 - x);

}else if(fmod(t, 2) == 0){

prob[i] <- ((1- a - f) * (r)^(t/2.0 - 1.0) * (r)^(t/2.0 - 1.0) * (m)^x * (w)^(1-x));

}else {
prob[i] <- ((1- a - f) * (r)^(t/2.0 - 0.5) * (r)^(t/2.0 - 1.5) * (m)^(1 - x) * (w)^x);
}
``````

}

out ← sum(log(prob));
return out;

}
}

Then is not my function is a logarithmic function?

Yes it is. And more specifically it is a logarithmic probability density function. Tilde `~` statements compute probabilities. As I said the distinction between `_log` and `_lpdf` doesn’t really matter even though the latter is recommended.

1 Like

Thank you very much all these explanations. I think I have a little idea about how stan works now. I really appreciate your help.