How to declare a variable in model block

Hi, everyone, I am a newbie to Stan, and the following is my code, and I got the error.

data{
int T;
int N;
vector[N] score;
matrix[N, T] A;
matrix[N, T] y;

}
parameters{
real beta;
vector[T] d;

}
model {
for(j in 1:T) {
d[j] ~ gamma(1 ,1);
   for (i in 1:N) {
p[i,j] = A[i,j] *exp(beta* score[i] )*d[j];
y[i,j] ~ poisson((p[i,j]);  
}}
} 

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Unknown variable in assignment; lhs variable=p.

I tried to declare p in the model block, and still got the error.

model {
matrix[N, T] p;
for(j in 1:T) {
d[j] ~ gamma(1 ,1);
   for (i in 1:N) {
p[i,j] = A[i,j] *exp(beta* score[i] )*d[j];
y[i,j] ~ poisson((p[i,j]);  
}}

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable “matrix” does not exist.

Anyone knows how to fix it?
Thanks!

You are missing a semi-colon after gamma(1,1). Also, it does not work to evaluate the Poisson likelihood at y when y is a matrix. It can only be int or int array. In this case, you probably meant int<lower=0> y[N,T];, although you only need to loop over the first dimension and can evaluate the likelihood T observations at a time. Finally, you should use the poisson_log form of the distribution so as to avoid having to do exp(beta * score[i]), which is prone to overflow.

Hi, bgoodri, Thanks for your help. I revised my code according to your suggestions.

data{
int T;
int N;
vector[N] score;
matrix[N, T] A;
int<lower=0> y[N,T];

}
parameters{
real beta;
vector[T] d;
}
model {
for(j in 1:T) {
d[j] ~ gamma(1 ,1);
   for (i in 1:N) {
p[i,j] = log(A[i,j]) +beta* score[i] +d[j];
y[i,j] ~ poisson_log((p[i,j]);  
}}
} 

I still got the error that p is the unknown variable.
besides, could you explain more about looping over the first dimension ?

Yes, because p is not declared. You can just do

y[i,j] ~ poisson_log(log(A[i,j]) +beta* score[i] +d[j]);

And you can do one look with

for (i in 1:N)
  y[i] ~ poisson_log(log_A[i, ] + beta * score[i] + d);

but you should pass the log of A in the data block rather than A. Also, you can do the prior on d outside the loop with d ~ gamma(1, 1) (which is equivalent to exponential(1)).

Thanks, bgoodri, for all those useful suggestions.
I fixed the error and finally got to compile the model, but I got this error message.

Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! Warning message:
running command ‘make -f “C:/PROGRA~1/R/R-34~1.2/etc/x64/Makeconf” -f “C:/PROGRA~1/R/R-34~1.2/share/make/winshlib.mk” SHLIB_LDFLAGS=’(SHLIB_CXXLDFLAGS)' SHLIB_LD='(SHLIB_CXXLD)’ SHLIB=“file16e466222198.dll” WIN=64 TCLBIN=64 OBJECTS=“file16e466222198.o”’ had status 127
In addition: Warning messages:
1: running command ‘“C:/PROGRA~1/R/R-34~1.2/bin//R” CMD config CXX’ had status 1
2: running command ‘C:/PROGRA~1/R/R-34~1.2/bin/x64/R CMD SHLIB file16e466222198.cpp 2> file16e466222198.cpp.err.txt’ had status 1

do you have any idea what is wrong?

Thanks so much!

By the way, you said I can put the gamma(1, 1) out of the loop, but will it be the same?

Thanks again.

Your Rtools is either not installed or configured correctly.

Yes, d ~ exponential(1) is equivalent to but faster than

for (j in 1:T) {
  d[j] ~ gamma(1, 1);
  ...
}

I mean, if you put it outside the loop, will “d” be sampling NT times with 1 sample each time, or just one time with NT of samplings? will it be the same with it being put inside the loop?
Also, can you add a vector with a number in stan?
Thanks

Stan never samples individual parameters. If you put it inside a loop over T and increment the log-kernel once for each scalar that is literally equivalent to putting it outside the loop and incrementing the log-kernel once for the entire vector.

And yes, adding a scalar to a vector is a legal operation in the Stan language.

Thanks for your quick reply.
Just curious, have you ever used Jags? If yes ,how do you compare their speeds? Is Stan better in some ways ?

I haven’t used BUGS since 2003 but it isn’t much of a comparison if you do the comparison correctly. Comparing wall time until something finishes is useless because you can easily make one or the other finish in whatever amount of time you want. It is also useless to compare MCMC algorithms unless both converge to the posterior distribution, which Gibbs samplers such as BUGS and JAGS have difficulty doing in examples with more than a few parameters or without choosing particular sets of distributions for the priors and likelihood. In models where BUGS / JAGS converges and mixes well, you can compare effective sample size per second of wall time with Stan and Stan will usually be at least as good.

Hi, sorry that I keep bugging you with questions. Now I want to make a more complicated model, and revised my code as follows. something went wrong with the data “trial”. The model can be complied but when I apply my data, it showed error message. My data should be good, the variable “trial” only contains 1,2,3,4,5,6,7,8.

data{
int T;
int N;
vector[N] score;
matrix[N, T] A;
int trial[N];
int<lower=0> y1[N,T];
int<lower=0> y2[N,T];
}
transformed data{
matrix[N,T] log_A;
log_A = log(A);
}
parameters{
matrix[1,2] beta;
vector[T] d;
matrix[2,8] ran_trial;
vector[2] s_trial;
}
transformed parameters{
vector[T] log_d;
log_d= log(d)
}
model {
for (o in 1:2){
beta[1,o] ~ normal(0,1)
s_trial[o] ~ student_t(1,0,1)
for(tn in 1:8){
ran_trial[o, tn] ~ normal(0,s_trial[o]}
}}
d ~ exponential(1);
for(j in 1:T) {
   for (i in 1:N) {
y1[i,j] ~ poisson_log(log_A[i,j] +beta[1,1]* score[i] +ran_trial[1,trial[i]]+log_d);
y2[i,j] ~ poisson_log(log_A[i,j] +beta[1,2]* score[i] +ran_trial[2,trial[i]]+log_d);
}}
} 

Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
Exception: variable does not exist; processing stage=data initialization; variable name=trial; base type=int (in ‘model29045b845223_104d2d1efc277c900643b11430b3da1e’ at line 14)

In addition: Warning message:
In FUN(X[[i]], …) : data with name trial is not numeric and not used
failed to create the sampler; sampling not done


Besides, when I complied the model, it showed warning:
Warning message:
In if (nchar(CXX) > 0) return(TRUE) :
the condition has length > 1 and only the first element will be used

Is it normal?

Offhand, it looks like the data for the variable trial was not input correctly. Are you using RStan? If so, what does the R script that you’re using look like?

Oh I figured out, the trial variable in my data is not numeric. so I put as.numeric(), it worked! THANKS


transformed data{
matrix[N,T] log_A;
log_A = log(A);
}

transformed parameters{
vector[T] log_d;
log_d= log(d)
}

Am I doing it right for the log() as A is a matrix and d is a vector?

Dimension mismatch in assignment; variable name = log_A, num dimensions given = 0; right-hand side dimensions = 2
error in ‘model27e430f5182_2bd322cf0524e4cdf85261f7275688d3’ at line 22, column 8

20: transformed data{
21: matrix[N,T] log_A;
22: log_A= log(A);
           ^

Just show us the R script you’re using. It’s hard to tell what mistakes you’re making when we can’t see your code.

data{
int T;
int N;
vector[N] score;
matrix[N, T] A;
int trial[N];
int<lower=0> y1[N,T];
int<lower=0> y2[N,T];
}
transformed data{
matrix[N,T] log_A;
log_A = log(A);
}
parameters{
matrix[1,2] beta;
vector[T] d;
matrix[2,8] ran_trial;
vector[2] s_trial;
}
transformed parameters{
vector[T] log_d;
log_d= log(d);
}
model {
for (o in 1:2){
beta[1,o] ~ normal(0,1);
s_trial[o] ~ student_t(1,0,1);
for(tn in 1:8){
ran_trial[o, tn] ~ normal(0,s_trial[o]);
}
}}
d ~ exponential(1);
for(j in 1:T) {
   for (i in 1:N) {
y1[i,j] ~ poisson_log(log_A[i,j] +beta[1,1]* score[i] +ran_trial[1,trial[i]]+log_d);
y2[i,j] ~ poisson_log(log_A[i,j] +beta[1,2]* score[i] +ran_trial[2,trial[i]]+log_d);
}}
} 
sm = stan_model(model_code=modelstring)
stan_data = list(score=mydata$score,trial=as.numeric(mydata$trial),
                 y1=y1,y2=y2,A=A, N=N, T=T)
fit= sampling(sm, data=stan_data, chains=3,iter=600,warmup=200,thin=2)

Dimension mismatch in assignment; variable name = log_A, num dimensions given = 0; right-hand side dimensions = 2
error in ‘model27e430f5182_2bd322cf0524e4cdf85261f7275688d3’ at line 22, column 8
20: transformed data{
21: matrix[N,T] log_A;
22: log_A= log(A);
^

What do you get when you run the following lines of R code?:

print(dim(y1))
print(dim(y2))
print(dim(A))
print(N)
print(T)

Also, be careful when using T as a variable in R. By default, R sets T to TRUE, so if you forget to set the value of T yourself, weird things can happen.

print(N)
[1] 4400
print(T)
[1] 30
print(dim(y1))
[1] 4400 30
print(dim(y2))
[1] 4400 30
print(dim(A))
[1] 4400 30
print(N)
[1] 4400
print(T)
[1] 30

I’m not sure what the problem is, but I do notice a lot of missing semicolons in the Stan code that you posted. Indeed, I can’t even get it to compile, which makes me wonder if the code you’ve posted is the code that’s causing the error messages that you’re showing.

Yeah, about the semicolons, because I can not share my code. This is the fake code that I created but similar to mine. I have all the semicolons in my real code, so it is not the problem. I just fixed it in my post. How about now?