There are a lot of issues in the code, so let's tackle them one by one.

These three parameters are declared in the parameters block, so they are being sampled, but you do not use them anywhere and they do not influence the log likelihood. This will cause substantial problems: the likelihood is constant with respect to these parameters, so there is nothing for the sampler to explore.

The `alpha` parameter is also not doing anything: you are sampling from its prior, but it is not connected to the model of the response in any way.
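If `alpha` was meant to be an intercept for each response, you would have to add it to the linear predictor explicitly. A minimal sketch of what that could look like (one intercept per response variable; this is a guess at your intent, not part of your current model):

```
parameters {
  vector[n_t] alpha;        // hypothetical: one intercept per response
  matrix[n_c, n_t] beta;
}
model {
  alpha ~ normal(0, 1);
  for (i in 1:t0) {
    // alpha' broadcasts the intercepts across the row of linear predictors
    y[i] ~ poisson_log(alpha' + x[i] * beta);
  }
}
```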

Next, there are a number of variables in the data block that you don't use, so let's get rid of them to make the code easier to understand and debug.

There are also missing parentheses around the for loop for the prior on `beta`.

The `int y[t0, n_t]` syntax is no longer supported in the latest version of Stan, so we need to use the `array` syntax instead.
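For reference, the change is purely a matter of declaration syntax; the old and new forms declare the same object:

```
// old syntax (removed in recent Stan versions):
int y[t0, n_t];
// new array syntax:
array[t0, n_t] int y;
```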

So now we are down to:

```
data {
  int<lower=1> n_c; // number of predictors
  int<lower=1> n_t; // number of dependent variables
  int<lower=0> t0; // number of pre-treatment observations
  array[t0, n_t] int y; // matrix of count observations
  matrix[t0, n_c] x; // matrix of predictors, pre-treatment
}
parameters {
  matrix[n_c, n_t] beta; // matrix of regression coefficients
}
model {
  // Priors
  for (ii in 1:n_c) {
    beta[ii, ] ~ normal(0, 1); // normal priors on the coefficients
  }
  // Likelihood
  for (i in 1:t0) {
    y[i] ~ poisson_log(x[i] * beta);
  }
}
```

This now runs fine for me with a simulated dataset:

```
library(cmdstanr)
# the Stan code above is saved in the `stancode` variable
mod <- cmdstan_model(write_stan_file(stancode))
# simulate data
N <- 1000
data <- list(n_c = 3, n_t = 2, t0 = N)
data$x <- matrix(rnorm(N * 3), N, 3)
bs <- matrix(c(0.3, 0.6, 0.9, -0.3, -0.6, -0.9), nrow = 3)
data$y <- matrix(rpois(N * 2, exp(data$x %*% bs)), N, 2)
# estimate the model
fit <- mod$sample(data, parallel_chains = 4)
fit
```

```
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 968.05 968.39 1.74 1.60 964.73 970.27 1.00 1827 2355
beta[1,1] 0.26 0.26 0.02 0.02 0.22 0.30 1.00 5630 3211
beta[2,1] 0.60 0.60 0.02 0.02 0.56 0.63 1.00 5270 3145
beta[3,1] 0.89 0.89 0.02 0.02 0.86 0.92 1.00 5656 2965
beta[1,2] -0.24 -0.25 0.02 0.02 -0.28 -0.21 1.00 5197 3217
beta[2,2] -0.62 -0.62 0.02 0.02 -0.66 -0.59 1.00 4817 2987
beta[3,2] -0.91 -0.91 0.02 0.02 -0.94 -0.88 1.00 4711 3203
```

The code can be optimized further. You don't have to loop over the observations; it is much easier to loop over the response variables, which lets you combine the statements for `beta`'s prior and `y` into a single, much shorter loop. We can also use the built-in function `poisson_log_glm()`, which is much more efficient for likelihoods with a linear-predictor structure like this one:

```
data {
data {
  int<lower=1> n_c; // number of predictors
  int<lower=1> n_t; // number of dependent variables
  int<lower=0> t0; // number of observations
  array[t0, n_t] int y; // outcomes
  matrix[t0, n_c] x; // predictors
}
parameters {
  matrix[n_c, n_t] beta; // regression coefficients
}
model {
  for (i in 1:n_t) {
    beta[, i] ~ normal(0, 1);
    y[, i] ~ poisson_log_glm(x, 0, beta[, i]); // 0 = no intercept
  }
}
```

This code runs in about 0.5 seconds, while the version above takes ~4 seconds. The results are the same:

```
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 968.08 968.43 1.77 1.65 964.70 970.30 1.00 1821 2648
beta[1,1] 0.26 0.26 0.02 0.02 0.22 0.30 1.00 5554 3246
beta[2,1] 0.60 0.60 0.02 0.02 0.56 0.63 1.00 4856 3550
beta[3,1] 0.89 0.89 0.02 0.02 0.86 0.92 1.00 5527 3194
beta[1,2] -0.24 -0.24 0.02 0.02 -0.28 -0.21 1.00 4807 2968
beta[2,2] -0.62 -0.62 0.02 0.02 -0.65 -0.58 1.00 5572 2894
beta[3,2] -0.91 -0.91 0.02 0.02 -0.94 -0.88 1.00 5611 3100
```

Hope this helps. In general, try to start simple: don't build complexity into your model until you can run a simple version and get the results you expect. Add parameters and conditions only after you have made sure the model works well in its simple form.