Hello, community!
I’m fitting categorical_logit models over T time points, which means that for each time t \in T I have a matrix \eta_t whose s-th column holds the sample’s linear predictor for category s \in S.
As time goes by I have fewer and fewer individuals in my samples, so the \eta_t have different row dimensions across t.
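Concretely, if N_t individuals are observed at time t, then (before any padding) \eta_t = X_t \beta_t, where X_t is the N_t \times K design matrix and \beta_t the K \times S coefficient matrix, so the number of rows of \eta_t shrinks as t grows.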
Since Stan doesn’t accept ragged arrays of vectors/matrices (elements with different sizes), the solution I found was to pad those objects up to a common dimension (G_N rows, with -9999 as the pad value in Y) and to restrict the estimation to the valid subjects through a sample-size index V_N.
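For instance, with G_N = 3, T = 2 and V_N = {3, 2}, the padded response array would look like

Y = [ y_{1,1}  y_{1,2}
      y_{2,1}  y_{2,2}
      y_{3,1}  -9999 ],

and the padded entry is never touched because the likelihood loop only runs over n in 1:V_N[t].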
I was able to create the program and estimate the parameters, but it is taking too long to run and I’d like to improve it via the reduce_sum approach.
The data have the right dimensions (I checked that via print()) and the model runs fine with the syntax below:
data {
  int<lower = 1> T;    // Observed times
  int<lower = 1> S;    // Number of categories
  int<lower = 1> K;    // Covariate dimension (including intercept)
  int<lower = 1> G_N;  // Generic (padded) sample size, common to all t
  array[T] int<lower = 0> V_N;                    // Valid sample size for each t
  array[G_N, T] int<lower = -9999, upper = S> Y;  // Response variables (padded with -9999)
  array[T] matrix[G_N, K] X;                      // Design matrices (padded to G_N rows)
}
transformed data {
  vector[K] zeros = rep_vector(0, K);  // Zero column of coefficients for the reference category
}
parameters {
  array[T] matrix[K, S-1] beta_raw;          // Coefficients for each non-reference category, for all t
  array[K] cholesky_factor_corr[S-1] Lcorr;  // Cholesky factor of the correlation matrix (L_u for R), one per covariate k
  array[K] vector<lower=0>[S-1] sigma;       // Standard deviations of the random-walk steps, one per covariate k
}
transformed parameters {
  array[T] matrix[K, S] beta;  // Coefficients for all S categories, for all t
  for (t in 1:T) {
    beta[t] = append_col(zeros, beta_raw[t]);  // Prepend the reference category (all betas = 0) to beta_raw
  }
}
model {
  array[T] matrix[G_N, S] eta;  // Linear predictor for all categories and all t

  // Priors (stated once, outside the loop over t)
  for (k in 1:K) {
    target += cauchy_lpdf(sigma[k] | 0, 5);          // Cauchy prior on the step standard deviations
    target += lkj_corr_cholesky_lpdf(Lcorr[k] | 1);  // LKJ prior on the correlation Cholesky factors
  }

  for (t in 1:T) {
    eta[t] = X[t] * beta[t];  // Linear predictor for time point t
    for (k in 1:K) {
      if (t == 1) {
        target += multi_normal_cholesky_lpdf(to_vector(beta_raw[t, k]) | rep_vector(0, S-1), diag_pre_multiply(sigma[k], Lcorr[k]));  // Random-walk prior, initial state (t = 1)
      } else {
        target += multi_normal_cholesky_lpdf(to_vector(beta_raw[t, k]) | to_vector(beta_raw[t-1, k]), diag_pre_multiply(sigma[k], Lcorr[k]));  // Random-walk prior, transition (t > 1)
      }
    }
    for (n in 1:V_N[t]) {
      target += categorical_logit_lpmf(Y[n, t] | eta[t, n]');  // Likelihood contribution of the valid subjects only
    }
  }
}
When I moved to the reduce_sum approach, I created this function, which receives a slice of the dependent variable Y, the linear predictors \eta, and the “true” sample size V_N:
functions {
  real partial_sum_lpmf(array[] int Y_slice,
                        int start, int end,
                        matrix eta,
                        int V_N) {
    real lp = 0;
    for (n in 1:V_N) {
      lp += categorical_logit_lupmf(Y_slice[n] | eta[n]');  // Log probability contribution for categorical_logit
    }
    return lp;
  }
}
and used it in the last line of the model block as below:
target += reduce_sum(partial_sum_lpmf, Y[, t], 1, eta[t], V_N[t]);
In my mind, I’m setting the data arguments right: “Y[, t]” is the response variable for time t (the t-th column of the 2-D array Y), “eta[t]” is the t-th matrix of the array “eta”, and “V_N[t]” is the t-th element of the 1-D array “V_N”.
But when I ran the model it failed with an indexing error, and “grainsize” remained my main suspect as the cause.
I think there is some conflict between my “V_N[t]”, the “start” and “end” arguments of my function, and the way “reduce_sum” iterates through the data. But I can’t understand how to fix it, since we don’t pass “start” and “end” explicitly when we call the function; they seem to be set indirectly via “grainsize”.
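For reference, my reading of the reduce_sum chapter in the Stan User’s Guide is that the partial sum should loop over the slice it actually receives and use “start” to map back into the shared arguments, roughly like this (just a sketch of the pattern as I understand it, not code I have verified, and I’m still unsure how to reconcile it with V_N other than by slicing Y down to the valid rows before the call):

functions {
  real partial_sum_lpmf(array[] int Y_slice,
                        int start, int end,
                        matrix eta) {
    real lp = 0;
    // The slice holds (end - start + 1) elements; position n in the slice
    // corresponds to row (start + n - 1) of the shared matrix eta.
    for (n in 1:size(Y_slice)) {
      lp += categorical_logit_lupmf(Y_slice[n] | eta[start + n - 1]');
    }
    return lp;
  }
}

with the call passing only the valid subjects, so the slicer never sees the padded rows:

target += reduce_sum(partial_sum_lpmf, Y[1:V_N[t], t], 1, eta[t]);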
Does anyone know how to fix this and can help me out? I’m almost sure the problem lies there (the error message points to indexing issues, which corroborates my hypothesis).
Thanks in advance!