Hello, community!
I’m fitting categorical_logit models over T time points, which means that for each time t \in T I have a matrix \eta_t whose s-th column holds the sample’s linear predictor for category s \in S.
As time goes by I have fewer and fewer individuals in my samples, so the \eta_t have different row dimensions across t.
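Concretely, if N_t individuals are observed at time t, then (before any padding) \eta_t = X_t \beta_t, where X_t is the N_t \times K design matrix and \beta_t the K \times S coefficient matrix, so the number of rows of \eta_t shrinks as t grows.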
Since Stan doesn’t accept ragged arrays of vectors/matrices (elements with different sizes), the solution I found was to pad those objects up to a common dimension (G_N rows, with -9999 as the pad value in Y) and to restrict the estimation to the valid subjects through a sample-size index V_N.
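For instance, with G_N = 3, T = 2 and V_N = {3, 2}, the padded response array would look like

Y = [ y_{1,1}  y_{1,2}
      y_{2,1}  y_{2,2}
      y_{3,1}  -9999 ],

and the padded entry is never touched because the likelihood loop only runs over n in 1:V_N[t].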
I was able to create the program and estimate the parameters, but it is taking too long to run and I’d like to improve it via the reduce_sum approach.
The data have the right dimensions (I checked that via print()) and the model runs fine with the syntax below:
data {
  int<lower = 1> T;    // Observed times
  int<lower = 1> S;    // Number of categories
  int<lower = 1> K;    // Covariate dimension (including intercept)
  int<lower = 1> G_N;  // Generic (padded) sample size, common to all t
  array[T] int<lower = 0> V_N;                    // Valid sample size for each t
  array[G_N, T] int<lower = -9999, upper = S> Y;  // Response variables (padded with -9999)
  array[T] matrix[G_N, K] X;                      // Design matrices (padded to G_N rows)
}
transformed data {
  vector[K] zeros = rep_vector(0, K);  // Zero column of coefficients for the reference category
}
parameters {
  array[T] matrix[K, S-1] beta_raw;          // Coefficients for each non-reference category, for all t
  array[K] cholesky_factor_corr[S-1] Lcorr;  // Cholesky factor of the correlation matrix (L_u for R), one per covariate k
  array[K] vector<lower=0>[S-1] sigma;       // Standard deviations of the random-walk steps, one per covariate k
}
transformed parameters {
  array[T] matrix[K, S] beta;  // Coefficients for all S categories, for all t
  for (t in 1:T) {
    beta[t] = append_col(zeros, beta_raw[t]);  // Prepend the reference category (all betas = 0) to beta_raw
  }
}
model {
  array[T] matrix[G_N, S] eta;  // Linear predictor for all categories and all t

  // Priors (stated once, outside the loop over t)
  for (k in 1:K) {
    target += cauchy_lpdf(sigma[k] | 0, 5);          // Cauchy prior on the step standard deviations
    target += lkj_corr_cholesky_lpdf(Lcorr[k] | 1);  // LKJ prior on the correlation Cholesky factors
  }

  for (t in 1:T) {
    eta[t] = X[t] * beta[t];  // Linear predictor for time point t
    for (k in 1:K) {
      if (t == 1) {
        target += multi_normal_cholesky_lpdf(to_vector(beta_raw[t, k]) | rep_vector(0, S-1), diag_pre_multiply(sigma[k], Lcorr[k]));  // Random-walk prior, initial state (t = 1)
      } else {
        target += multi_normal_cholesky_lpdf(to_vector(beta_raw[t, k]) | to_vector(beta_raw[t-1, k]), diag_pre_multiply(sigma[k], Lcorr[k]));  // Random-walk prior, transition (t > 1)
      }
    }
    for (n in 1:V_N[t]) {
      target += categorical_logit_lpmf(Y[n, t] | eta[t, n]');  // Likelihood contribution of the valid subjects only
    }
  }
}
When I moved to the reduce_sum approach, I created this function, which receives a slice of the dependent variable Y, the linear predictors \eta, and the “true” sample size V_N:
functions {
  real partial_sum_lpmf(array[] int Y_slice,
                        int start, int end,
                        matrix eta,
                        int V_N) {
    real lp = 0;
    for (n in 1:V_N) {
      lp += categorical_logit_lupmf(Y_slice[n] | eta[n]');  // Log probability contribution for categorical_logit
    }
    return lp;
  }
}
and used it in the last line of the model block as below:
target += reduce_sum(partial_sum_lpmf, Y[, t], 1, eta[t], V_N[t]);
In my mind, I’m setting the data arguments right: “Y[, t]” is the response variable for time t (the t-th column of the 2-D array Y), “eta[t]” is the t-th matrix of the array “eta”, and “V_N[t]” is the t-th element of the 1-D array “V_N”.
But when I ran the model it failed with an indexing error, and “grainsize” remained my main suspect as the cause.
I think there is some conflict between my “V_N[t]”, the “start” and “end” arguments of my function, and the way “reduce_sum” iterates through the data. But I can’t understand how to fix it, since we don’t pass “start” and “end” explicitly when we call the function; they seem to be set indirectly via “grainsize”.
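For reference, my reading of the reduce_sum chapter in the Stan User’s Guide is that the partial sum should loop over the slice it actually receives and use “start” to map back into the shared arguments, roughly like this (just a sketch of the pattern as I understand it, not code I have verified, and I’m still unsure how to reconcile it with V_N other than by slicing Y down to the valid rows before the call):

functions {
  real partial_sum_lpmf(array[] int Y_slice,
                        int start, int end,
                        matrix eta) {
    real lp = 0;
    // The slice holds (end - start + 1) elements; position n in the slice
    // corresponds to row (start + n - 1) of the shared matrix eta.
    for (n in 1:size(Y_slice)) {
      lp += categorical_logit_lupmf(Y_slice[n] | eta[start + n - 1]');
    }
    return lp;
  }
}

with the call passing only the valid subjects, so the slicer never sees the padded rows:

target += reduce_sum(partial_sum_lpmf, Y[1:V_N[t], t], 1, eta[t]);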
Does anyone know how to fix this and can help me out? I’m almost sure the problem lies there (the error message points to indexing issues, which corroborates my hypothesis).
Thanks in advance!