Ragged Dataset; segment implementation on hierarchical betas parameter

I am trying to fit an hierarchical regression, however some of the groups have no observations for certain IVs; so it translates to a ragged data set. I have followed the segmentation tutorial to apply segment to my X matrix, however have been having trouble applying the same to my parameters. I need to apply the segment/index logic to the following dot product below. (For context, certain local media tactics are not active in all of the dmas, they are only active in 7-10 dmas out of total 208)

 ...{ dot_product(cum_effects_hill[nn], beta[dma[nn]]) +
  dot_product(X_ctrl[nn], gamma[dma[nn]])}...
model {
  transformed parameters {
  // a vector of the mean response
  real mu[N];
  // the cumulative media effect after adstock
  real cum_effect;
  // the cumulative media effect after adstock, and then Hill transformation
  row_vector[num_media] cum_effects_hill[N];
  row_vector[max_lag] lag_weights;
  vector[num_ctrl] gamma[J]; 
  vector[num_media] beta[J]; 
  vector[J] theta;

  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[lag] = pow(retain_rate[media], (lag - 1 - delay[media]) ^ 2);
      }
      cum_effect = Adstock(X_media[nn, media], lag_weights);
      cum_effects_hill[nn, media] = Hill(cum_effect, ec[media], slope[media]);
    }
    mu[nn] = theta[dma[nn]] +
      dot_product(cum_effects_hill[nn], beta[dma[nn]]) +
      dot_product(X_ctrl[nn], gamma[dma[nn]]);
  }
}
} 

I am sorry but I don’t really understand the question - is the code you show BEFORE you tried to move to ragged data structures or AFTER or something in between? I also don’t know of any “segmentation tutorial” for Stan (and quick Googling didn’t help) - maybe providing a link would help us understand the context better. Finally showing the data declarations might also be helpful to grasp what is happening.

Hope we can figure out a good way forward quickly :-)

Thank you for replying.

I was referring to the following

I have found a way around for a basic hierarchical regression, I just convert X_media and beta matrix to vectors, then continue with the dot product based on dma and variable/coefficient index. What I still have to figure out is how to apply the loop shown above to a ragged dataset

data {
  // the total number of observations/rows
  int<lower=1> K;
   // rows with media data sizes
  int s[K];
  // rows with control data sizes
  int p[K];        
  // the total length of media_vector; NAs removed
  int<lower=1> V;
  // the total length of control_vector; NAs removed
  int<lower=1> C;
  //the number of DMAs
  int<lower=1> J;
    //vector of media_dma indeces, for non_segment indexing
  int<lower=1,upper=J> dma[K]; 
  //vector of media_dma indeces
  int<lower=1,upper=J> dmamedia[V]; 
  //vector of control_dma indeces
  int<lower=1,upper=J> dmacontrol[C]; 
  // the vector of sales
  real Y[K];
  // the number of media channels
  int<lower=1> num_media;
  // the beta index
  int<lower=1,upper=num_media> kk[V];
  // the vector of media
  row_vector[V] X_media;
  // the number of other control variables
  int<lower=1> num_ctrl;
  // the controls index
  int<lower=1,upper=num_media> jj[C];
  // a vector of control variables
  real<lower=0> X_ctrl[C];
}

parameters {
  // residual variance
  real<lower=0> noise_var;
  // the population level intercept 
  real tau;
  // the population coefficients for media variables
  vector<lower=0>[num_media] beta_medias;
  // popultaion level coffecients for other control variables
  vector[num_ctrl] gamma_ctrl;
  // group level coffecients for intercept
  vector[J] theta;
  // the group coefficients for media variables
  vector[num_media] beta[J]; 
  // the group coffecients for control
  vector[num_ctrl] gamma[J]; 
  // the standard deviation of intercept
  real<lower=0>sigma_tau;
  // the standard deviation of betas
  real<lower=0>sigma_beta;
  // sd of controll
  real<lower=0>sigma_gamma;
}

transformed parameters { 
row_vector[V] beta_vector;
row_vector[C] gamma_vector;

for(i in 1:V){
  beta_vector[i] = beta[dmamedia[i],kk[i]];
 }

for(i in 1:C){
  gamma_vector[i] = gamma[dmacontol[i],jj[i]];
 }


}

model{
  int pos;
  int pos_c;
  vector[K] mu;
  tau ~ normal(0,1);
  sigma_tau ~ cauchy(0,2.5);
  sigma_beta ~ cauchy(0,2.5);
  sigma_gamma ~ cauchy(0,2.5);
  for (j in 1:J){
    theta[j] ~ normal(tau,sigma_tau);
  }
  beta_medias ~ normal(0,1);
  for (j in 1:J){
    beta[j] ~ normal(beta_medias,sigma_beta);
  }
  gamma_ctrl ~ normal(0,1);
  for (j in 1:J) {
  gamma[j] ~ normal(gamma_ctrl,sigma_gamma);
  }
  noise_var ~ normal(0, 0.05 * 0.01);
  
  pos = 1;
  pos_c = 1; 

  for(k in 1:K){
    Y ~ normal(theta[dma[k]] + dot_product(segment(X_media, pos, s[k] ),segment(beta_vector, pos, s[k])),sqrt(noise_var) + dot_product(segment(X_ctrl, pos_c, p[k] ),segment(gamma_vector, pos_c, p[k])),sqrt(noise_var)); //compute the linear predictor using relevant group-level regression coefficients 
    pos = pos + s[k];
    pos_c = pos_c + p[k];
  }
}

Sorry for taking long to get back to you. Do I understand correctly that what you need is to have a beta (and gamma) coefficient for each value of dmamedia. I.e. dmamedia act as predictors. For any single Y there is a single dma and each dma corresponds to a set of dmamedia that are active?

If that is so, one way would be to just preprocess the data in R (or wherever you call your model from) to have a direct matrix[V,K] dmamedia_expanded with 0-1 entries when the media is present / absent and then use it directly? (real value = dot_product(dmamedia_expanded[,k], beta[dma[k]]; (you could also build this in Stan, but that is IMHO more annoying).

Does that make sense? Or I am misunderstanding you?