I am trying to fit an hierarchical regression, however some of the groups have no observations for certain IVs; so it translates to a ragged data set. I have followed the segmentation tutorial to apply segment to my X matrix, however have been having trouble applying the same to my parameters. I need to apply the segment/index logic to the following dot product below. (For context, certain local media tactics are not active in all of the dmas, they are only active in 7-10 dmas out of total 208)
...{ dot_product(cum_effects_hill[nn], beta[dma[nn]]) +
dot_product(X_ctrl[nn], gamma[dma[nn]])}...
model {
transformed parameters {
// a vector of the mean response
real mu[N];
// the cumulative media effect after adstock
real cum_effect;
// the cumulative media effect after adstock, and then Hill transformation
row_vector[num_media] cum_effects_hill[N];
row_vector[max_lag] lag_weights;
vector[num_ctrl] gamma[J];
vector[num_media] beta[J];
vector[J] theta;
for (nn in 1:N) {
for (media in 1 : num_media) {
for (lag in 1 : max_lag) {
lag_weights[lag] = pow(retain_rate[media], (lag - 1 - delay[media]) ^ 2);
}
cum_effect = Adstock(X_media[nn, media], lag_weights);
cum_effects_hill[nn, media] = Hill(cum_effect, ec[media], slope[media]);
}
mu[nn] = theta[dma[nn]] +
dot_product(cum_effects_hill[nn], beta[dma[nn]]) +
dot_product(X_ctrl[nn], gamma[dma[nn]]);
}
}
}
I am sorry but I don’t really understand the question - is the code you show BEFORE you tried to move to ragged data structures or AFTER or something in between? I also don’t know of any “segmentation tutorial” for Stan (and quick Googling didn’t help) - maybe providing a link would help us understand the context better. Finally showing the data
declarations might also be helpful to grasp what is happening.
Hope we can figure out a good way forward quickly :-)
Thank you for replying.
I was referring to the following
I have found a way around for a basic hierarchical regression, I just convert X_media and beta matrix to vectors, then continue with the dot product based on dma and variable/coefficient index. What I still have to figure out is how to apply the loop shown above to a ragged dataset
data {
// the total number of observations/rows
int<lower=1> K;
// rows with media data sizes
int s[K];
// rows with control data sizes
int p[K];
// the total length of media_vector; NAs removed
int<lower=1> V;
// the total length of control_vector; NAs removed
int<lower=1> C;
//the number of DMAs
int<lower=1> J;
//vector of media_dma indeces, for non_segment indexing
int<lower=1,upper=J> dma[K];
//vector of media_dma indeces
int<lower=1,upper=J> dmamedia[V];
//vector of control_dma indeces
int<lower=1,upper=J> dmacontrol[C];
// the vector of sales
real Y[K];
// the number of media channels
int<lower=1> num_media;
// the beta index
int<lower=1,upper=num_media> kk[V];
// the vector of media
row_vector[V] X_media;
// the number of other control variables
int<lower=1> num_ctrl;
// the controls index
int<lower=1,upper=num_media> jj[C];
// a vector of control variables
real<lower=0> X_ctrl[C];
}
parameters {
// residual variance
real<lower=0> noise_var;
// the population level intercept
real tau;
// the population coefficients for media variables
vector<lower=0>[num_media] beta_medias;
// popultaion level coffecients for other control variables
vector[num_ctrl] gamma_ctrl;
// group level coffecients for intercept
vector[J] theta;
// the group coefficients for media variables
vector[num_media] beta[J];
// the group coffecients for control
vector[num_ctrl] gamma[J];
// the standard deviation of intercept
real<lower=0>sigma_tau;
// the standard deviation of betas
real<lower=0>sigma_beta;
// sd of controll
real<lower=0>sigma_gamma;
}
transformed parameters {
row_vector[V] beta_vector;
row_vector[C] gamma_vector;
for(i in 1:V){
beta_vector[i] = beta[dmamedia[i],kk[i]];
}
for(i in 1:C){
gamma_vector[i] = gamma[dmacontol[i],jj[i]];
}
}
model{
int pos;
int pos_c;
vector[K] mu;
tau ~ normal(0,1);
sigma_tau ~ cauchy(0,2.5);
sigma_beta ~ cauchy(0,2.5);
sigma_gamma ~ cauchy(0,2.5);
for (j in 1:J){
theta[j] ~ normal(tau,sigma_tau);
}
beta_medias ~ normal(0,1);
for (j in 1:J){
beta[j] ~ normal(beta_medias,sigma_beta);
}
gamma_ctrl ~ normal(0,1);
for (j in 1:J) {
gamma[j] ~ normal(gamma_ctrl,sigma_gamma);
}
noise_var ~ normal(0, 0.05 * 0.01);
pos = 1;
pos_c = 1;
for(k in 1:K){
Y ~ normal(theta[dma[k]] + dot_product(segment(X_media, pos, s[k] ),segment(beta_vector, pos, s[k])),sqrt(noise_var) + dot_product(segment(X_ctrl, pos_c, p[k] ),segment(gamma_vector, pos_c, p[k])),sqrt(noise_var)); //compute the linear predictor using relevant group-level regression coefficients
pos = pos + s[k];
pos_c = pos_c + p[k];
}
}
Sorry for taking long to get back to you. Do I understand correctly that what you need is to have a beta
(and gamma
) coefficient for each value of dmamedia
. I.e. dmamedia
act as predictors. For any single Y
there is a single dma
and each dma
corresponds to a set of dmamedia
that are active?
If that is so, one way would be to just preprocess the data in R (or wherever you call your model from) to have a direct matrix[V,K] dmamedia_expanded
with 0-1 entries when the media is present / absent and then use it directly? (real value = dot_product(dmamedia_expanded[,k], beta[dma[k]];
(you could also build this in Stan, but that is IMHO more annoying).
Does that make sense? Or I am misunderstanding you?