Hi Stan Community,
I am trying to implement a deterministic component of my model in Stan, but am running into difficulties. Let me give some context. I have an intervention that produced a significant change in sales, and the next question that gets asked is where that change came from. We usually decompose sales into three components:
\text{Sales} = \text{Transactions} \times \text{Units per Basket} \times \text{Avg. Unit Retail}
Written another way, this is equivalent to:
\text{Sales} = \text{Transactions} \times \frac{\text{Units}}{\text{Transactions}} \times \frac{\text{Sales}}{\text{Units}}
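To make the decomposition concrete with made-up numbers: 1,000 transactions, 2,500 units, and $10,000 in sales give
\text{Sales} = 1{,}000 \times \frac{2{,}500}{1{,}000} \times \frac{10{,}000}{2{,}500} = 1{,}000 \times 2.5 \times 4 = 10{,}000,
i.e. 2.5 units per basket at a $4.00 average unit retail.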
Because I am interested in effect coefficients for all four measures, I create four separate parameter vectors. In the Stan code, I tried to impose a normal likelihood on the sales mean, where the mean is the product of the three components that make up sales and the sigma is defined in the data block. This acts as a soft constraint. Because of the high correlation between the covariates, I also used the QR decomposition (intercepts are included in the X matrices).
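For reference, the QR reparameterization is the one from the Stan User's Guide: writing the thin decomposition as X = Q^* R^* with Q^* = Q\sqrt{N-1} and R^* = R/\sqrt{N-1}, each linear predictor becomes
X\beta = Q^* R^* \beta = Q^* \theta, \qquad \theta = R^* \beta, \qquad \beta = (R^*)^{-1}\theta,
so the sampler works on \theta (the beta_*_qr vectors below) and \beta is recovered afterwards. Here is the Stan code thus far: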
data {
  int<lower=1> N;
  int<lower=1> p_retail;
  int<lower=1> p_basket;
  int<lower=1> p_trans;
  int<lower=1> p_sales;
  matrix[N, p_retail] X_retail;
  matrix[N, p_basket] X_basket;
  matrix[N, p_trans] X_trans;
  matrix[N, p_sales] X_sales;
  vector[N] y_retail;
  vector[N] y_basket;
  vector[N] y_trans;
  vector[N] y_sales;
  real<lower=0> sig_constraint;
}
transformed data {
  // Retail
  matrix[N, p_retail] Q_retail;
  matrix[p_retail, p_retail] R_retail;
  matrix[p_retail, p_retail] R_retail_inv;
  Q_retail = qr_thin_Q(X_retail) * sqrt(N - 1);
  R_retail = qr_thin_R(X_retail) / sqrt(N - 1);
  R_retail_inv = inverse(R_retail);
  // Basket Size
  matrix[N, p_basket] Q_basket;
  matrix[p_basket, p_basket] R_basket;
  matrix[p_basket, p_basket] R_basket_inv;
  Q_basket = qr_thin_Q(X_basket) * sqrt(N - 1);
  R_basket = qr_thin_R(X_basket) / sqrt(N - 1);
  R_basket_inv = inverse(R_basket);
  // Transactions
  matrix[N, p_trans] Q_trans;
  matrix[p_trans, p_trans] R_trans;
  matrix[p_trans, p_trans] R_trans_inv;
  Q_trans = qr_thin_Q(X_trans) * sqrt(N - 1);
  R_trans = qr_thin_R(X_trans) / sqrt(N - 1);
  R_trans_inv = inverse(R_trans);
  // Sales
  matrix[N, p_sales] Q_sales;
  matrix[p_sales, p_sales] R_sales;
  matrix[p_sales, p_sales] R_sales_inv;
  Q_sales = qr_thin_Q(X_sales) * sqrt(N - 1);
  R_sales = qr_thin_R(X_sales) / sqrt(N - 1);
  R_sales_inv = inverse(R_sales);
}
parameters {
  vector[p_retail] beta_retail_qr;
  vector[p_basket] beta_basket_qr;
  vector[p_trans] beta_trans_qr;
  vector[p_sales] beta_sales_qr;
  vector<lower=0>[4] comp_sig;
}
model {
  beta_retail_qr ~ normal(0, 20);
  beta_basket_qr ~ normal(0, 20);
  beta_trans_qr ~ normal(0, 20);
  beta_sales_qr ~ normal(0, 20);
  comp_sig ~ exponential(1);
  vector[N] mu_retail = Q_retail * beta_retail_qr;
  vector[N] mu_basket = Q_basket * beta_basket_qr;
  vector[N] mu_trans = Q_trans * beta_trans_qr;
  vector[N] mu_sales = Q_sales * beta_sales_qr;
  y_retail ~ normal(mu_retail, comp_sig[1]);
  y_basket ~ normal(mu_basket, comp_sig[2]);
  y_trans ~ normal(mu_trans, comp_sig[3]);
  y_sales ~ normal(mu_sales, comp_sig[4]);
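  // Soft multiplicative constraint: pull the sales predictor toward the
  // product of the three component predictors. As a sampling statement on
  // transformed quantities, this just adds a normal penalty to the target
  // density; sig_constraint (passed in as data) sets how tightly the
  // Sales = Trans x UPB x AUR identity is enforced.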
  mu_sales ~ normal(mu_retail .* mu_basket .* mu_trans, sig_constraint);
}
generated quantities {
  vector[p_retail] beta_retail = R_retail_inv * beta_retail_qr;
  vector[p_basket] beta_basket = R_basket_inv * beta_basket_qr;
  vector[p_trans] beta_trans = R_trans_inv * beta_trans_qr;
  vector[p_sales] beta_sales = R_sales_inv * beta_sales_qr;
}
When I run the Stan code, I get slow sampling and many warnings that the maximum treedepth was reached. I would expect this, since the parameter space becomes highly constrained once the normal likelihood is placed on \mu_{\text{Sales}}.
My question is whether there is a better way to parameterize this model that would let the parameters live in a less constrained space, which I could then transform back to the original space in the generated quantities block.
Another issue that might be causing inefficient sampling is the priors on the QR coefficients. While I thought normal(0, 20) would be wide enough, I notice that some QR coefficients exceed 500. I would like to center and scale the columns of X (and the component y vectors) before the QR decomposition, but I am honestly unsure how that would affect the multiplicative constraint: standardizing each component by its own mean and standard deviation would break the multiplicative relationship.
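To spell out the concern (with hypothetical per-component standardization constants m and s, where the subscripts t, b, r, s index transactions, basket size, retail, and sales): if each component is standardized as \tilde{y} = (y - m)/s, the identity \text{Sales} = \text{Transactions} \times \text{Units per Basket} \times \text{Avg. Unit Retail} becomes
m_s + s_s \tilde{y}_s = (m_t + s_t \tilde{y}_t)(m_b + s_b \tilde{y}_b)(m_r + s_r \tilde{y}_r),
which expands into intercept and cross terms, so the clean product relationship \tilde{y}_s = \tilde{y}_t \, \tilde{y}_b \, \tilde{y}_r no longer holds on the standardized scale.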
Any help is greatly appreciated! Thanks!