Multiplicative Constraint in Stan

Hi Stan Community,

I am trying to implement a deterministic component of my model in Stan, but am running into difficulties. Some context: I have some sort of intervention that made a significant change in sales, and the next question that gets asked is where the change in sales came from. We usually decompose this into three components:
\text{Sales} = \text{Transactions} \times \text{Units per Basket} \times \text{Avg. Unit Retail}

Written another way, this is equivalent to:
\text{Sales} = \text{Transactions} \times \frac{\text{Units}}{\text{Transactions}} \times \frac{\text{Sales}}{\text{Units}}
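
For concreteness, with made-up numbers, the decomposition works out as:
1{,}000 \text{ transactions} \times 2 \text{ units per basket} \times \$5 \text{ avg. unit retail} = \$10{,}000 \text{ in sales}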

Because I am interested in effect coefficients for all 4 measures, I create 4 separate parameter vectors. In the Stan code, I tried to impose a normal likelihood, with a sigma defined in the data block, whose mean is the product of the 3 components that make up sales; this acts as a sort of soft constraint. Due to the high correlation between the covariates, I also used the QR decomposition (intercepts are included in the X matrices). Here is the Stan code so far:

data {
  int N;
  int p_retail;
  int p_basket;
  int p_trans;
  int p_sales;
  
  matrix[N, p_retail] X_retail;
  matrix[N, p_basket] X_basket;
  matrix[N, p_trans] X_trans;
  matrix[N, p_sales] X_sales;
  
  vector[N] y_retail;
  vector[N] y_basket;
  vector[N] y_trans;
  vector[N] y_sales;
  real<lower=0> sig_constraint;
}

transformed data{
  // Retail
  matrix[N, p_retail] Q_retail;
  matrix[p_retail, p_retail] R_retail;
  matrix[p_retail, p_retail] R_retail_inv;
  
  Q_retail = qr_thin_Q(X_retail) * sqrt(N - 1);
  R_retail = qr_thin_R(X_retail) / sqrt(N - 1);
  R_retail_inv = inverse(R_retail);

  // Basket Size
  matrix[N, p_basket] Q_basket;
  matrix[p_basket, p_basket] R_basket;
  matrix[p_basket, p_basket] R_basket_inv;
  
  Q_basket = qr_thin_Q(X_basket) * sqrt(N - 1);
  R_basket = qr_thin_R(X_basket) / sqrt(N - 1);
  R_basket_inv = inverse(R_basket);
  
  // Transactions
  matrix[N, p_trans] Q_trans;
  matrix[p_trans, p_trans] R_trans;
  matrix[p_trans, p_trans] R_trans_inv;
  
  Q_trans = qr_thin_Q(X_trans) * sqrt(N - 1);
  R_trans = qr_thin_R(X_trans) / sqrt(N - 1);
  R_trans_inv = inverse(R_trans);
  
  // Sales
  matrix[N, p_sales] Q_sales;
  matrix[p_sales, p_sales] R_sales;
  matrix[p_sales, p_sales] R_sales_inv;
  
  Q_sales = qr_thin_Q(X_sales) * sqrt(N - 1);
  R_sales = qr_thin_R(X_sales) / sqrt(N - 1);
  R_sales_inv = inverse(R_sales);
}

parameters {
  vector[p_retail] beta_retail_qr;
  vector[p_basket] beta_basket_qr;
  vector[p_trans] beta_trans_qr;
  vector[p_sales] beta_sales_qr;
  
  vector<lower=0>[4] comp_sig;
}

model{
  beta_retail_qr ~ normal(0,20);
  beta_basket_qr ~ normal(0,20);
  beta_trans_qr ~ normal(0,20);
  beta_sales_qr ~ normal(0,20);
  comp_sig ~ exponential(1);
  
  vector[N] mu_retail = Q_retail * beta_retail_qr;
  vector[N] mu_basket = Q_basket * beta_basket_qr;
  vector[N] mu_trans = Q_trans * beta_trans_qr;
  vector[N] mu_sales = Q_sales * beta_sales_qr;
  
  
  y_retail ~ normal(mu_retail, comp_sig[1]);
  y_basket ~ normal(mu_basket, comp_sig[2]);
  y_trans ~ normal(mu_trans, comp_sig[3]);
  y_sales ~ normal(mu_sales, comp_sig[4]);
  
  // Soft constraint: the sales mean should be close to the product of the component means
  mu_sales ~ normal(mu_retail .* mu_basket .* mu_trans, sig_constraint);
}

generated quantities{
  vector[p_retail] beta_retail = R_retail_inv * beta_retail_qr;
  vector[p_basket] beta_basket = R_basket_inv * beta_basket_qr;
  vector[p_trans] beta_trans = R_trans_inv * beta_trans_qr;
  vector[p_sales] beta_sales = R_sales_inv * beta_sales_qr;
}
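
For reference, the back-transformation in the generated quantities block is just the standard thin-QR identity, restated here with Q^{*} and R^{*} denoting the scaled factors built in transformed data:
X = Q^{*} R^{*}, \quad Q^{*} = \text{qr\_thin\_Q}(X)\,\sqrt{N-1}, \quad R^{*} = \text{qr\_thin\_R}(X) / \sqrt{N-1}
X\beta = Q^{*}(R^{*}\beta) = Q^{*}\theta, \quad \text{so} \quad \beta = (R^{*})^{-1}\theta
where \theta is the corresponding beta_*_qr vector.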

When I run the Stan code, I get slow sampling and multiple warnings that the maximum treedepth was reached. I would expect this, as the parameter space becomes highly constrained once the normal likelihood is placed on \mu_{\text{Sales}}.

My question is whether there is a better way to parameterize this model that would allow the parameters to live in a more unconstrained space, which I could then transform back to the original space in the generated quantities block.

Another issue that might be causing inefficient sampling is the priors on the QR coefficients. While I thought normal(0, 20) would be wide enough, I notice that some QR coefficients exceed 500. I would like to scale and center the columns of X before the QR decomposition, as well as the component y vectors, but I am honestly unsure how that would affect the multiplicative constraint, since scaling and centering the individual components by their respective means and standard deviations would break the multiplicative relationship.
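
To spell out that last concern (my own algebra, writing S = T \times B \times R for sales, transactions, units per basket, and avg. unit retail): dividing each series by its standard deviation only changes the product by a known constant,
\frac{S_i}{s_S} = \frac{T_i}{s_T} \cdot \frac{B_i}{s_B} \cdot \frac{R_i}{s_R} \cdot \frac{s_T\, s_B\, s_R}{s_S}
but centering introduces cross terms, so in general
\frac{S_i - \bar{S}}{s_S} \neq \frac{T_i - \bar{T}}{s_T} \cdot \frac{B_i - \bar{B}}{s_B} \cdot \frac{R_i - \bar{R}}{s_R}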

Any help is greatly appreciated! Thanks!

Constrained in the sense that some values of parameters that satisfy their declared constraints are illegal? That’s bad for Stan’s sampling—it can devolve to very inefficient rejection sampling.

You definitely don’t want the prior fighting with the posterior. If the posterior’s 500 and the prior has a scale of 20, you know it’s misspecified and it’s going to mess with the posterior geometry.
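
To put rough numbers on that (my arithmetic, using the values mentioned above): a coefficient of 500 under a normal(0, 20) prior sits 500 / 20 = 25 prior standard deviations from zero, where the prior density has fallen from its peak by a factor of roughly
\exp\!\left(-\tfrac{25^2}{2}\right) = e^{-312.5}
which is why the prior ends up fighting the likelihood and distorting the posterior geometry.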

Isn’t that what the QR decomposition is doing for you?

This definitely looks problematic in that all of these vectors are of size N, so it’s just introducing pure non-identifiability in the sense that when mu_retail goes up, the others can go down to exactly compensate. There’s simply no way to identify these parameters without more information coming from somewhere.
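
In symbols, the point is that the soft constraint only sees the product of the three component means, which is unchanged when one component is scaled up and another scaled down to compensate: for any c \neq 0,
(c\,\mu_{\text{retail}}) \times \left(\tfrac{1}{c}\,\mu_{\text{basket}}\right) \times \mu_{\text{trans}} = \mu_{\text{retail}} \times \mu_{\text{basket}} \times \mu_{\text{trans}}
so the constraint by itself cannot separate the components; that information has to come from the individual component likelihoods or the priors.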