How to set regression parameters to have norm 1?

For example, I’m fitting a regression model as
Y = beta_1*x1 + beta_2*x2 + beta_3*x3
and I have the beta vector as Beta = [beta_1, beta_2, beta_3].
How can I set a constraint on Beta in Stan so that it has norm 1, i.e. beta_1^2 + beta_2^2 + beta_3^2 = 1?

1 Like

My knee-jerk reaction to this is to use a variable of type unit_vector. For example, in the parameters block:

unit_vector[3] Beta;

(See section 5.5 in the Stan reference manual.) Then subset this variable as appropriate in your model block statements.

Caveats: I recall it is tricky to give this variable anything other than a uniform prior. And in some contexts, sampling such variables can be intrinsically difficult. I hope others chime in with further guidance.
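
For intuition about what the norm-1 constraint means, here is a small Python sketch (standard library only, not Stan code). It assumes the common construction of dividing an unconstrained vector by its Euclidean norm; the helper name `to_unit_vector` is made up for this illustration and is not part of Stan.

```python
import math
import random


def to_unit_vector(y):
    """Scale an unconstrained vector y to Euclidean norm 1 (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return [v / norm for v in y]


rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(3)]  # unconstrained draw
beta = to_unit_vector(y)                     # lies on the unit sphere
sum_sq = sum(b * b for b in beta)            # beta_1^2 + beta_2^2 + beta_3^2
print(beta, sum_sq)
```

Note that `y` and `-y` map to opposite points on the sphere, so the signs of the coefficients are not fixed by the constraint itself.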

1 Like

This forum has a monster thread about this precise topic here: A better unit vector

2 Likes

Perhaps you can estimate two parameters, say \delta_1 \in (-\infty,\infty) and \delta_2 \in (-\infty,\infty), and then calculate transformed parameters \beta_1 = \sqrt{\exp(\delta_1)/(\exp(\delta_1)+\exp(\delta_2)+1)}, \beta_2 = \sqrt{\exp(\delta_2)/(\exp(\delta_1)+\exp(\delta_2)+1)}, and \beta_3 = \sqrt{1/(\exp(\delta_1)+\exp(\delta_2)+1)}, so that \beta_1^2+\beta_2^2+\beta_3^2 = 1 is always fulfilled, no matter which values \delta_1 and \delta_2 have. Finding good priors for \delta_1 and \delta_2 might be difficult, but you can try out different distributions. For each candidate distribution, simulate, say, 100,000 values for each of \delta_1 and \delta_2, calculate the corresponding values of \beta_1, \beta_2, and \beta_3, create histograms of those values, and check how meaningful the resulting indirect priors for \beta_1, \beta_2, and \beta_3 are.
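
The prior check described above could be sketched in Python roughly as follows (standard library only). The standard normal distribution for \delta_1 and \delta_2 is just an assumed candidate to try, and the names `betas_from_deltas` and `histogram` are made up for this illustration.

```python
import math
import random


def betas_from_deltas(d1, d2):
    """Map two unconstrained values to (beta_1, beta_2, beta_3) with unit norm."""
    e1, e2 = math.exp(d1), math.exp(d2)
    total = e1 + e2 + 1.0
    return (math.sqrt(e1 / total), math.sqrt(e2 / total), math.sqrt(1.0 / total))


def histogram(values, n_bins=10):
    """Count values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(42)
n_draws = 100_000

# Assumed candidate prior: delta_1, delta_2 ~ Normal(0, 1). Swap in others to compare.
draws = [
    betas_from_deltas(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    for _ in range(n_draws)
]

# The constraint should hold for every draw, up to floating-point error.
max_norm_error = max(abs(sum(b * b for b in d) - 1.0) for d in draws)

# Crude text histograms of the implied priors on each beta.
hists = [histogram([d[k] for d in draws]) for k in range(3)]
for k, counts in enumerate(hists, start=1):
    print(f"beta_{k}:", counts)
```

With this transformation every beta is strictly positive, so it only covers the part of the unit sphere where all coefficients are positive.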

3 Likes

I’ll add a couple notes:

  1. unit_vector could work, BUT this constraint can induce multimodal posteriors. For example, if without any constraint the data happen to inform beta[1] and beta[2] well but do not provide a lot of information about beta[3], adding the constraint will induce a bimodal posterior where beta[3] is close to either +a or -a, where a == sqrt(1 - beta[1]^2 - beta[2]^2). More complex scenarios with up to 2^K modes (where K is the number of constrained coefficients) can definitely be constructed. Your specific data may constrain the posterior enough for this not to matter, but even if this is the case, the constraint may still hamper initialization…

  2. If you can a priori constrain the sign of the betas, then the simplest solution would IMHO be something like:

data {
   int<lower=1> K; // number of constrained coefficients
   vector[K] beta_signs; // -1 or 1 to enforce sign
}

parameters {
   //induces that 0 < beta_squared < 1 and that beta_squared sums to 1.
   simplex[K] beta_squared; 
}

transformed parameters {
  vector[K] beta = beta_signs .* sqrt(beta_squared); 
}

This will likely behave very similarly to @Arne_Henningsen's solution (which also fixes the signs of the betas, in that case as positive).

  3. If you don’t know the signs a priori AND have to handle the induced multimodality AND your number of predictors is not too large, you can explicitly marginalize over the 2^K options for signs. See Latent Discrete Parameters and/or the rater paper for instructions on how to do this (this has some extra challenges, but tends to work very well in practice).
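
To make the marginalization idea concrete, here is a minimal Python sketch (standard library only, not Stan code) of the quantity being computed. It assumes a normal likelihood with known `sigma`, a uniform prior over the 2^K sign patterns, and toy data; the function names and data are made up for this illustration. In Stan, the same sum would typically be accumulated on the log scale with `log_sum_exp` and added to `target`.

```python
import itertools
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_lik(y, X, beta_squared, sigma):
    """Log-likelihood with the signs of beta summed out.

    beta_squared plays the role of the simplex; each of the 2^K sign
    patterns gets prior probability 2^-K (an assumption of this sketch).
    """
    K = len(beta_squared)
    magnitudes = [math.sqrt(b) for b in beta_squared]
    log_prior = -K * math.log(2.0)
    terms = []
    for signs in itertools.product((-1.0, 1.0), repeat=K):
        beta = [s * m for s, m in zip(signs, magnitudes)]
        log_lik = sum(
            normal_logpdf(yi, sum(b * x for b, x in zip(beta, row)), sigma)
            for yi, row in zip(y, X)
        )
        terms.append(log_prior + log_lik)
    return log_sum_exp(terms)


# Toy data (made up): 4 observations, K = 3 predictors, noise-free response.
X = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8], [-0.7, 0.2, 1.5], [1.2, 1.1, 0.4]]
true_beta = [0.6, -0.8, 0.0]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_squared = [0.36, 0.64, 0.0]  # sums to 1, like a simplex
value = marginal_log_lik(y, X, beta_squared, sigma=0.5)
print(value)
```

The loop visits 2^K sign patterns, which is why this only stays practical when the number of predictors is small.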

Hope that helps at least a little bit!

1 Like

Thank you all for the help! I’ll get back to you guys if I find a good solution!