Test: Soft vs Hard sum-to-zero constraint + choosing the right prior for soft constraint

Are you referring to this post?

That was in the context of QR decomposition, so I’m not exactly sure how you are using it without using the QR decomposition.

I’m not sure I totally follow what is going on here. It sounds like you are actually exploring different models, rather than different parameterizations of the same model, and that would be why you are getting different results, rather than landing on different posterior modes of the same model.

I’m essentially using the parameterization written by Bob in this post:

Which directly followed the one you’re referring to. I somehow missed this post:

Where you do cite a paper for the method (so I could have answered my first question on my own, I guess…)

Anyway, yes, I realize that your QR parameterization and Bob’s parameterization are both different models when compared to the model originally written in the manual, which suggested placing the prior only on beta_raw.

I’m sorry I haven’t been clear; I didn’t mean to hijack this thread.

Thanks for this. The links here are broken; I checked the main reference https://www.cambridge.org/core/services/aop-cambridge-core/content/view/859F60F5AD4B81B03F65FB083E646BC5/S0008414X00031114a.pdf/normal-samples-with-linear-constraints-and-given-variances.pdf and it’s still really unclear. I know this thread is very old and this is asking for a lot, but if there’s an off chance you remember anything about what you were trying to do in the code here, that would be really helpful!

Is it normal that the implementations here

and in the user guide

Stan User’s Guide

produce different results? Are they both correct but proposing different mappings? Am I overlooking anything?


functions {
  // https://discourse.mc-stan.org/t/test-soft-vs-hard-sum-to-zero-constrain-choosing-the-right-prior-for-soft-constrain/3884/31?u=stemangiola
  vector Q_sum_to_zero_QR(int N) {
    vector[2*N] Q_r;

    for (i in 1:N) {
      Q_r[i] = -sqrt((N - i) / (N - i + 1.0));
      Q_r[i + N] = inv_sqrt((N - i) * (N - i + 1));
    }
    return Q_r;
  }

  vector sum_to_zero_QR_discourse(vector x_raw, vector Q_r) {
    int N = num_elements(x_raw) + 1;
    vector[N] x;
    real x_aux = 0;

    for (i in 1:(N - 1)) {
      x[i] = x_aux + x_raw[i] * Q_r[i];
      x_aux = x_aux + x_raw[i] * Q_r[i + N];
    }
    x[N] = x_aux;
    return x;
  }

  // https://mc-stan.org/docs/2_18/stan-users-guide/parameterizing-centered-vectors.html
  matrix get_A_qr(int K) {
    matrix[K, K] A = diag_matrix(rep_vector(1, K));
    for (i in 1:(K - 1)) A[K, i] = -1;
    A[K, K] = 0;
    return qr_Q(A)[, 1:(K - 1)];
  }

  vector sum_to_zero_QR_guide(vector x_raw, matrix A_qr) {
    return A_qr * x_raw;
  }
}

transformed data {
  int K = 10;
  vector[K - 1] x_raw = [-1.89557863, 1.00724850, -0.02504404, -1.81977322, 0.29557215, 1.30002674, 0.54844423, 1.60936596, -0.32222849]'; // runif(n = 9, min = -2, max = 2)
  vector[K * 2] Q_r_1 = Q_sum_to_zero_QR(K);
  matrix[K, K - 1] A_qr = get_A_qr(K);
  print("1 ----", sum_to_zero_QR_discourse(x_raw, Q_r_1));
  print("2 ----", sum_to_zero_QR_guide(x_raw, A_qr));
}

mod$sample(init = 1, fixed_param = TRUE)
Running MCMC with 4 sequential chains...

Chain 1 1 ----[1.7983,-1.14945,-0.0576796,1.60033,-0.635069,-1.47407,-0.495558,-1.17631,1.0226,0.566902] 
Chain 1 2 ----[-1.82099,0.753007,-0.0983261,-2.11121,-0.159768,0.974602,0.357323,1.55129,-0.305693,0.859761] 
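To compare the two mappings outside of Stan, here is a NumPy port of both transforms (a sketch I wrote for illustration, not the original code; note that the sign convention of `numpy.linalg.qr` need not match Stan’s `qr_Q`, so the guide variant may come out sign-flipped relative to the Stan output). Both results sum to zero, but the same `x_raw` is sent to different points, which is consistent with the printed chains:

```python
import numpy as np

def q_sum_to_zero_qr(N):
    """Precompute the 2*N coefficients used by the Discourse transform."""
    Q_r = np.zeros(2 * N)
    for i in range(1, N):  # the i = N entries are never read by the transform
        Q_r[i - 1] = -np.sqrt((N - i) / (N - i + 1.0))
        Q_r[i - 1 + N] = 1.0 / np.sqrt((N - i) * (N - i + 1.0))
    return Q_r

def sum_to_zero_qr_discourse(x_raw, Q_r):
    """Map N-1 free parameters to an N-vector that sums to zero."""
    N = len(x_raw) + 1
    x = np.zeros(N)
    x_aux = 0.0
    for i in range(N - 1):
        x[i] = x_aux + x_raw[i] * Q_r[i]
        x_aux += x_raw[i] * Q_r[i + N]
    x[N - 1] = x_aux
    return x

def sum_to_zero_qr_guide(x_raw, K):
    """User's-guide variant: multiply by the first K-1 columns of Q from qr(A)."""
    A = np.eye(K)
    A[K - 1, :] = -1.0
    A[K - 1, K - 1] = 0.0
    Q, _ = np.linalg.qr(A)
    return Q[:, :K - 1] @ x_raw

K = 10
x_raw = np.array([-1.89557863, 1.00724850, -0.02504404, -1.81977322,
                  0.29557215, 1.30002674, 0.54844423, 1.60936596, -0.32222849])
x1 = sum_to_zero_qr_discourse(x_raw, q_sum_to_zero_qr(K))
x2 = sum_to_zero_qr_guide(x_raw, K)
print(x1.sum(), x2.sum())       # both ~0 up to floating point
print(np.allclose(x1, x2))      # the two mappings disagree pointwise
```

Both are valid bijections from R^{K-1} onto the sum-to-zero subspace; they just use different orthonormal bases for that subspace, so identical `x_raw` values land on different constrained vectors.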

Could someone please explain, or provide an accessible reference for, why inv_sqrt(1 - inv(N)) is the ‘right’ scale to use on the free parameters to give a marginal standard normal distribution on the quantities of interest?

It is because the simplified code was meant to replace parts of the code mentioned at:

Where the scaling was used. The link in the post is dead, but @mjhajharia provided an updated reference outlining the general approach: Fraser 1951, Normal samples with linear constraints and given variances

For the problem at hand, using Fraser’s notation, we have an n-dimensional normal vector (x_1, \ldots, x_n) subject to k constraints, expressed as:

\sum_{j=1}^n a_{p,j} x_j = 0, \quad (p = n - k + 1, \ldots, n)

In our case k = 1 and a_{n,1} = a_{n,2} = \ldots = a_{n,n} = 1.

Then you can follow Fraser’s method to find an (n-1)-dimensional covariance matrix T = ||\tau_{r,s}||, (r, s = 1, \ldots, n - 1) and an orthogonal matrix B = ||b_{i,j}||, (i, j = 1, \ldots, n).

Then if y_1, \ldots, y_{n - 1} \sim MVN(0, T), the quantities x_i = \sum_{r = 1}^{n - 1} b_{r,i} y_r satisfy the constraint and have the correct marginal distribution.

I didn’t do the math to double check, but it appears that the code a) uses a QR decomposition of B to compute the transformation from y to x more efficiently, and b) exploits that T is a diagonal matrix with \tau_{i,i} = \frac{1}{1 - \frac{1}{n}} for all i, i.e. standard deviation \frac{1}{\sqrt{1 - \frac{1}{n}}}, which is exactly the inv_sqrt(1 - inv(N)) scale in the code.
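One way to check this numerically (a NumPy sketch, reimplementing the QR-based transform from earlier in the thread rather than calling Stan): build the N×(N−1) matrix M of the linear map x = M x_raw by applying the transform to basis vectors. Its columns come out orthonormal, and M Mᵀ equals the projection I − 11ᵀ/N onto the sum-to-zero subspace. Hence if the free parameters are i.i.d. Normal(0, s) with s = inv_sqrt(1 − inv(N)), then Cov(x) = s² M Mᵀ has unit diagonal, i.e. standard-normal marginals on the constrained vector:

```python
import numpy as np

def sum_to_zero_qr(x_raw):
    """QR-based sum-to-zero transform (port of the Stan code in this thread)."""
    N = len(x_raw) + 1
    x = np.zeros(N)
    x_aux = 0.0
    for i in range(1, N):  # i = 1..N-1, as in the Stan loop
        x[i - 1] = x_aux - x_raw[i - 1] * np.sqrt((N - i) / (N - i + 1.0))
        x_aux += x_raw[i - 1] / np.sqrt((N - i) * (N - i + 1.0))
    x[N - 1] = x_aux
    return x

N = 7
# Column j of M is the image of the j-th basis vector, so x = M @ x_raw.
M = np.column_stack([sum_to_zero_qr(e) for e in np.eye(N - 1)])

# The columns are orthonormal ...
assert np.allclose(M.T @ M, np.eye(N - 1))
# ... and M @ M.T projects onto the sum-to-zero subspace:
assert np.allclose(M @ M.T, np.eye(N) - np.ones((N, N)) / N)

# With x_raw ~ Normal(0, s) i.i.d. and s^2 = 1 / (1 - 1/N),
# Cov(x) = s^2 * M @ M.T, whose diagonal is s^2 * (1 - 1/N) = 1.
s2 = 1.0 / (1.0 - 1.0 / N)
assert np.allclose(np.diag(s2 * (M @ M.T)), 1.0)
print("marginal variances:", np.diag(s2 * (M @ M.T)))
```

The off-diagonal entries of Cov(x) are −1/(N−1), which is the unavoidable negative correlation induced by forcing the components to sum to zero.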

Does that clarify?