How to constrain some correlations to 0?

Suppose you have a 3x3 correlation matrix of variables A, B and C. Let's say that you are building a covariance matrix like this:

parameters {
corr_matrix[3] corr;
vector<lower=0>[3] sds; // "var" is a reserved word in Stan, so the scales need another name
}

transformed parameters{
cov_matrix[3] varcov;
varcov = quad_form_diag(corr, sds);
}

model{
corr ~ lkj_corr(1);
sds ~ cauchy(0, 5);
}

So everything looks great up until now. However, what if we want to constrain the correlation between the variables B and C to be 0? I am trying to find out how to build a matrix like that but I don’t understand enough about the underlying mechanisms of matrices in Stan. Any help would be appreciated!

One way to achieve this is to define an intermediate variable that's a copy of the correlation matrix, then zero the entries you want constrained:

parameters {
	corr_matrix[3] corr;
	vector<lower=0>[3] sds;
}
transformed parameters{
	matrix[3] corr_constrained = corr;
	corr_constrained[2,3] = 0;
	corr_constrained[3,2] = 0;
	cov_matrix[3] varcov;
	varcov = quad_form_diag(corr_constrained, sds);
}
model{
	corr ~ lkj_corr(1);
	sds ~ cauchy(0,5);
}

I think this works great but in my case I’ve had to write

matrix[3,3]

instead of

matrix[3]

Ah, right, typo on my part, good catch!


Mike, I'm not sure that this solution will always work. For example, I think the LKJ prior could give you a positive definite matrix with correlations of .9 everywhere. Then, if you change one of those correlations to zero, the matrix is no longer positive definite.

This probably does not matter for some applications, but it might matter if you are trying to define a prior distribution.
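
To make that concrete: with all off-diagonals at 0.9, the eigenvalues are $1 + 2(0.9) = 2.8$ and $1 - 0.9 = 0.1$ (twice), so the matrix is positive definite. Zero the B~C entry, though, and the determinant goes negative:

$$\det \begin{pmatrix} 1 & 0.9 & 0.9 \\ 0.9 & 1 & 0 \\ 0.9 & 0 & 1 \end{pmatrix} = 1(1 - 0) - 0.9(0.9 - 0) + 0.9(0 - 0.9) = 1 - 0.81 - 0.81 = -0.62 < 0$$

so the edited matrix is no longer a valid correlation matrix.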


Good point. It really highlights how limited the multivariate normal is as a structure for achieving inference on relationships. I've started doing more SEM stuff lately, which gives much better control over things.

You can try this code: Correlation matrix with positively constrained off-diagonals - #3 by spinkney. Don't put the lower bound on y_raw, and only declare as many non-zero off-diagonal elements as you need.

In the transformed parameters block you'd create a new y_raw where you place the zeroes where you want (keeping track of which vector index corresponds to which matrix index in the correlation matrix). Then pass that to matrix[K, K] y = cholesky_corr_constrain_lp(y_raw_new, K); (see the sketch below).
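
Roughly like this for the 3x3 case (a sketch only; it assumes cholesky_corr_constrain_lp from the linked thread is defined in the functions block, and that the off-diagonals are ordered (2,1), (3,1), (3,2), so double-check the ordering against the actual function):

functions {
	// cholesky_corr_constrain_lp(...) from the linked thread goes here
}
parameters {
	vector[2] y_raw; // only the free (non-zero) off-diagonals
}
transformed parameters {
	vector[3] y_raw_new; // all K*(K-1)/2 = 3 off-diagonals
	y_raw_new[1] = y_raw[1]; // (2,1): A~B, free
	y_raw_new[2] = y_raw[2]; // (3,1): A~C, free
	y_raw_new[3] = 0;        // (3,2): B~C, fixed at zero
	matrix[3, 3] y = cholesky_corr_constrain_lp(y_raw_new, 3);
}

(As comes up later in the thread, zeroing an element of the Cholesky factor is not the same as zeroing the corresponding correlation, so treat this as a literal translation of the suggestion above rather than a guaranteed zero correlation.)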


Hello, thank you very much for your helpful piece of code and I apologize for coming back to this only after so much time. I would just like to ask a few questions to better understand the code.

When you say to only declare as many non-zero off-diagonal elements as I need, to which part of the code does that refer? Should I modify the indices of the for loops in the functions block so that the elements which should be zero are skipped, or does this refer to some other part of the code?

Next question, if I am creating the y_raw_new in the transformed parameters block, should I remove y_raw from the parameters block? What purpose does y_raw serve at that point?

Thank you very much for your help.

Furthermore, I'm not fully sure I understand the whole idea behind this approach. I have been trying out some code that I am unsure of, and I have successfully constrained some elements of the Cholesky factor to be 0. However, whether or not that constraint also carries over to the correlation matrix depends on the other elements of the Cholesky factor, due to the rules of matrix algebra.

Could you please confirm whether I have misunderstood something, and whether the method that you describe can also be used to constrain correlations in the correlation matrix to zero, and not just the Cholesky factors?

I don't know why I wrote that. It's a bit more difficult, because the Cholesky factor of a sparse correlation matrix is not necessarily sparse. You can set values of the correlation matrix to 0 and work from there; the issue is that the resulting matrix may not be positive definite. You can get this to work, but it's all a bit more involved. There's some rough code at this PR: fixed coupla_defination.stan of zero_constrain by yamikarajput546 · Pull Request #69 · spinkney/helpful_stan_functions · GitHub.
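
A quick worked example of that non-sparsity: take $\rho_{AB} = \rho_{AC} = 0.5$ and $\rho_{BC} = 0$. The lower Cholesky factor has

$$L_{21} = 0.5, \quad L_{22} = \sqrt{1 - 0.5^2} = \sqrt{0.75}, \quad L_{31} = 0.5, \quad L_{32} = \frac{\rho_{BC} - L_{31} L_{21}}{L_{22}} = \frac{0 - 0.25}{\sqrt{0.75}} \approx -0.29 \neq 0$$

so a zero correlation does not produce a zero entry in the Cholesky factor. Conversely, forcing $L_{32} = 0$ here would give $\rho_{BC} = L_{31} L_{21} = 0.25$, not zero.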

I’ve been meaning to write up exactly how to do this but not sure when I’ll get time.


I have posted some code and ideas that will let you do this in a few different threads; the best overview probably starts here: Partial-pooling of correlation (or covariance) matrices? - #4 by BenH
To fix a correlation to zero you just need to ensure the relevant value of the vector you pass in to the constraincorsqrt function is zero.


Thank you very much, this looks really helpful! Just to check: to ensure that the relevant elements of the vector passed to the function are zero, would it suffice to put e.g. rawcor[1] = 0; and rawcor[3] = 0; in the transformed parameters block?

yep, something like that.
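
For anyone following along, a minimal sketch of that idea, assuming BenH's constraincorsqrt function from the linked thread is available in the functions block; the call signature and the indices here are illustrative, so check them against the actual function:

functions {
	// constraincorsqrt(...) from the linked thread goes here
}
parameters {
	vector[3] rawcor;
}
transformed parameters {
	// parameters can't be reassigned, so zero the entries on a copy
	vector[3] rawcor_constrained = rawcor;
	rawcor_constrained[1] = 0;
	rawcor_constrained[3] = 0;
	matrix[3, 3] cor = constraincorsqrt(rawcor_constrained); // hypothetical call
}

In practice you'd probably declare only the free entries as parameters and splice the zeros in, so that you aren't sampling parameters that never enter the model.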


I would like to follow up on this again because I haven't found a way to make it work. But this time, maybe it would be useful to approach it from the other direction.

Let’s say that we are working on a case of confirmatory factor analysis. For this, we can use the demo code written by @mike-lawrence.

This code assumes that there are no correlations between the residuals of the manifest variables. So in a way, all these correlations here are already constrained to zero.

Now let's say that we want to release some of these residual correlations from the constraint. Let's assume that we have 10 items in the test and we want to unconstrain the correlations between the 3rd and 5th items and between the 6th and 7th items. How can we do this while keeping all other correlations at zero?

I’ve had much more exposure to SEM since this thread was last alive and suggest that you might look at that as a way of implementing your model structure. For the case you originally presented with three scales, A, B, & C, where you want B & C modeled as independent but want inference on possible correlations between A~B and A~C, here’s how I’d do it:


functions {
	// Michael Betancourt's recommended prior for ordinal cutpoints:
	real induced_dirichlet_0_lpdf(vector c, vector alpha) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit( - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
	real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit(phi - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
}
data{
	int n_respondents ;

	int n_items_A ;
	int n_ordered_options_per_item_A ;
	matrix [n_resp,n_items_A] A ;

	int n_items_B ;
	int n_ordered_options_per_item_B ;
	matrix [n_resp,n_items_B] B ;

	int n_items_B ;
	int n_ordered_options_per_item_C ;
	matrix [n_resp,n_items_B] C ;

}
parameters{
	// SEM parameters
	vector[n_respondents] a_unique ;
	vector[n_respondents] b_latent ;
	vector[n_respondents] c_latent ;
	unit_vector[3] weights ;
	// response-level parameters
	array[n_items_A] ordered[n_ordered_options_per_item_A-1] cutpoints_A ;
	array[n_items_B] ordered[n_ordered_options_per_item_B-1] cutpoints_B ;
	array[n_items_C] ordered[n_ordered_options_per_item_C-1] cutpoints_C ;
}
model{
	// priors for SEM
	a_unique ~ std_normal() ;
	b_latent ~ std_normal() ;
	c_latent ~ std_normal() ;
	// derived value for a
	vector[n_respondents] a_latent = (
		weights[1]*a_unique
		+ weights[2]*b_latent
		+ weights[3]*c_latent
	)
	// response-level priors & likelihood
	for(i in 1:n_items_A){
		cutpoints_A[i] ~ induced_dirichlet_0(n_ordered_options_per_item_A);
		A[:,i] ~ ordered_logistic( a_latent	, cutpoints_A[i] );
	}
	for(i in 1:n_items_B){
		cutpoints_B[i] ~ induced_dirichlet_0(n_ordered_options_per_item_B);
		B[:,i] ~ ordered_logistic( b_latent	, cutpoints_B[i] );
	}
	for(i in 1:n_items_C){
		cutpoints_C[i] ~ induced_dirichlet_0(n_ordered_options_per_item_C);
		C[:,i] ~ ordered_logistic( c_latent	, cutpoints_C[i] );
	}
}


Oh, there’s a mild non-identifiability in the sign of the a_unique values as coded above with unit_vector weights. This version should be equivalent but restricts the weight on a_unique to positive, thereby eliminating the non-identifiability.

functions {
	// Michael Betancourt's recommended prior for ordinal cutpoints:
	real induced_dirichlet_0_lpdf(vector c, vector alpha) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit( - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
	real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit(phi - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
}
data{
	int n_respondents ;

	int n_items_A ;
	int n_ordered_options_per_item_A ;
	matrix [n_resp,n_items_A] A ;

	int n_items_B ;
	int n_ordered_options_per_item_B ;
	matrix [n_resp,n_items_B] B ;

	int n_items_B ;
	int n_ordered_options_per_item_C ;
	matrix [n_resp,n_items_B] C ;

}
parameters{
	// SEM parameters
	vector[n_respondents] a_unique ;
	vector[n_respondents] b_latent ;
	vector[n_respondents] c_latent ;
	vector<lower=-1,upper=1>[2] weights_bc ;
	real<lower=0,upper=1> weight_a ;
	// response-level parameters
	array[n_items_A] ordered[n_ordered_options_per_item_A-1] cutpoints_A ;
	array[n_items_B] ordered[n_ordered_options_per_item_B-1] cutpoints_B ;
	array[n_items_C] ordered[n_ordered_options_per_item_C-1] cutpoints_C ;
}
transformed parameters{
	vector[3] weights ;
	weights[1] = weight_a ;
	weights[2:3] = (
		weights_bc
		/ sqrt(sum(weights_bc.^2) // makes weights_bc have a unit norm
		* sqrt(1-(weight_a.^2)) // makes weights have a unit norm
	) ;
}	
model{
	// priors for SEM
	a_unique ~ std_normal() ;
	b_latent ~ std_normal() ;
	c_latent ~ std_normal() ;
	// derived value for a
	vector[n_respondents] a_latent = (
		weights[1]*a_unique
		+ weights[2]*b_latent
		+ weights[3]*c_latent
	)
	// response-level priors & likelihood
	for(i in 1:n_items_A){
		cutpoints_A[i] ~ induced_dirichlet_0(n_ordered_options_per_item_A);
		A[:,i] ~ ordered_logistic( a_latent	, cutpoints_A[i] );
	}
	for(i in 1:n_items_B){
		cutpoints_B[i] ~ induced_dirichlet_0(n_ordered_options_per_item_B);
		B[:,i] ~ ordered_logistic( b_latent	, cutpoints_B[i] );
	}
	for(i in 1:n_items_C){
		cutpoints_C[i] ~ induced_dirichlet_0(n_ordered_options_per_item_C);
		C[:,i] ~ ordered_logistic( c_latent	, cutpoints_C[i] );
	}
}

Greetings! Thank you very much for the provided code, and apologies for getting back to this so late; I just got back from my holidays.

I've tried to run your code, but I am getting some errors. Most of them are just small syntax errors. First, I think there is a closing bracket missing in line 89; I've guessed that it closes the sqrt(sum(...)) sub-expression a couple of lines earlier, but could you confirm the placement?

Second, there is a semicolon missing in line 101. Third, in lines 59, 63 and 67 it needs to say n_respondents instead of n_resp. Fourth, in lines 65 and 67 n_items_B needs to be changed to n_items_C. But as I said, these are all smaller problems.

I am running into a more serious problem at line 104. The induced_dirichlet_0 function requires two vectors as arguments, c and alpha. It seems that c is passed to the function in your code, but nothing that could represent alpha is passed here. To try to bypass this problem I created the following variables in the data block:

vector[n_ordered_options_per_item_A] blah_a;
vector[n_ordered_options_per_item_B] blah_b;
vector[n_ordered_options_per_item_C] blah_c;

and I passed these as alpha arguments to the induced_dirichlet_0 functions in the model block. My code now looks like this:


functions {
	// Michael Betancourt's recommended prior for ordinal cutpoints:
	real induced_dirichlet_0_lpdf(vector c, vector alpha) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit( - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
	real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
		int K = num_elements(c) + 1;
		vector[K - 1] sigma = inv_logit(phi - c);
		vector[K] p;
		matrix[K, K] J = rep_matrix(0, K, K);

		// Induced ordinal probabilities
		p[1] = 1 - sigma[1];
		for (k in 2:(K - 1))
			p[k] = sigma[k - 1] - sigma[k];
		p[K] = sigma[K - 1];

		// Baseline column of Jacobian
		for (k in 1:K) J[k, 1] = 1;

		// Diagonal entries of Jacobian
		for (k in 2:K) {
			real rho = sigma[k - 1] * (1 - sigma[k - 1]);
			J[k, k] = - rho;
			J[k - 1, k] = rho;
		}

		return	 dirichlet_lpdf(p | alpha)
					 + log_determinant(J);
	}
}
data{
	int n_respondents ;

	int n_items_A ;
	int n_ordered_options_per_item_A ;
	matrix [n_respondents,n_items_A] A ;

	int n_items_B ;
	int n_ordered_options_per_item_B ;
	matrix [n_respondents,n_items_B] B ;

	int n_items_C ;
	int n_ordered_options_per_item_C ;
	matrix [n_respondents,n_items_C] C ;
	
	vector[n_ordered_options_per_item_A] blah_a;
	vector[n_ordered_options_per_item_B] blah_b;
	vector[n_ordered_options_per_item_C] blah_c;
}
parameters{
	// SEM parameters
	vector[n_respondents] a_unique ;
	vector[n_respondents] b_latent ;
	vector[n_respondents] c_latent ;
	vector<lower=-1,upper=1>[2] weights_bc ;
	real<lower=0,upper=1> weight_a ;
	// response-level parameters
	array[n_items_A] ordered[n_ordered_options_per_item_A-1] cutpoints_A ;
	array[n_items_B] ordered[n_ordered_options_per_item_B-1] cutpoints_B ;
	array[n_items_C] ordered[n_ordered_options_per_item_C-1] cutpoints_C ;
}
transformed parameters{
	vector[3] weights ;
	weights[1] = weight_a ;
	weights[2:3] = (
		weights_bc
		/ sqrt(sum(weights_bc.^2)) // makes weights_bc have a unit norm
		* sqrt(1-(weight_a.^2)) // makes weights have a unit norm
	) ;
}	
model{
	// priors for SEM
	a_unique ~ std_normal() ;
	b_latent ~ std_normal() ;
	c_latent ~ std_normal() ;
	// derived value for a
	vector[n_respondents] a_latent = (
		weights[1]*a_unique
		+ weights[2]*b_latent
		+ weights[3]*c_latent
	);
	// response-level priors & likelihood
	for(i in 1:n_items_A){
		cutpoints_A[i] ~ induced_dirichlet_0(n_ordered_options_per_item_A, blah_a);
		A[:,i] ~ ordered_logistic( a_latent	, cutpoints_A[i] );
	}
	for(i in 1:n_items_B){
		cutpoints_B[i] ~ induced_dirichlet_0(n_ordered_options_per_item_B, blah_b);
		B[:,i] ~ ordered_logistic( b_latent	, cutpoints_B[i] );
	}
	for(i in 1:n_items_C){
		cutpoints_C[i] ~ induced_dirichlet_0(n_ordered_options_per_item_C, blah_c);
		C[:,i] ~ ordered_logistic( c_latent	, cutpoints_C[i] );
	}
}

However, that didn't seem to work. For some reason, Stan now thinks that I am passing three arguments to the induced_dirichlet_0 function instead of two. I don't understand why, since there is clearly only one comma between the brackets, which should indicate two arguments rather than three. This is the error that I am getting:

Semantic error in 'string', line 107, column 2 to column 77:
   -------------------------------------------------
   105:   // response-level priors & likelihood
   106:   for(i in 1:n_items_A){
   107:    cutpoints_A[i] ~ induced_dirichlet_0(n_ordered_options_per_item_A, blah_a);
           ^
   108:    A[:,i] ~ ordered_logistic( a_latent , cutpoints_A[i] );
   109:   }
   -------------------------------------------------

Ill-typed arguments supplied to function 'induced_dirichlet_0':
(vector, int, vector)
Available signatures:
(vector, vector) => real
  Expected 2 arguments but found 3 arguments.

Any ideas on how to continue with this? Any help would be appreciated.

This is because the Stan syntax

x ~ f(y) ;

is shorthand for:

target += f_lupdf( x | y ) ;

In both cases the function f is given two arguments, even though in the first case it appears to be given only one.

So, my original use of

cutpoints_A[i] ~ induced_dirichlet_0( n_ordered_options_per_item_A ) ;

was shorthand for

target += induced_dirichlet_0_lupdf( cutpoints_A[i] | n_ordered_options_per_item_A ) ;

BUT I think I did make an error in remembering what the alpha argument to the induced_dirichlet/induced_dirichlet_0 functions was supposed to be: where I used n_ordered_options_per_item_A, it should actually be rep_vector(1.0, n_ordered_options_per_item_A).
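
So the corrected sampling statement, following directly from that fix, would be something like:

cutpoints_A[i] ~ induced_dirichlet_0( rep_vector(1.0, n_ordered_options_per_item_A) ) ;

which expands to induced_dirichlet_0_lupdf( cutpoints_A[i] | rep_vector(1.0, n_ordered_options_per_item_A) ), matching the (vector, vector) signature, and the blah_* placeholder data variables are then no longer needed.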