Sampling a 4-column matrix from a multivariate normal distribution with ordered pairs of columns

Cengiz_Zopluoglu · August 14, 2024, 1:52am

Not sure how to describe my problem, but I will give it a try. Below is a simplified example for a demo (please excuse inefficiencies in the code).

Suppose I have the following syntax. \theta is a matrix of latent variables. The first observed variable is a function of the first two columns of \theta, and the second observed variable is a function of the third and fourth columns of \theta.

How do I constrain \theta such that the values in the second column of \theta is always larger than the values in the first column of \theta, and the values in the fourth column of \theta is always larger than the values in the third column of \theta?

Thank you.

data {
  int N;                // number of observations
  matrix[N,2] y;        // observed variables
}
parameters{
  vector[4] theta[N];          // N x 4 matrix of latent variables 
  corr_matrix[4] omega_theta;  // correlation among the latent variables 
  vector[4] mu;                // mean vector of latent variables
  vector[4] sigma;             // sd vector of latent variables              
}
transformed parameters{
  cov_matrix[4] Sigma_P;    
  Sigma_P = quad_form_diag(omega_theta, sigma);  
}
model {
  mu    ~ normal(0,10);
  sigma ~ exponential(1);
  omega_theta       ~ lkj_corr(1); 
  
  theta ~ multi_normal(mu, Sigma_P);
  
  for(i in 1:N){
   real tmp1 = theta[i,2]-theta[i,1];
   real tmp2 = theta[i,4]-theta[i,3];
   
   y[i,1] ~ normal(tmp1,1);
   y[i,2] ~ normal(tmp2,1);
  }
}

spinkney · August 14, 2024, 2:36am

Cengiz_Zopluoglu · August 14, 2024, 5:10am

Thanks for pointing out this previous post. I don’t need rows to be ordered. I only need an ordering relationship between pairs of columns, so \theta_{i,2} > \theta_{i,1} and \theta_{i,4} > \theta_{i,3}.

The first two columns and the last two columns have different pairs of latent variables. I still want to generate all four columns from a multivariate normal distribution, but I only need the ordering constrain between pairs of columns, so not like \theta_{i,4} > \theta_{i,3} > \theta_{i,2} > \theta_{i,1}.

I am struggling to understand how to adopt the approach from the previous post and to simulate theta from a multivariate normal distribution with these constraints. Thank you.

spinkney · August 14, 2024, 1:20pm

Since there’s only 4 columns just do

parameters {
  vector[n] theta_1, theta_3;
  vector<lower=theta_1>[n] theta_2;
  vector<lower=theta_3>[n] theta_4;
}
transformed parameters {
 matrix[n, 4] theta = append_col(append_col(theta_1, theta_2), append_col(theta_3, theta_4));
}

Cengiz_Zopluoglu · August 14, 2024, 3:03pm

Then, does this imply that I sample theta_1, theta_2, theta_3, and theta_4 independently in the model statement? I just don’t understand how I will estimate the correlations among them. What happens to

theta ~ multi_normal(mu, Sigma_P);

in the model syntax?

spinkney · August 14, 2024, 3:29pm

A similar thing that happens to any constrained parameter. For e.g., if you constrain a vector to be positive reals and then increment the log probability by the log density of the std normal distribution, the outcome is not a std normal but a half-normal due to the change of variables. You’d have to work out the change of variables with your MVN to see exactly what distribution is implied.

parameters {
 vector<lower=0>[N] x;
}
model {
 x ~ std_normal();
}

Cengiz_Zopluoglu · August 14, 2024, 3:53pm

I see. I was hoping that there was something I could do with ordered. In the past, I could do something like the following when I only had a single pair of latent variables: I could use the multi_normal() function and still put the constraint that one column is smaller than the other. I guess there is no similar easy solution to what I am asking, but thank you for the ideas.

parameters { 
   real mu_thetat;             // mean for theta_t, fix this to zero
  real<lower=0> sigma_thetat; // sd for theta_t
  
  real mu_thetac;             // mean for theta_c
  real<lower=0> sigma_thetac; // sd for theta_c

  ordered[2] person[I]; 
}
transformed parameters{
  vector[2] mu_P;             // vector for mean vector of person parameters 
    mu_P[1] = mu_thetat;          // mu_thetat;                      
    mu_P[2] = mu_thetac;        // mu_thetac;
 
  vector[2] scale_P;          // vector of standard deviations for person parameters
   scale_P[1] = sigma_thetat;               
   scale_P[2] = sigma_thetac;
  
  cov_matrix[2] Sigma_P;      // covariance matrix for person parameters
  Sigma_P = quad_form_diag(omega_P, scale_P); 
}

model{
  
  mu_thetat     ~ normal(0,1);
  sigma_thetat  ~ exponential(1);
  
  mu_thetac     ~ normal(0,1);
  sigma_thetac  ~ exponential(1);
  
  omega_P       ~ lkj_corr(1);            

  person        ~ multi_normal(mu_P,Sigma_P);
}

Topic		Replies	Views
Correct way of sampling correlated multidimensional vectors Modeling	3	641	February 9, 2019
Enforcing independence Modeling	1	416	April 6, 2019
Constrain the mean of some variables of MVN to 0 if the MVN was reparametrized to a non-central parameterization Modeling	0	92	May 10, 2024
Sampling from Multi variate Normal with time varying Covariance Matrix in a way similar to matlab Modeling	5	1180	February 22, 2018
Sampling covariance matrix with additional constrain Modeling	5	645	December 5, 2018

Sampling a 4-column matrix from a multivariate normal distribution with ordered pairs of columns

Related topics