Non-static covariance matrices?

Erik_Ringen · January 16, 2018, 4:50am

Hello,

I am working on a multilevel ordinal regression model that is very similar to that described by Koster and McElreath in 'Multinomial analysis of behavior: statistical methods’. I’m attaching the code and data from that paper for reference. Publication script.R (84.7 KB)
data.csv (164.9 KB)

These models make the conventional assumption that the covariance matrix is static (in this case, that behavioral trade-offs are constant). However, there is good reason in this case to suspect that this is unrealistic–individuals of different ages plausibly face different trade-offs.

Thus, I’d like to specify the covariance between random effects as a function of age, but I’m stumped as to how I would do this in Stan. The model I have in mind would also necessitate removing age as a linear predictor from the model (because the current random effects structure captures variance not accounted for by age).

Thanks,
Erik

Charles_Driver · January 16, 2018, 10:28am

I describe the approach I took for this around pages 9-10 of:
https://www.researchgate.net/publication/310747801_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling

The general idea is to sample the ‘raw’ correlation parameters from a normal(0,1), perturb these according to any covariates / random effects etc, put them into the upper and lower triangles of a symmetric matrix with 1’s on the diagonal, multiply this matrix by the transpose of itself, do a covariance to correlation transform on the matrix, then use this as usual - scale by a diagonal std dev matrix etc. The paper describes a slightly altered form for the sake of efficiency. This gives a similar prior to the LKJ approach.

Erik_Ringen · January 16, 2018, 7:25pm

Thanks, this sounds promising. I looked over the paper, but I am having a difficult time cracking the code in the appendix. Specifically I’m not sure where to add covariates or how to properly adapt for the multinomial context.

Charles_Driver · January 17, 2018, 11:57am

Sorry, I should have mentioned that you’re on your own for the multinomial aspect – I assume it just needs a link function, but I work mostly with continuous variables…

covariates would go in as an effect on the subject level parameters.

With following two stan functions you can pass a matrix containing unconstrained values on the lower triangle and sd’s on the diagonal to the lowertrisdcovsqrt2cov function, and get out a correlation matrix. Being unconstrained (and, at least in my design, normally distributed), it’s straightforward to sample the lower triangular parameters hierarchically with whatever effects you like.

  matrix covsqrt2corsqrt(matrix mat, int invert){ //converts from lower triangular unconstrained matrix to cor sqrt matrix
    matrix[rows(mat),cols(mat)] o;
    vector[rows(mat)] s;
    o=mat;
    
    for(i in 1:rows(o)){ //set upper tri to lower
      for(j in min(i+1,rows(mat)):rows(mat)){
        o[j,i] = inv_logit(o[j,i] )*2-1;  
        o[i,j] = o[j,i];
      }
      o[i,i]=1; // change to adjust prior for correlations
    }
    if(invert==1) o = inverse(o);
    
    for(i in 1:rows(o)){
      s[i] = inv_sqrt(o[i,] * o[,i]);
      if(is_inf(s[i])) s[i]=0;
    }
    o= diag_pre_multiply(s,o);
    return o;
  }
  
  matrix sdcovsqrt2cov(matrix mat, int cholesky){ //converts from lower triangular unconstrained and diag sd to cov or cholesky cov
    matrix[rows(mat),rows(mat)] out;
    int invert;
    
    invert = 0; //change integer for marginal or partial prior
    if(rows(mat) > 1){
      out=covsqrt2corsqrt(mat,invert); 
      out= diag_pre_multiply(diagonal(mat), out);
    }
    if(rows(mat)==1) out[1,1] = mat[1,1];
    
    if(cholesky==0) out = out * out\';
    return(out);
  }

Erik_Ringen · January 17, 2018, 10:39pm

OK,I think I have a handle on it now (model is running as I type). Thanks for the help. I’ll report back with results.

Erik_Ringen · January 22, 2018, 12:22am

Just getting back to this. Maybe you can help clear up some confusion. I’m trying to specify a multinomial model where each behavior category K varies as a (polynomial) function of age, and the correlations between the random effects also vary as a function of age. I put something together using the functions you provided and the appendix of the paper you linked. The model fits OK, but I am confused by the product of the functions that you shared. It looks like the sdcovsqrt2cov function should return the cholesky factor of a co-variance matrix, but it does not (it returns a matrix with values in the upper triangle, and the matrix is not symmetric). Here’s my code:

functions{
matrix covsqrt2corsqrt(matrix mat, int invert){ //converts from lower triangular unconstrained matrix to cor sqrt matrix
matrix[rows(mat),cols(mat)] o;
vector[rows(mat)] s;
o=mat;

for(i in 1:rows(o)){ //set upper tri to lower
for(j in min(i+1,rows(mat)):rows(mat)){
o[j,i] = inv_logit(o[j,i] )*2-1;
o[i,j] = o[j,i];
}
o[i,i]=1; // change to adjust prior for correlations
}
if(invert==1) o = inverse(o);

for(i in 1:rows(o)){
s[i] = inv_sqrt(o[i,] * o[,i]);
if(is_inf(s[i])) s[i]=0;
}
o= diag_pre_multiply(s,o);
return o;
}

matrix sdcovsqrt2cov(matrix mat, int cholesky){ //converts from lower triangular unconstrained and diag sd to cov or cholesky cov
matrix[rows(mat),rows(mat)] out;
int invert;

invert = 0; //change integer for marginal or partial prior
if(rows(mat) > 1){
out = covsqrt2corsqrt(mat,invert);
out = diag_pre_multiply(diagonal(mat), out);
}
if(rows(mat)==1) out[1,1] = mat[1,1];

if(cholesky==0) out = out * out’;
return(out);
}
}

data{
int N; // number of obs
int N_id; // number of individuals
int y[N]; // multinomial outcome
int id[N];
int K; // number of outcome behavioral categories
int ndim; // K-1
real age_z[N]; // standardized age
real age_zq[N]; // standardized age^2
real age_id[N_id]; // age for each individual
real age_idq[N_id]; // age^2 for each individual
}

parameters{
real a[K-1]; // intercepts for each behavior, minus reference category
real bA[K-1]; // parameters for age
real bQ[K-1]; // parameters for age^2
matrix[N_id,K-1] v_id; // matrix of random effects
vector<lower=0>[K-1] sigma_id; // stddev of random effects

vector[(ndimndim-ndim)/2] pcorrvecmu; // pop mean of partial correlation vectors
vector[(ndimndim-ndim)/2] pcorrvec_age; // pop mean of effect of age on correlations
vector[(ndim*ndim-ndim)/2] pcorrvec_age2; // pop mean of effect of age sq on correlations
}

transformed parameters{
vector[(ndimndim-ndim)/2] p_merged[N_id]; // combined correlation parameters
matrix[ndim,ndim] pcovmat[N_id]; // lower cov matrix plus sigmas on diag
matrix[ndim,ndim] cholcovmat[N_id]; // cholesky form
{
int counter;
for(n in 1:N_id){
counter=0;
// p_merged is a vector for the global correlation parameters, plus the age parameters
p_merged[n] = pcorrvecmu + pcorrvec_ageage_id[n] + pcorrvec_age2*age_idq[n];
// putting sd for each behavior on the diagonal
pcovmat[n] = diag_matrix(sigma_id);
// filling matrix
for (j in 1:(ndim-1))
for (i in (j+1):ndim){
counter = counter + 1;
pcovmat[n][i,j] = p_merged[n][counter];
}
cholcovmat[n] = sdcovsqrt2cov(pcovmat[n],1);
}
}
}

model{
// priors for fixed effects
a ~ normal(0,1);
bA ~ normal(0,1);
bQ ~ normal(0,1);

// priors for unconstrained corr parameters
pcorrvecmu ~ normal(0,1);
pcorrvec_age ~ normal(0,1);
pcorrvec_age2 ~ normal(0,1);

// hyper-priors
sigma_id ~ exponential(1);
for (n in 1:N_id){
v_id[n,] ~ multi_normal_cholesky(rep_vector(0,K-1),cholcovmat[n]);
}

// Likelihood function
for ( i in 1:N ) {
vector[K] p;
for ( k in 1:(K-1) )
p[k] = a[k] + v_id[id[i],k] + bA[k]*age_z[i] + bQ[k]*age_zq[i];
p[K] = 0;
y[i] ~ categorical_logit( p );
}
}

generated quantities{
vector[N] log_lik;

for ( i in 1:N ) {
vector[K] p;
for ( k in 1:(K-1) )
p[k] = a[k] + v_id[id[i],k] + bA[k]*age_z[i] + bQ[k]*age_zq[i];
p[K] = 0;
log_lik[i] = categorical_logit_lpmf( y[i] | p );
}
}
"

Charles_Driver · January 22, 2018, 12:31am

sorry you’re right the functions would look confusing in that regard - at some point in getting it all working I switched from cholesky factors to ‘some other form of matrix square root’. if multiplied by its own transpose the correlation matrix should still come out though.

Erik_Ringen · January 22, 2018, 12:35am

OK, so if I set cholesky = 0 in the function argument, I would get the full correlation matrix (not the covariance matrix)?

Charles_Driver · January 22, 2018, 12:49am

correct - but best to print a few results to be sure!

Topic		Replies	Views
Adding covariance matrix to non-centered learning model Modeling cognitive-science	7	928	November 12, 2018
How to use a Covariance matrix prior in Bayesian ordinal regression model? rstanarm	3	683	March 18, 2020
Hierarchical model with subject-specific (spatial) correlation Modeling	1	435	April 25, 2019
Including data in covariance matrix Modeling specification	1	226	May 28, 2024
Multilevel modelling, adding correlation between all parameters General	2	1097	July 9, 2018

Non-static covariance matrices?

Related topics