# Question on mutlivariate non-centered parametrizations

Hey folks !

I recently did some non-centered reparametrization on some of my models, and at least for my models who sample the parameters from independent normals this worked out very well (faster, no divergences etc.). However, I have now implemented the same non-centered reparametrization in the multivariate version of the model (which worked out ok but could be better asit has some divergences and convergence problems). The non-centered model works though, but not as well as expected. I think this is more about some errors I made as about the reparamtrization itself. So I have two questions on this:

1.) I used the reference manual to implement the cholesky non-centered, but Iâ€™m just not sure whether this works as intended, because I estimate the correlation matrix as well as sigma simultanioulsy. If I would assume uncorrelated parameters and just surpass an identity matrix as correlation matrix, would this be a valid method to simplyfy the model in this case? And: Is this non-centered implementation â€ścorrectâ€ť so far?

2.) How could I provide initial values in this case, that actually satisfy the sampler to be positiv ? As One Parameter is on log_scale, this should have boudaries from -Inf to Inf. Sampling all Parameters from the same std_normal is a problem here. Would it be â€śeffiecientâ€ť or reasonable to outsource this parameter from the MVN and sample it independently?

I maybe I can find some ideas here, help and opinions are appreciated ! Thanks for reading and your Opinions !

*J

Here are model codes for the non-centered and centered model:

Non-Centered:

``````data {
int <lower=0> N;  // number rownumber
int <lower=0> K;  // categories
int <lower=0> J;  // Dims of Cov Matrix
int <lower=0> Con;  // number of FT categories
int R[K];         // number of responses per category
int count[N*Con,K];   // observed data
real scale_b; // set scaling for background noise
vector[Con] Freetime;  // freetime conditions after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
int retrievals;
matrix[N*Con,2] d_weight; // Weights for Response Category inbalances
}

parameters {

// Defining vector for hyper and subject parameters

corr_matrix[J] Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
vector[J] theta[N];

}

transformed parameters{

// Transform f Parameter
vector[J] subj_pars[N];
matrix[J, J] L;

real f[N] = inv_logit(theta[,3]);
real mu_f = inv_logit(hyper_pars[3]);

L = cholesky_decompose(Omega);

for (i in 1:N)
{

subj_pars[i,] = hyper_pars + sigma .* (L * theta[i,]);

}

// activations
real acts_IIP[N*Con];
real acts_IOP[N*Con];
real acts_DIP[N*Con];
real acts_DIOP[N*Con];
real acts_NPL[N*Con];

// probabilities
vector[K] probs[N*Con];
real SummedActs[N*Con];

// loop over subjects and conditions to compute activations and probabilites

for (i in 1:N){ // for each subject

for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[i,4]*Freetime[j])* (subj_pars[i,1] + subj_pars[i,2]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[i,4]*Freetime[j])* subj_pars[i,2];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[i,5]*Freetime[j])*subj_pars[i,1]) + subj_pars[i,2]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[i,2]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R[1] * acts_IIP[j + (i-1)*Con] + R[2] * acts_IOP[j + (i-1)*Con] + d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]+ // account for different numbers auf DIP / DIOP
d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]+ R[5] * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R[1] * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R[2] * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R[5] * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
}
}

model {

// priors for hyper parameters

hyper_pars[1] ~ normal(20,10); // c
hyper_pars[2] ~ normal(2,10); // a
hyper_pars[3] ~ normal(0,10); // f
hyper_pars[4] ~ normal(1,10); // EE
hyper_pars[5] ~ normal(1,10); // r

//
Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

// Loop over subjects

for (i in 1:N)
{
theta[i,] ~ normal(0,1);

}

for (j in 1:Con) {
for (i in 1:N) {
// draw data from probabilities determined by MMM parms
count[j + (i-1)*Con,]  ~ multinomial(probs[j + (i-1)*Con,]);
}
}
}

``````

Centered:

``````
data {
int <lower=0> N;  // number rownumber
int <lower=0> K;  // categories
int <lower=0> J;  // Dims of Cov Matrix
int <lower=0> Con;  // number of categories
int R[K];         // number of responses per category
int count[N*Con,K];   // observed data
real scale_b;     // set scaling for background noise
vector[Con] Freetime ;        // freetime conditions after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
}

parameters {

// Defining vector for hyper and subject parameters

corr_matrix[J] Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
vector [J] subj_pars[N];

}

transformed parameters{
// Transform f Parameter

real f[N] = inv_logit(subj_pars[,3]);
real mu_f = inv_logit(hyper_pars[3]);

// activations
real acts_IIP[N*Con];
real acts_IOP[N*Con];
real acts_DIP[N*Con];
real acts_DIOP[N*Con];
real acts_NPL[N*Con];

// probabilities
vector[K] probs[N*Con];
real SummedActs[N*Con];

// loop over subjects and conditions to compute activations and probabilites

for (i in 1:N){ // for each subject

for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b + (1+ subj_pars[i,4]*Freetime[j])*( subj_pars[i,2] + subj_pars[i,1]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b + (1+ subj_pars[i,4]*Freetime[j])* subj_pars[i,2];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + exp(-subj_pars[i,5]*Freetime[j])* subj_pars[i,3]*( subj_pars[i,2]+ subj_pars[i,1]);// Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + exp(-subj_pars[i,5]*Freetime[j])* subj_pars[i,3]* subj_pars[i,2]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R[1] * acts_IIP[j + (i-1)*Con] + R[2] * acts_IOP[j + (i-1)*Con] + R[3] * acts_DIP[j + (i-1)*Con]+
R[4] * acts_DIOP[j + (i-1)*Con]+ R[5] * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R[1] * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R[2] * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (R[3] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (R[4] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R[5] * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
}
}

model {

// priors for hyper parameters

hyper_pars[1] ~ normal(20,10); // c
hyper_pars[2] ~ normal(2,10); // a
hyper_pars[3] ~ normal(0,10); //f
hyper_pars[4] ~ normal(1,10); // EE
hyper_pars[5] ~ normal(1,10); // r

Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

// Loop over subjects

for (j in 1:Con) {
for (i in 1:N) {
// draw data from probabilities determined by MMM parms
count[j + (i-1)*Con,]  ~ multinomial(probs[j + (i-1)*Con,]);
}
}
}

``````

See SUG 1.13 (â€śMultivariate priors for hierarchical modelsâ€ť); Your implementation looks correct, but it will sample faster if you structure things so that the correlation-related paramter is of type `cholesky_factor_corr`. Below is a version with that plus derivation of the correlation matrix in the GQ; note that `subj_pars` is transposed relative to what you had, but I made the appropriate transpose of the indices where itâ€™s used later:

``````functions{
// flatten_lower_tri: function that returns the lower-tri of a matrix, flattened to a vector
vector flatten_lower_tri(matrix mat) {
int n_cols = cols(mat) ;
int n_uniq = (n_cols * (n_cols - 1)) %/% 2;
vector[n_uniq] out ;
int i = 1;
for(c in 1:(n_cols-1)){
for(r in (c+1):n_cols){
out[i] = mat[r,c];
i += 1;
}
}
return(out) ;
}

}
...
parameters {

// Defining vector for hyper and subject parameters

cholesky_factor_corr[J] L_Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
matrix[J,N] theta ;

}

transformed parameters{
// non-centered multivariate
matrix[J,N] subj_pars =  (
diag_pre_multiply( sigma, L_Omega )
* theta
+ rep_matrix(hyper_pars,N)
) ;

...
for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[4,i]*Freetime[j])* (subj_pars[1,i] + subj_pars[2,i]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[4,i]*Freetime[j])* subj_pars[2,i];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R[1] * acts_IIP[j + (i-1)*Con] + R[2] * acts_IOP[j + (i-1)*Con] + d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]+ // account for different numbers auf DIP / DIOP
d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]+ R[5] * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R[1] * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R[2] * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R[5] * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
...
}

...

generated quantities{
vector[(J*(J-1))%/%2] cor_mat_lower_tri = flatten_lower_tri(multiply_lower_tri_self_transpose(L_Omega)) ;
}
``````

But do you intentionally not use the 3rd parameter in `subj_pars` in the model? (In your version thatâ€™s `subj_pars[,3]` and in mine itâ€™s `subj_pars[3,]`)

The default behaviour of Stanâ€™s sampler is to initialize with random values sampled as uniform from -2 to +2 (on the unconstrained scale). Do you encounter issues with the default initialization?

2 Likes

Hey Mike,

The model runs better from what I see from the the gradient evaluation times and only a handful divergences! I still have the problem, that I have negative probabilities in the likelihood of the mutlinomial, which seems to be related to the transformation of the `theta[3, ] `(`f`) parameter:

`Chain 1: Rejecting initial value: Chain 1: Error evaluating the log probability at the initial value. Chain 1: Exception: multinomial_lpmf: Probabilities parameter is not a valid simplex. Probabilities parameter[1] = -4.51233, but should be greater than or equal to 0 (in 'string', line 124, column 6 to column 66)`

I think this is about your question about the subj_par[3,i]. This parameter is strictly speaking a simplex between 0 and 1 and modulates the activations in this lines.

``````  acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]);

acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i];
``````

and these activations are subsequently normalized to probabilities:

`````` probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
``````

As this model is meant to describe individual differences, itâ€™s convenient to use normal distributions for
the parameter values, even though it is only an approximation as the parameters are contrained to be positive. As the the `f` paramter is meant to be a in a range between [0;1] to modulate the beforementioned activations, it is is initially sampled from a normally distributed variable though, but subsequently transformed through a logistic transformation. Thatâ€™s why I passed inits to the model to prevent the negative probabilites in a MVN, as it is not possible to constrain only a subset of parameters and `f` has to be un contrained. For some reason, `f` becomes a negative, regardless of the transformation, which I suspect causes the problem described above.

In a less complex multivariate model version I solved this like this:

Inits:

``````init_fun <- function()
{

list(subj_pars=cbind(runif(stan_full_noFT.dat\$N,1,100), # c
runif(stan_full_noFT.dat\$N,1,100), # a
runif(stan_full_noFT.dat\$N,-10,100), # f
runif(stan_full_noFT.dat\$N,1,100), # ee
runif(stan_full_noFT.dat\$N,1,100))) #  r
}

``````
``````// Transform f Parameter

real f[N] = inv_logit(subj_pars[,3]);
real mu_f = inv_logit(hyper_pars[3]);
.....

acts[i,3] = scale_b + f[i]* (subj_pars[i,1]+subj_pars[i,2]);// Distractor in Position
acts[i,4] = scale_b + f[i]* subj_pars[i,2]; // Distractor in other Position

model{

// priors for hyper parameters

Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

hyper_pars[1] ~ normal(20,10);
hyper_pars[2] ~ normal(2,10);
hyper_pars[3] ~ normal(0,10);

// Sample Parameters

// Loop over subjects
for(i in 1:N){

// draw data from probabilities determined by MMM parms
count[i,]  ~ multinomial(probs[i,]);
}
}

``````

Now for the non-centered version I passed inits for theta:

``````init_fun_FT <- function()
{

list(theta=cbind(runif(stan_full_FT.dat\$N,0,1), # c
runif(stan_full_FT.dat\$N,0,1), # a
runif(stan_full_FT.dat\$N,-0.5,0.5), # f
runif(stan_full_FT.dat\$N,0,1), # ee
runif(stan_full_FT.dat\$N,0,1))) #  r
}

``````

So I now transform `theta[3, ]` (`f`) with the same logic as above:

Sample set of theta:

``````model {

// priors for hyper parameters

....
// Priors for variance
...

// Loop over subjects and sample theta[,i] for all subjects

for (i in 1:N)
{

theta[,i] ~ normal(0,1);
}

``````

Transform theta[3,], reparametrize subject parameters and subsequently calculate activations and probabilites.

`````` row_vector[N] f = inv_logit(theta[3,]);

matrix[J,N] subj_pars =  (
diag_pre_multiply( sigma, L_Omega )
* theta
+ rep_matrix(hyper_pars,N)) ;

....
for(j in 1:Con) {

acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i]; // Distractor in other Position

probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);

}
}
``````

In this version, the model runs faster, but has problmes with the init values - sampling begins after some time, but its random whether all chains are initialized in the end (even thoug this happens most of the time) . It makes no difference, if I donâ€™t pass a init function. Chains look good though!

Again, thank you for your time !

best wishes

Jan