# Question on mutlivariate non-centered parametrizations

Hey folks !

I recently did some non-centered reparametrization on some of my models, and at least for my models who sample the parameters from independent normals this worked out very well (faster, no divergences etc.). However, I have now implemented the same non-centered reparametrization in the multivariate version of the model (which worked out ok but could be better asit has some divergences and convergence problems). The non-centered model works though, but not as well as expected. I think this is more about some errors I made as about the reparamtrization itself. So I have two questions on this:

1.) I used the reference manual to implement the cholesky non-centered, but I’m just not sure whether this works as intended, because I estimate the correlation matrix as well as sigma simultanioulsy. If I would assume uncorrelated parameters and just surpass an identity matrix as correlation matrix, would this be a valid method to simplyfy the model in this case? And: Is this non-centered implementation “correct” so far?

2.) How could I provide initial values in this case, that actually satisfy the sampler to be positiv ? As One Parameter is on log_scale, this should have boudaries from -Inf to Inf. Sampling all Parameters from the same std_normal is a problem here. Would it be “effiecient” or reasonable to outsource this parameter from the MVN and sample it independently?

I maybe I can find some ideas here, help and opinions are appreciated ! Thanks for reading and your Opinions !

*J

Here are model codes for the non-centered and centered model:

Non-Centered:

``````data {
int <lower=0> N;  // number rownumber
int <lower=0> K;  // categories
int <lower=0> J;  // Dims of Cov Matrix
int <lower=0> Con;  // number of FT categories
int R[K];         // number of responses per category
int count[N*Con,K];   // observed data
real scale_b; // set scaling for background noise
vector[Con] Freetime;  // freetime conditions after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
int retrievals;
matrix[N*Con,2] d_weight; // Weights for Response Category inbalances
}

parameters {

// Defining vector for hyper and subject parameters

corr_matrix[J] Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
vector[J] theta[N];

}

transformed parameters{

// Transform f Parameter
vector[J] subj_pars[N];
matrix[J, J] L;

real f[N] = inv_logit(theta[,3]);
real mu_f = inv_logit(hyper_pars);

L = cholesky_decompose(Omega);

for (i in 1:N)
{

subj_pars[i,] = hyper_pars + sigma .* (L * theta[i,]);

}

// activations
real acts_IIP[N*Con];
real acts_IOP[N*Con];
real acts_DIP[N*Con];
real acts_DIOP[N*Con];
real acts_NPL[N*Con];

// probabilities
vector[K] probs[N*Con];
real SummedActs[N*Con];

// loop over subjects and conditions to compute activations and probabilites

for (i in 1:N){ // for each subject

for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[i,4]*Freetime[j])* (subj_pars[i,1] + subj_pars[i,2]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[i,4]*Freetime[j])* subj_pars[i,2];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[i,5]*Freetime[j])*subj_pars[i,1]) + subj_pars[i,2]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[i,2]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R * acts_IIP[j + (i-1)*Con] + R * acts_IOP[j + (i-1)*Con] + d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]+ // account for different numbers auf DIP / DIOP
d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]+ R * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
}
}

model {

// priors for hyper parameters

hyper_pars ~ normal(20,10); // c
hyper_pars ~ normal(2,10); // a
hyper_pars ~ normal(0,10); // f
hyper_pars ~ normal(1,10); // EE
hyper_pars ~ normal(1,10); // r

//
Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

// Loop over subjects

for (i in 1:N)
{
theta[i,] ~ normal(0,1);

}

for (j in 1:Con) {
for (i in 1:N) {
// draw data from probabilities determined by MMM parms
count[j + (i-1)*Con,]  ~ multinomial(probs[j + (i-1)*Con,]);
}
}
}

``````

Centered:

``````
data {
int <lower=0> N;  // number rownumber
int <lower=0> K;  // categories
int <lower=0> J;  // Dims of Cov Matrix
int <lower=0> Con;  // number of categories
int R[K];         // number of responses per category
int count[N*Con,K];   // observed data
real scale_b;     // set scaling for background noise
vector[Con] Freetime ;        // freetime conditions after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
}

parameters {

// Defining vector for hyper and subject parameters

corr_matrix[J] Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
vector [J] subj_pars[N];

}

transformed parameters{
// Transform f Parameter

real f[N] = inv_logit(subj_pars[,3]);
real mu_f = inv_logit(hyper_pars);

// activations
real acts_IIP[N*Con];
real acts_IOP[N*Con];
real acts_DIP[N*Con];
real acts_DIOP[N*Con];
real acts_NPL[N*Con];

// probabilities
vector[K] probs[N*Con];
real SummedActs[N*Con];

// loop over subjects and conditions to compute activations and probabilites

for (i in 1:N){ // for each subject

for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b + (1+ subj_pars[i,4]*Freetime[j])*( subj_pars[i,2] + subj_pars[i,1]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b + (1+ subj_pars[i,4]*Freetime[j])* subj_pars[i,2];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + exp(-subj_pars[i,5]*Freetime[j])* subj_pars[i,3]*( subj_pars[i,2]+ subj_pars[i,1]);// Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + exp(-subj_pars[i,5]*Freetime[j])* subj_pars[i,3]* subj_pars[i,2]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R * acts_IIP[j + (i-1)*Con] + R * acts_IOP[j + (i-1)*Con] + R * acts_DIP[j + (i-1)*Con]+
R * acts_DIOP[j + (i-1)*Con]+ R * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (R * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (R * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
}
}

model {

// priors for hyper parameters

hyper_pars ~ normal(20,10); // c
hyper_pars ~ normal(2,10); // a
hyper_pars ~ normal(0,10); //f
hyper_pars ~ normal(1,10); // EE
hyper_pars ~ normal(1,10); // r

Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

// Loop over subjects

for (j in 1:Con) {
for (i in 1:N) {
// draw data from probabilities determined by MMM parms
count[j + (i-1)*Con,]  ~ multinomial(probs[j + (i-1)*Con,]);
}
}
}

``````

See SUG 1.13 (“Multivariate priors for hierarchical models”); Your implementation looks correct, but it will sample faster if you structure things so that the correlation-related paramter is of type `cholesky_factor_corr`. Below is a version with that plus derivation of the correlation matrix in the GQ; note that `subj_pars` is transposed relative to what you had, but I made the appropriate transpose of the indices where it’s used later:

``````functions{
// flatten_lower_tri: function that returns the lower-tri of a matrix, flattened to a vector
vector flatten_lower_tri(matrix mat) {
int n_cols = cols(mat) ;
int n_uniq = (n_cols * (n_cols - 1)) %/% 2;
vector[n_uniq] out ;
int i = 1;
for(c in 1:(n_cols-1)){
for(r in (c+1):n_cols){
out[i] = mat[r,c];
i += 1;
}
}
return(out) ;
}

}
...
parameters {

// Defining vector for hyper and subject parameters

cholesky_factor_corr[J] L_Omega;
vector<lower=0>[J] sigma;
vector [J] hyper_pars;
matrix[J,N] theta ;

}

transformed parameters{
// non-centered multivariate
matrix[J,N] subj_pars =  (
diag_pre_multiply( sigma, L_Omega )
* theta
+ rep_matrix(hyper_pars,N)
) ;

...
for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[4,i]*Freetime[j])* (subj_pars[1,i] + subj_pars[2,i]); // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b +  (1+ subj_pars[4,i]*Freetime[j])* subj_pars[2,i];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R * acts_IIP[j + (i-1)*Con] + R * acts_IOP[j + (i-1)*Con] + d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]+ // account for different numbers auf DIP / DIOP
d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]+ R * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R * acts_IIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,2] = (R * acts_IOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,5] = (R * acts_NPL[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
}
...
}

...

generated quantities{
vector[(J*(J-1))%/%2] cor_mat_lower_tri = flatten_lower_tri(multiply_lower_tri_self_transpose(L_Omega)) ;
}
``````

But do you intentionally not use the 3rd parameter in `subj_pars` in the model? (In your version that’s `subj_pars[,3]` and in mine it’s `subj_pars[3,]`)

The default behaviour of Stan’s sampler is to initialize with random values sampled as uniform from -2 to +2 (on the unconstrained scale). Do you encounter issues with the default initialization?

2 Likes

Hey Mike,

The model runs better from what I see from the the gradient evaluation times and only a handful divergences! I still have the problem, that I have negative probabilities in the likelihood of the mutlinomial, which seems to be related to the transformation of the `theta[3, ] `(`f`) parameter:

`Chain 1: Rejecting initial value: Chain 1: Error evaluating the log probability at the initial value. Chain 1: Exception: multinomial_lpmf: Probabilities parameter is not a valid simplex. Probabilities parameter = -4.51233, but should be greater than or equal to 0 (in 'string', line 124, column 6 to column 66)`

I think this is about your question about the subj_par[3,i]. This parameter is strictly speaking a simplex between 0 and 1 and modulates the activations in this lines.

``````  acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]);

acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i];
``````

and these activations are subsequently normalized to probabilities:

`````` probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
``````

As this model is meant to describe individual differences, it’s convenient to use normal distributions for
the parameter values, even though it is only an approximation as the parameters are contrained to be positive. As the the `f` paramter is meant to be a in a range between [0;1] to modulate the beforementioned activations, it is is initially sampled from a normally distributed variable though, but subsequently transformed through a logistic transformation. That’s why I passed inits to the model to prevent the negative probabilites in a MVN, as it is not possible to constrain only a subset of parameters and `f` has to be un contrained. For some reason, `f` becomes a negative, regardless of the transformation, which I suspect causes the problem described above.

In a less complex multivariate model version I solved this like this:

Inits:

``````init_fun <- function()
{

list(subj_pars=cbind(runif(stan_full_noFT.dat\$N,1,100), # c
runif(stan_full_noFT.dat\$N,1,100), # a
runif(stan_full_noFT.dat\$N,-10,100), # f
runif(stan_full_noFT.dat\$N,1,100), # ee
runif(stan_full_noFT.dat\$N,1,100))) #  r
}

``````
``````// Transform f Parameter

real f[N] = inv_logit(subj_pars[,3]);
real mu_f = inv_logit(hyper_pars);
.....

acts[i,3] = scale_b + f[i]* (subj_pars[i,1]+subj_pars[i,2]);// Distractor in Position
acts[i,4] = scale_b + f[i]* subj_pars[i,2]; // Distractor in other Position

model{

// priors for hyper parameters

Omega ~ lkj_corr(2);
sigma ~ gamma(1,0.01);

hyper_pars ~ normal(20,10);
hyper_pars ~ normal(2,10);
hyper_pars ~ normal(0,10);

// Sample Parameters

// Loop over subjects
for(i in 1:N){

// draw data from probabilities determined by MMM parms
count[i,]  ~ multinomial(probs[i,]);
}
}

``````

Now for the non-centered version I passed inits for theta:

``````init_fun_FT <- function()
{

list(theta=cbind(runif(stan_full_FT.dat\$N,0,1), # c
runif(stan_full_FT.dat\$N,0,1), # a
runif(stan_full_FT.dat\$N,-0.5,0.5), # f
runif(stan_full_FT.dat\$N,0,1), # ee
runif(stan_full_FT.dat\$N,0,1))) #  r
}

``````

So I now transform `theta[3, ]` (`f`) with the same logic as above:

Sample set of theta:

``````model {

// priors for hyper parameters

....
// Priors for variance
...

// Loop over subjects and sample theta[,i] for all subjects

for (i in 1:N)
{

theta[,i] ~ normal(0,1);
}

``````

Transform theta[3,], reparametrize subject parameters and subsequently calculate activations and probabilites.

`````` row_vector[N] f = inv_logit(theta[3,]);

matrix[J,N] subj_pars =  (
diag_pre_multiply( sigma, L_Omega )
* theta
+ rep_matrix(hyper_pars,N)) ;

....
for(j in 1:Con) {

acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i]; // Distractor in other Position

probs[j + (i-1)*Con,3] = (d_weight[j + (i-1)*Con,1] * acts_DIP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);
probs[j + (i-1)*Con,4] = (d_weight[j + (i-1)*Con, 2] * acts_DIOP[j + (i-1)*Con]) ./ (SummedActs[j + (i-1)*Con]);

}
}
``````

In this version, the model runs faster, but has problmes with the init values - sampling begins after some time, but its random whether all chains are initialized in the end (even thoug this happens most of the time) . It makes no difference, if I don’t pass a init function. Chains look good though!

Again, thank you for your time !

best wishes

Jan