Question about noncentered parameterization sampling structure

Hey all,

Sorry to make yet another thread about this, but I’m still wrangling with some of the coding aspects of implementing our model. These are our parameter and modeling blocks:

``````parameters {

//Parameters
vector<lower=0,upper=1>[ns] guess;
matrix<lower=0>[ns,ncont] sens;
matrix[ns,ncont] crit;
matrix<lower=0>[ns,ncond] meta;
matrix<lower=0>[ns,ncond] nconf;
matrix<lower=0>[ns,ncond] pconf;

//Hyperparameters
vector[ncont] snm;
vector<lower=0>[ncont] sns;
vector[ncont] cm;
vector<lower=0>[ncont] cs;
vector[ncond] mm;
vector<lower=0>[ncond] ms;
vector[ncond] nccm;
vector<lower=0>[ncond] nccs;
vector[ncond] pccm;
vector<lower=0>[ncond] pccs;

}
``````
``````model {

//Hyperpriors
snm 	~	normal(0,1);
sns 	~	lognormal(0,1);
cm 	~	normal(0,1);
cs 	~	lognormal(0,1);
mm 	~	normal(0,1);
ms 	~	lognormal(0,1);
nccm 	~	normal(0,1);
nccs 	~	lognormal(0,1);
pccm	~	normal(0,1);
pccs	~	lognormal(0,1);

guess 	~	beta(1,193.0/3.0);

//Likelihood model (with local variable calculations)
for (cond in 1:ncond) {

matrix[sampn,ns] xtrans;
matrix[sampn,ns] sds;

//Priors
meta[:,cond]  	~	lognormal(mm[cond],ms[cond]);
nconf[:,cond]  	~	lognormal(nccm[cond],nccs[cond]);
pconf[:,cond]	~	lognormal(pccm[cond],pccs[cond]);

xtrans = logncdfinv(sampx,log(1/sqrt(meta[:,cond].^2+1)),sqrt(log(meta[:,cond].^2+1)));
sds = 1./xtrans;

for (cont in 1:ncont) {

sens[:,cont]	~	lognormal(snm[cont],sns[cont]);
crit[:,cont]	~	normal(cm[cont],cs[cont]);

if (sum(abs(oris[cond][cont])) == 0) {

} else {

matrix[nt,ns] sm;
vector[ns] sc;
matrix[nt,ns] se;

sm = oris[cond][cont].*rep_matrix(sens[:,cont]',nt);
sc = crit[:,cont].*sens[:,cont];

se = sm-rep_matrix(sc',nt);

target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds, se, nconf[:,cond], pconf[:,cond]);

}

}

}

}
``````

As you can see, the sampling statements are inside a for loop because each prior has its own independent condition- or contrast-dependent noncentered parameterization. But is this actually something Stan is capable of doing? I ask because we are currently trying to debug why our model is highly autocorrelated and not converging (it has a pincushion-shaped posterior), and I’m wondering if it has anything to do with how I’m trying to sample the parameters.

Thanks, as always, for any help!

Sorry, but I didn’t see the connection to centered/non-centered parameterizations. What were you thinking here?

I didn’t see the definition of this function (`logncdfinv`). There may just be numerical issues. For example,

``````log(1/sqrt(meta[:,cond].^2+1))
== -log(sqrt(meta[:,cond].^2+1))
== -0.5 * log(meta[:, cond].^2 + 1)
== -0.5 * log1p(meta[:, cond]^2)
``````

And if you can make `meta` an array of vectors and transpose it, it’d save a lot of memory allocation and copying. I’d start by making sure some of this stuff is stable, like the above transform.
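To make the stability point concrete, here’s the same identity checked in Python rather than Stan (purely for illustration, with a made-up value of `m`):

```python
import math

# The three forms are algebraically identical:
#   log(1/sqrt(m^2 + 1)) == -0.5*log(m^2 + 1) == -0.5*log1p(m^2)
m = 0.7
a = math.log(1 / math.sqrt(m**2 + 1))
b = -0.5 * math.log(m**2 + 1)
c = -0.5 * math.log1p(m**2)
print(a, b, c)  # all three agree

# For tiny m, log1p avoids losing precision when m^2 + 1 rounds to 1.0:
m_small = 1e-9
print(math.log(m_small**2 + 1))  # underflows to exactly 0.0
print(math.log1p(m_small**2))    # retains the answer, about 1e-18
```

The `log1p` form also does fewer operations per leapfrog step, which matters for autodiff.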

It’s also not clear what the `reduce_sum` call is doing here. I would generally recommend not using `reduce_sum` until things are debugged and you’re ready to optimize. Use smaller simulated data if need be.

Hey, sorry, I posted only my model and parameters blocks this time around to make sure the issue wasn’t in those in particular. The reason `reduce_sum` is already in use is that this code is adapted from an already-functioning version of the program with a single condition and contrast level (and therefore no loop indexing). Here is the full code. Keep in mind that the cluster I have access to only goes up to CmdStan version 2.30.1, so some of the more stable formulations I’ve been given in previous threads have not been possible to implement:

``````functions {

//Custom function for lognormal inverse CDF
matrix logncdfinv(vector x, vector m, vector s) {

int szx;
int szm;
szx = size(x);
szm = size(m);
matrix[szx,szm] y;
y = exp( rep_matrix(s',szx) * sqrt(2) .* rep_matrix(inv_erfc( -2 .* x + 2 ),szm) + rep_matrix(m',szx));
return y;

}

//Likelihood model for CASANDRE
real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix se, vector nconf, vector pconf) {

real llhC;
llhC = 0.0;

for (n in 1:size(resps)) {

int m[rows(resps[n])];

for (tt in 1:size(m)) {

if (sum(resps[n][tt,:]) == 1) {

m[tt] = 1;

} else {

m[tt] = 0;

}

}

matrix[size(se[:,n]),4] lhC;

if (sum(resps[n][:,:]) == 0) {

llhC += 0;

} else {

int t;
int ind[sum(m)];

t = 1;

matrix[size(se[:,n]),size(sds[:,n])] avgs;
avgs = rep_matrix(se[:,n],size(sds[:,n])).*rep_matrix(sds[:,n]',size(se[:,n]));

for (tr in 1:size(se[:,n])){

if (sum(resps[n][tr,:]) == 0) {

} else {

matrix[3,size(sds[:,n])] raws;

for (rws in 1:size(sds[:,n])) {
raws[1,rws] = normal_cdf(-nconf[n],avgs[tr,rws],sds[rws,n]);
}
for (rws in 1:size(sds[:,n])) {
raws[2,rws] = normal_cdf(0,avgs[tr,rws],sds[rws,n]);
}
for (rws in 1:size(sds[:,n])) {
raws[3,rws] = normal_cdf(pconf[n],avgs[tr,rws],sds[rws,n]);
}

vector[3] ratiodist;

ratiodist[1] = mean(raws[1,:]);
ratiodist[2] = mean(raws[2,:]);
ratiodist[3] = mean(raws[3,:]);

lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));

ind[t] = tr;

t = t + 1;

}

}

llhC += sum(columns_dot_product(resps[n][ind,:], log(lhC[ind,:])));

}

}

return llhC;

}

//Partial sum function
real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix se, vector nconf, vector pconf) {

return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], se[:,start:end], nconf[start:end], pconf[start:end]);

}

}
``````
``````data {

int ncond;
int ncont;
int ns;
int nt;
array[ncond,ncont,ns] matrix[nt,4] resps;
array[ncond,ncont] matrix[nt,ns] oris;
int sampn;
vector[sampn] sampx;

}
``````
``````parameters {

//Parameters
vector<lower=0,upper=1>[ns] guess;
matrix<lower=0>[ncont,ns] sens;
matrix[ncont,ns] crit;
matrix<lower=0>[ncond,ns] meta;
matrix<lower=0>[ncond,ns] nconf;
matrix<lower=0>[ncond,ns] pconf;

//Hyperparameters
vector[ncont] snm;
vector<lower=0>[ncont] sns;
vector[ncont] cm;
vector<lower=0>[ncont] cs;
vector[ncond] mm;
vector<lower=0>[ncond] ms;
vector[ncond] nccm;
vector<lower=0>[ncond] nccs;
vector[ncond] pccm;
vector<lower=0>[ncond] pccs;

}
``````
``````model {

//Hyperpriors
snm 	~	normal(0,1);
sns 	~	lognormal(0,1);
cm 	~	normal(0,1);
cs 	~	lognormal(0,1);
mm 	~	normal(0,1);
ms 	~	lognormal(0,1);
nccm 	~	normal(0,1);
nccs 	~	lognormal(0,1);
pccm	~	normal(0,1);
pccs	~	lognormal(0,1);

guess 	~	beta(1,193.0/3.0);

//Likelihood model (with local variable calculations)
for (cond in 1:ncond) {

array[ncond] matrix[sampn,ns] xtrans;
array[ncond] matrix[sampn,ns] sds;

//Priors
meta[cond,:]  	~	lognormal(mm[cond],ms[cond]);
nconf[cond,:]  	~	lognormal(nccm[cond],nccs[cond]);
pconf[cond,:]	~	lognormal(pccm[cond],pccs[cond]);

xtrans[cond] = logncdfinv(sampx,-0.5*log1p(meta[cond,:].^2),sqrt(log1p(meta[cond,:].^2)));
sds[cond] = 1./xtrans[cond];

for (cont in 1:ncont) {

sens[cont,:]	~	lognormal(snm[cont],sns[cont]);
crit[cont,:]	~	normal(cm[cont],cs[cont]);

if (sum(abs(oris[cond][cont])) == 0) {

} else {

array[ncont] matrix[nt,ns] sm;
array[ncont] vector[ns] sc;
array[ncont] matrix[nt,ns] se;

sm[cont] = oris[cond][cont].*rep_matrix(sens[cont,:]',nt);
sc[cont] = crit[cont,:].*sens[cont,:];

se[cont] = sm[cont]-rep_matrix(sc[cont]',nt);

target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds[cond], se[cont], nconf[cond,:], pconf[cond,:]);

}

}

}

}
``````

Edit: just implemented `log1p` based on your feedback.

OK, that makes sense on the reduce_sum implementation, but it makes it hard for me to read what’s going on. And I still didn’t see the non-centered parameterization from the title.

For a start, don’t use tabs in code! They render inconsistently on different platforms and are generally considered harmful to source code. The other thing that will help readability will be to use compound declare/define, which has been in place since 2.30.

``````matrix logncdfinv(vector x, vector m, vector s) {
int szx = size(x);
int szm = size(m);
matrix[szx,szm] y = exp(rep_matrix(s',szx) * sqrt(2) .* rep_matrix(inv_erfc( -2 .* x + 2 ),szm) + rep_matrix(m',szx));
return y;
}
``````

but I’d just write this simple function as a one-liner, pulling the multiplications inside `rep_matrix` where the operands are smaller, because I find `size(x)` easier to read than `szx` and evaluating a size function is basically free.

``````matrix logncdfinv(vector x, vector m, vector s) {
return exp(rep_matrix(s' * sqrt(2), size(x))
.* rep_matrix(inv_erfc(-2 * x + 2), size(m))
+ rep_matrix(m', size(x)));
}
``````
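If it helps to sanity-check the math: the `inv_erfc` expression is the standard normal quantile in disguise, since Phi^{-1}(x) = sqrt(2) * erfc^{-1}(2 - 2x), so `logncdfinv` computes exp(m + s * Phi^{-1}(x)), i.e. the lognormal quantile function. A quick round-trip check in Python, using the standard library’s `NormalDist` as a stand-in (illustration only, not Stan code):

```python
import math
from statistics import NormalDist

def logncdfinv(x, m, s):
    # exp(m + s*sqrt(2)*inv_erfc(2 - 2x)) == exp(m + s*Phi^{-1}(x)),
    # the inverse CDF of a lognormal with log-scale m and log-sd s.
    return math.exp(m + s * NormalDist().inv_cdf(x))

# Round trip: the CDF of the result should recover x.
q = logncdfinv(0.9, 0.3, 1.2)
x_back = NormalDist(0.3, 1.2).cdf(math.log(q))
print(x_back)  # ~0.9
```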

You also don’t need elementwise products (`.*`) when one of the operands is a scalar.

For this,

``````for (tt in 1:size(m)) {
  if (sum(resps[n][tt,:]) == 1) {
    m[tt] = 1;
  } else {
    m[tt] = 0;
  }
}
``````

you can just do this:

``````for (tt in 1:size(m)) {
m[tt] = sum(resps[n, tt]) == 1;
}
``````

The Stan array indexing is meant to chain, so what you wrote as `resps[n][tt, :]` is the same as `resps[n][tt]`, which is the same as `resps[n, tt]`.

Given that adding 0 does nothing, you can replace

``````if (sum(resps[n][:,:]) == 0) {
llhC += 0;
} else {
...
}
``````

with

``````if (sum(resps[n]) != 0) {
...
}
``````

The term `size(se[:,n])` should be replaced with `rows(se)`—the way you have it here allocates a new array and copies.

This can be refactored to be much more efficient:

``````avgs = rep_matrix(se[:,n],size(sds[:,n])).*rep_matrix(sds[:,n]',size(se[:,n]));
``````

Unless I’m missing something, this is just an outer product `se[ , n] * sds[ , n]'`, because entry `(a, b)` is the `a`th entry of `se[ , n]` multiplied by the `b`th entry of `sds[ , n]`. (You don’t need the `:`; it’s just there for compatibility with MATLAB, etc.)
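To make the equivalence concrete, here’s the same comparison in plain Python (illustration only, made-up numbers):

```python
# rep-and-elementwise-multiply vs. a plain outer product:
# entry (a, b) is se_col[a] * sds_col[b] either way.
se_col = [1.0, 2.0, 3.0]
sds_col = [0.5, 4.0]
rows, cols = len(se_col), len(sds_col)

# "rep_matrix" version: tile each vector into a matrix, multiply elementwise
A = [[se_col[a] for _ in range(cols)] for a in range(rows)]   # rep_matrix(se_col, cols)
B = [[sds_col[b] for b in range(cols)] for _ in range(rows)]  # rep_matrix(sds_col', rows)
rep_version = [[A[i][j] * B[i][j] for j in range(cols)] for i in range(rows)]

# outer-product version: same result without building the tiled matrices
outer = [[x * y for y in sds_col] for x in se_col]
print(rep_version == outer)  # True
```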

Stan isn’t like R or Python, where it’s more efficient to put everything together into a matrix operation. The cost of building the matrices is high as is the cost of repeating arithmetic (because of autodiff).

The normal CDF function is not particularly stable, especially in the tails. Similarly, subtraction is not very stable: if your guesses are close to 1, then writing `1 - guess` can be problematic. These are just things to check that might be going wrong. (I’d also strongly recommend spaces after commas.)
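To illustrate the tail-stability point in Python (numbers are just for demonstration): computing an upper-tail probability as 1 minus the CDF cancels catastrophically, while the complementary-error-function form keeps the answer:

```python
import math

z = 10.0  # deep in the upper tail
# P(Z > z) via 1 - CDF: erf(z/sqrt(2)) rounds to exactly 1.0 in double
# precision, so the subtraction cancels to 0.0.
naive = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
# Same quantity via erfc: the tiny tail mass (~7.6e-24) survives.
stable = 0.5 * math.erfc(z / math.sqrt(2))
print(naive, stable)
```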

I think I see the centered param you are worried about, which is this and similar:

``````meta[cond,:]  	~	lognormal(mm[cond],ms[cond]);
``````

This can just be `meta[cond]`, etc. You don’t need trailing indexes if they’re universal. But given that `meta`, `mm`, and `ms` are parameters, this could cause problems. Another way of doing this is to declare `meta_log_std` instead of `meta` as a parameter, take a standardized normal sample,

``````meta_log_std[cond] ~ normal(0, 1);
``````

and then define a new quantity equal to our original `meta`,

``````meta[cond] = exp(mm[cond] + meta_log_std[cond] * ms[cond]);
``````
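As a sanity check on why the reparameterization is equivalent (illustrative Python, not part of the model): pushing a standard normal draw through `exp(mm + z * ms)` lands on exactly the lognormal(mm, ms) quantile you would sample directly:

```python
import math
from statistics import NormalDist

mm, ms = 0.4, 0.8  # example hyperparameter values (made up)
for p in (0.1, 0.5, 0.9):
    z = NormalDist().inv_cdf(p)                  # meta_log_std at quantile p
    noncentered = math.exp(mm + ms * z)          # transformed draw
    centered = math.exp(NormalDist(mm, ms).inv_cdf(p))  # lognormal quantile
    print(noncentered, centered)                 # the two columns agree
```

The payoff is that the sampler explores `meta_log_std` on an uncorrelated standard normal scale instead of a scale whose width depends on the hyperparameters, which is exactly what breaks the funnel geometry.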

I implemented most of the changes you suggested, but the problem was actually that I was looping over conditions when sampling parameters that only change with contrast! I pulled those sampling statements out of the loop, gave each its own loop, and that fixed the problem. The conditional version of the model is done and parameter recovery looks great. I wanted to offer to make one final thread linking to and consolidating all of my individual threads about this code, and sharing the final completed and tested code. Do you think that would be helpful?

Yes!