Question about noncentered parameterization sampling structure

Hey all,

Sorry to make yet another thread about this, but I’m still wrangling with some of the coding aspects of implementing our model. These are our parameters and model blocks:

parameters {

	//Parameters
	vector<lower=0,upper=1>[ns] guess;
	matrix<lower=0>[ns,ncont] sens;
	matrix[ns,ncont] crit;
	matrix<lower=0>[ns,ncond] meta;
	matrix<lower=0>[ns,ncond] nconf;
	matrix<lower=0>[ns,ncond] pconf;

	//Hyperparameters
	vector[ncont] snm;
	vector<lower=0>[ncont] sns;
	vector[ncont] cm;
	vector<lower=0>[ncont] cs;
	vector[ncond] mm;
	vector<lower=0>[ncond] ms;
	vector[ncond] nccm;
	vector<lower=0>[ncond] nccs;
	vector[ncond] pccm;
	vector<lower=0>[ncond] pccs;

}
model {

	//Hyperpriors
	snm 	~	normal(0,1);
	sns 	~	lognormal(0,1);
	cm 	~	normal(0,1);
	cs 	~	lognormal(0,1);
	mm 	~	normal(0,1);
	ms 	~	lognormal(0,1);
	nccm 	~	normal(0,1);
	nccs 	~	lognormal(0,1);
	pccm	~	normal(0,1);
	pccs	~	lognormal(0,1);

	guess 	~	beta(1,193.0/3.0);

	//Likelihood model (with local variable calculations)
	for (cond in 1:ncond) {

		matrix[sampn,ns] xtrans;
		matrix[sampn,ns] sds;

		//Priors
		meta[:,cond]  	~	lognormal(mm[cond],ms[cond]);
		nconf[:,cond]  	~	lognormal(nccm[cond],nccs[cond]);
		pconf[:,cond]	~	lognormal(pccm[cond],pccs[cond]);

		xtrans = logncdfinv(sampx,log(1/sqrt(meta[:,cond].^2+1)),sqrt(log(meta[:,cond].^2+1)));
	sds = 1 ./ xtrans;

		for (cont in 1:ncont) {

			sens[:,cont]	~	lognormal(snm[cont],sns[cont]);
			crit[:,cont]	~	normal(cm[cont],cs[cont]);

			if (sum(abs(oris[cond][cont])) == 0) {

			} else {

				matrix[nt,ns] sm;
				vector[ns] sc;
				matrix[nt,ns] se;

				sm = oris[cond][cont].*rep_matrix(sens[:,cont]',nt);
				sc = crit[:,cont].*sens[:,cont];
				
				se = sm-rep_matrix(sc',nt);

				target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds, se, nconf[:,cond], pconf[:,cond]);

			}

		}

	}

}

As you can see, the sampling statements are inside a for loop because the prior that each sample is drawn from has its own independent condition- or contrast-dependent noncentered parameterization. But is this actually something Stan is capable of doing? I ask because we are currently trying to debug why our model is highly autocorrelated and not converging (it has a pincushion-shaped posterior), and I’m wondering if it has anything to do with how I’m trying to sample the parameters.

Thanks, as always, for any help!

Sorry, but I didn’t see the connection to centered/non-centered parameterizations. What were you thinking here?

I didn’t see the definition of logncdfinv. There may just be numerical issues. For example,

log(1/sqrt(meta[:,cond].^2 + 1))
  == -log(sqrt(meta[:,cond].^2 + 1))
  == -0.5 * log(meta[:,cond].^2 + 1)
  == -0.5 * log1p(meta[:,cond].^2)

And if you can make meta an array of vectors and transpose it, it’d save a lot of memory allocation and copying. I’d start by making sure some of this stuff is stable, like the above transform.
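For instance, here’s a minimal sketch of both suggestions (untested, and it assumes the rest of the indexing is updated to match):

// declaring meta so that meta[cond] is already a vector
array[ncond] vector<lower=0>[ns] meta;

// the stable version of the transform via log1p
xtrans = logncdfinv(sampx,
                    -0.5 * log1p(square(meta[cond])),
                    sqrt(log1p(square(meta[cond]))));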

It’s also not clear what partial_sum_casandre_log is doing here. I would generally recommend not using reduce_sum until things are debugged and you’re ready to optimize. Use smaller simulated data if need be.
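While debugging, any reduce_sum(f, x, grainsize, ...) call can be swapped for the direct call f(x, 1, size(x), ...), which computes the same total without the parallel machinery. Here that would be something like

target += partial_sum_casandre_log(resps[cond][cont], 1, size(resps[cond][cont]), guess, sds, se, nconf[:,cond], pconf[:,cond]);

and you can switch back to reduce_sum once the model is behaving.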

Hey, sorry, I posted only my parameters and model blocks this time around to check whether the issue was in this particular part. The reason reduce_sum is already in use is that this code is adapted from an already-functioning version of the program with a single condition and contrast level (and therefore no loop indexing). Here is the full code. Keep in mind that the cluster I have access to only goes up to CmdStan version 2.30.1, so some of the more stable formulations I’ve been given in previous threads have not been possible to implement:

functions {

	//Custom function for lognormal inverse CDF
	matrix logncdfinv(vector x, vector m, vector s) {							

		int szx;
		int szm;
		szx = size(x);
		szm = size(m);
		matrix[szx,szm] y;
		y = exp( rep_matrix(s',szx) * sqrt(2) .* rep_matrix(inv_erfc( -2 .* x + 2 ),szm) + rep_matrix(m',szx));
		return y;

		}

	//Likelihood model for CASANDRE				
	real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix se, vector nconf, vector pconf) {

		real llhC;
		llhC = 0.0;

		for (n in 1:size(resps)) {

			int m[rows(resps[n])];

			for (tt in 1:size(m)) {

				if (sum(resps[n][tt,:]) == 1) {

					m[tt] = 1;

				} else {

					m[tt] = 0;

				}

			}

			matrix[size(se[:,n]),4] lhC;

			if (sum(resps[n][:,:]) == 0) {

				llhC += 0;

			} else {

				int t;
				int ind[sum(m)];

				t = 1;

				matrix[size(se[:,n]),size(sds[:,n])] avgs;
				avgs = rep_matrix(se[:,n],size(sds[:,n])).*rep_matrix(sds[:,n]',size(se[:,n]));
		
				for (tr in 1:size(se[:,n])){

					if (sum(resps[n][tr,:]) == 0) {

					} else {

						matrix[3,size(sds[:,n])] raws;

						for (rws in 1:size(sds[:,n])) {
							raws[1,rws] = normal_cdf(-nconf[n],avgs[tr,rws],sds[rws,n]);
						}
						for (rws in 1:size(sds[:,n])) {
							raws[2,rws] = normal_cdf(0,avgs[tr,rws],sds[rws,n]);
						}
						for (rws in 1:size(sds[:,n])) {
							raws[3,rws] = normal_cdf(pconf[n],avgs[tr,rws],sds[rws,n]);
						}

						vector[3] ratiodist;

						ratiodist[1] = mean(raws[1,:]);
						ratiodist[2] = mean(raws[2,:]);
						ratiodist[3] = mean(raws[3,:]);

						lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
						lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
						lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
						lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));

						ind[t] = tr;

						t = t + 1;

					}

				}

				llhC += sum(columns_dot_product(resps[n][ind,:], log(lhC[ind,:])));

			}
		
		}

		return llhC;
		
	}

	//Partial sum function
	real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix se, vector nconf, vector pconf) {

	return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], se[:,start:end], nconf[start:end], pconf[start:end]);

	}
		
}
data {

	int ncond;
	int ncont;
	int ns;
	int nt;
	array[ncond,ncont,ns] matrix[nt,4] resps;
	array[ncond,ncont] matrix[nt,ns] oris;
	int sampn;
	vector[sampn] sampx;

}
parameters {

	//Parameters
	vector<lower=0,upper=1>[ns] guess;
	matrix<lower=0>[ncont,ns] sens;
	matrix[ncont,ns] crit;
	matrix<lower=0>[ncond,ns] meta;
	matrix<lower=0>[ncond,ns] nconf;
	matrix<lower=0>[ncond,ns] pconf;

	//Hyperparameters
	vector[ncont] snm;
	vector<lower=0>[ncont] sns;
	vector[ncont] cm;
	vector<lower=0>[ncont] cs;
	vector[ncond] mm;
	vector<lower=0>[ncond] ms;
	vector[ncond] nccm;
	vector<lower=0>[ncond] nccs;
	vector[ncond] pccm;
	vector<lower=0>[ncond] pccs;

}
model {

	//Hyperpriors
	snm 	~	normal(0,1);
	sns 	~	lognormal(0,1);
	cm 	~	normal(0,1);
	cs 	~	lognormal(0,1);
	mm 	~	normal(0,1);
	ms 	~	lognormal(0,1);
	nccm 	~	normal(0,1);
	nccs 	~	lognormal(0,1);
	pccm	~	normal(0,1);
	pccs	~	lognormal(0,1);

	guess 	~	beta(1,193.0/3.0);

	//Likelihood model (with local variable calculations)
	for (cond in 1:ncond) {

		array[ncond] matrix[sampn,ns] xtrans;
		array[ncond] matrix[sampn,ns] sds;

		//Priors
		meta[cond,:]  	~	lognormal(mm[cond],ms[cond]);
		nconf[cond,:]  	~	lognormal(nccm[cond],nccs[cond]);
		pconf[cond,:]	~	lognormal(pccm[cond],pccs[cond]);

		xtrans[cond] = logncdfinv(sampx, (-0.5*log1p(meta[cond,:].^2))', sqrt(log1p(meta[cond,:].^2))');
		sds[cond] = 1 ./ xtrans[cond];

		for (cont in 1:ncont) {

			sens[cont,:]	~	lognormal(snm[cont],sns[cont]);
			crit[cont,:]	~	normal(cm[cont],cs[cont]);

			if (sum(abs(oris[cond][cont])) == 0) {

			} else {

				array[ncont] matrix[nt,ns] sm;
				array[ncont] vector[ns] sc;
				array[ncont] matrix[nt,ns] se;

				sm[cont] = oris[cond][cont].*rep_matrix(sens[cont,:]',nt);
				sc[cont] = crit[cont,:].*sens[cont,:];
				
				se[cont] = sm[cont]-rep_matrix(sc[cont]',nt);

				target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds[cond], se[cont], nconf[cond,:], pconf[cond,:]);

			}

		}

	}

}

Edit: just implemented the log1p change from your feedback

OK, that makes sense for the reduce_sum implementation, but it makes it hard for me to read what’s going on. And I still don’t see the non-centered parameterization from the title.

For a start, don’t use tabs in code! They render inconsistently on different platforms and are generally considered harmful in source code. The other thing that will help readability is to use compound declare/define, which has been available since version 2.30.

matrix logncdfinv(vector x, vector m, vector s) {
  int szx = size(x);
  int szm = size(m);
  matrix[szx,szm] y = exp(rep_matrix(s',szx) * sqrt(2) .* rep_matrix(inv_erfc( -2 .* x + 2 ),szm) + rep_matrix(m',szx));
  return y;
}

but I’d just write this simple function as a one-liner, pulling the multiplications in to where the operands are smaller. I find size(x) easier to read than szx, and evaluating size() is basically free.

matrix logncdfinv(vector x, vector m, vector s) {							
  return exp(rep_matrix(s' * sqrt(2), size(x))
             .* rep_matrix(inv_erfc(-2 * x + 2), size(m))
             + rep_matrix(m', size(x)));
}

You also don’t need elementwise products when multiplying by a scalar, as in the inv_erfc(-2 * x + 2) rewrite above.

For this,

for (tt in 1:size(m)) {
  if (sum(resps[n][tt,:]) == 1) {
    m[tt] = 1;
  } else {
    m[tt] = 0;
  }
}

you can just do this:

for (tt in 1:size(m)) {
  m[tt] = sum(resps[n, tt]) == 1;
}

Stan array indexing is meant to chain, so what you wrote as resps[n][tt, :] is the same as resps[n][tt], which is the same as resps[n, tt].

Given that adding 0 does nothing, you can replace

if (sum(resps[n][:,:]) == 0) {
  llhC += 0;
} else {
...
}

with

if (sum(resps[n]) != 0) {
  ...
}

The term size(se[:,n]) should be replaced with rows(se)—the way you have it here allocates a new array and copies.
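For example, the local declaration becomes just:

matrix[rows(se), 4] lhC;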

This can be refactored to be much more efficient:

avgs = rep_matrix(se[:,n],size(sds[:,n])).*rep_matrix(sds[:,n]',size(se[:,n]));

Unless I’m missing something, this is just the outer product se[:, n] * sds[:, n]', because entry (a, b) is the ath entry in se[:, n] multiplied by the bth entry of sds[:, n] (the bare : index just selects a whole dimension, as in MATLAB).
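That is, the whole construction should reduce to one line (assuming I’ve read the shapes right):

avgs = se[:, n] * sds[:, n]';  // outer product: avgs[a, b] = se[a, n] * sds[b, n]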

Stan isn’t like R or Python, where it’s more efficient to put everything together into a matrix operation. The cost of building the matrices is high as is the cost of repeating arithmetic (because of autodiff).

The normal cdf function is not particularly stable, especially in the tails. I’d also strongly recommend spaces after commas. Similarly, subtraction is not very stable: if your guesses are close to 1, then writing 1 - guess can be problematic. These are just things to check that might be going wrong.
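For example, Stan has log-scale builtins that behave better (a sketch of the idea only, not a drop-in rewrite; lp and log_not_guess are made-up names):

real lp = normal_lcdf(-nconf[n] | avgs[tr, rws], sds[rws, n]);  // log CDF, stable in the tails
real log_not_guess = log1m(guess[n]);  // stable log(1 - guess[n])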

I think I see the centered parameterization you are worried about, in this and similar statements:

meta[cond,:]  	~	lognormal(mm[cond],ms[cond]);

This can just be meta[cond], etc. You don’t need trailing indexes if they’re universal. But given that meta, mm, and ms are parameters, this could cause problems. Another way of doing this is to declare meta_log_std instead of meta as a parameter, take a standardized normal sample,

meta_log_std[cond] ~ normal(0, 1);

and then define a new quantity equal to our original meta,

meta[cond] = exp(mm[cond] + meta_log_std[cond] * ms[cond]);
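Putting that together, a minimal sketch of the non-centered version (meta_log_std is the made-up name from above, and nconf and pconf would get the same treatment):

parameters {
  matrix[ncond, ns] meta_log_std;  // standardized log-scale meta
  // ... everything else as before, minus meta ...
}
transformed parameters {
  matrix<lower=0>[ncond, ns] meta;
  for (cond in 1:ncond) {
    // shift and scale by the hyperparameters, then exponentiate
    meta[cond] = exp(mm[cond] + meta_log_std[cond] * ms[cond]);
  }
}
model {
  // implies meta[cond] ~ lognormal(mm[cond], ms[cond])
  to_vector(meta_log_std) ~ normal(0, 1);
  // ... rest of the model as before ...
}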

I implemented most of the changes you suggested, but the problem was actually that I was looping over conditions when sampling parameters that only change with contrast! I pulled those sampling statements out of the condition loop, gave them their own loops, and that fixed the problem. The conditional version of the model is done and parameter recovery looks great. I’d like to offer to make one final thread that links to and consolidates all of my individual threads about this code and shares the final, tested version. Do you think that would be helpful?

Yes!