FYI: how to easily toggle between centered/non-centered

Because I haven’t seen it anywhere else, I thought I’d share what I came up with to encode a model with runtime selection of centered versus non-centered parameterizations for hierarchical parameters:

data {
  // K: number of identifiable-units-of-observation (IUOO)
  int K ;
  // N: number of observations total (must be at least one per IUOO)
  int<lower=K> N;
  // which_K: index associating each observation with its IUOO
  array[N] int<lower=1,upper=K> which_K ;
  // Y: vector of observations
  vector[N] Y ;
  // centered: binary toggle for intercept centered/non-centered parameterization
  int<lower=0,upper=1> centered ;
}
parameters {
  // mu: mean (across-IUOOs) of X
  real mu ;
  // sigma: sd (across-IUOOs) of X
  real<lower=0> sigma ;
  // X: mean (across observations) for each IUOO
  vector<
    offset = (centered ? 0 : mu)
    , multiplier = (centered ? 1 : sigma)
  >[K] X ;
}
model {
  //hyper-priors
  mu ~ std_normal() ; //must be changed to reflect domain expertise
  sigma ~ std_normal() ; //must be changed to reflect domain expertise
  //hierarchical prior for X:
  X ~ normal( mu, sigma ) ;
  //likelihood:
  Y ~ normal( X[which_K], 1 ) ;
}
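To see what the `offset`/`multiplier` declaration is doing under the hood: when `centered` is 0, Stan samples a standard-normal-scale deviate and transforms it as `offset + multiplier * raw`, which targets exactly the same distribution as the centered version. A quick numpy sketch of that equivalence (names here are mine, not from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5

# Non-centered: sample a standard normal deviate, then shift and scale,
# which is what offset=mu, multiplier=sigma does internally.
x_raw = rng.standard_normal(100_000)
x_noncentered = mu + sigma * x_raw

# Centered: sample from normal(mu, sigma) directly.
x_centered = rng.normal(mu, sigma, 100_000)

# Both draws target the same distribution, so their moments agree closely.
print(x_noncentered.mean(), x_centered.mean())  # both ≈ 2.0
print(x_noncentered.std(), x_centered.std())    # both ≈ 0.5
```

The toggle just switches which of these two equivalent coordinate systems the sampler works in; the posterior is unchanged, only the geometry HMC sees differs.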

comment to enable me to mark this as solved so it doesn’t get included in the “needs love” category.

Cool, thanks for sharing that.

I think you can also do that by using the no-solution-needed tag.


@jonah did that tag exist and get removed? I don’t see it now.

The tag is visible to admins/mods only to reduce clutter… But I am not fundamentally opposed to making it visible to everyone…

@mike-lawrence this toggles all the individual units to be centered or non-centered together, right? IIRC at StanCon 2023 you had some code to cleanly toggle the centered/non-centered status of individual units. Is that code available anywhere?

On another note, has anyone run into a case where the hierarchical prior is multivariate Gaussian with correlation and one component is well-identified, but the other is not? In that situation is there a way to make only one component of the vector parameter centered?


Since Stan allows vector multipliers/offsets, you could just make that centeredness toggle a vector and multiply/exponentiate appropriately.

For non-centering parts of a priori correlated parameters, whether it works could depend on your specific situation, but I’m not sure 🤔

If you need a way to automatically pick the appropriate centeredness, send me a message.


See here


I’d be curious to see this; I’ve been intermittently working on my own scheme for doing this based on quantities computable from the data. I have a sense of which quantities are pertinent, but I haven’t worked out the proper formula for accurately predicting the probability of divergence (and I’ve only considered the monolithic case).

If a standard hierarchical model with two correlated latents both with centered parameterizations is:

data {
	int<lower=2> N ; // number of hierarchical "groups"
	int<lower=2> K ; // number of observations per group per observable variable
	array[N] matrix[K,2] obs ; // observations
}
parameters{
	row_vector[2] mu ;
	row_vector<lower=0>[2] sigma ;
	matrix[N,2] group_mean ;
	corr_matrix[2] r ;
	...
}
model{
	...
	group_mean ~ multi_normal( mu, quad_form_diag(r, sigma) ) ;
	...
}

with “non-centered for both variables” version implemented as:

...
transformed data{
	row_vector[2] zeroes = rep_row_vector(0,2) ;
	row_vector[2] ones = rep_row_vector(1,2) ;
}
parameters{
	row_vector[2] mu ;
	row_vector<lower=0>[2] sigma ;
	matrix[N,2] group_mean_std_deviate ;
	corr_matrix[2] r ;
	...
}
model{
	...
	group_mean_std_deviate ~ multi_normal( zeroes, r) ;
	matrix[N,2] group_mean = (
		group_mean_std_deviate
		.* rep_matrix(sigma,N)
		+ rep_matrix(mu,N)
	) ;
	...
}
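The key identity behind that non-centered version: if z ~ multi_normal(0, r) and you scale each column by sigma and shift by mu, the result has covariance diag(sigma) · r · diag(sigma), i.e. exactly `quad_form_diag(r, sigma)` from the centered version. A numpy sketch verifying that (values are arbitrary illustration):

```python
import numpy as np

mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 3.0])
R = np.array([[1.0, 0.7], [0.7, 1.0]])  # the correlation matrix r

# If z ~ multi_normal(0, R), then z * sigma + mu (elementwise scale, then shift)
# has covariance diag(sigma) @ R @ diag(sigma) -- quad_form_diag(R, sigma) in Stan.
target_cov = np.diag(sigma) @ R @ np.diag(sigma)

rng = np.random.default_rng(1)
z = rng.multivariate_normal(np.zeros(2), R, size=200_000)
group_mean = z * sigma + mu

print(np.cov(group_mean.T))       # ≈ target_cov
print(group_mean.mean(axis=0))    # ≈ mu
```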

Then the one-centered/one-non-centered would be:

...
parameters{
	row_vector[2] mu ;
	row_vector<lower=0>[2] sigma ;
	matrix[N,2] group_mean_helper ;
	corr_matrix[2] r ;
	...
}
model{
	...
	
	group_mean_helper ~ multi_normal( 
		[mu[1],0]
		, quad_form_diag(
			r
			, [sigma[1],1]
		)
	) ;
	matrix[N,2] group_mean ;
	group_mean[,1] = group_mean_helper[,1] ;
	group_mean[,2] = group_mean_helper[,2] * sigma[2] + mu[2] ;
	...
}
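That mixed parameterization targets the same joint as the fully centered one, which you can check with the affine-transform rule for Gaussians (if h ~ N(m, S) then Ah + b ~ N(Am + b, A S Aᵀ)). A deterministic numpy check (values are arbitrary illustration):

```python
import numpy as np

mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 3.0])
R = np.array([[1.0, 0.7], [0.7, 1.0]])

# Covariance of the helper: component 1 already on the data scale,
# component 2 on the unit scale -- quad_form_diag(r, [sigma[1], 1]) in the Stan code.
D_half = np.diag([sigma[0], 1.0])
helper_cov = D_half @ R @ D_half
helper_mean = np.array([mu[0], 0.0])

# Transform back: leave component 1 alone, scale and shift component 2.
A = np.diag([1.0, sigma[1]])
b = np.array([0.0, mu[1]])
recovered_cov = A @ helper_cov @ A.T
recovered_mean = A @ helper_mean + b

# Matches the fully centered parameterization exactly:
target_cov = np.diag(sigma) @ R @ np.diag(sigma)
print(recovered_cov)   # equals target_cov
print(recovered_mean)  # equals mu
```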

I don’t think it’s possible, though, to have either of the variables non-monolithically parameterized.


Oh, I possibly spoke too soon. I think I see a path via an equivalent SEM representation of the model.

For the both-non-centered, the SEM representation is:

parameters{
	row_vector[2] mu ;
	row_vector<lower=0>[2] sigma ;
	real<lower=0,upper=1> r ;
	vector[N] common_latent ;
	matrix[N,2] unique_latent ;
	...
}
model{
	...
	common_latent ~ std_normal() ;
	to_vector(unique_latent) ~ std_normal() ;
	matrix[N,2] group_mean_std_deviate = (
		rep_matrix(common_latent,2) .* r
		+ (unique_latent .* sqrt(1-square(r)))
	) ;
	matrix[N,2] group_mean = (
		group_mean_std_deviate
		.* rep_matrix(sigma,N)
		+ rep_matrix(mu,N)
	) ;
	...
}
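One thing worth noting about this SEM version: with the same loading r on both columns, each column has unit variance but the implied cross-column correlation is r², so r here plays the role of a loading rather than the correlation itself. A quick numpy check (names are mine, mirroring the Stan variables):

```python
import numpy as np

r = 0.6  # the shared loading on the common latent
rng = np.random.default_rng(2)
N = 200_000

common = rng.standard_normal(N)
unique = rng.standard_normal((N, 2))

# group_mean_std_deviate as in the SEM code: shared part plus unique part.
std_dev = common[:, None] * r + unique * np.sqrt(1 - r**2)

# Each column has variance r**2 + (1 - r**2) = 1, but the cross-column
# covariance is r * r = r**2, since only the common latent is shared.
print(np.corrcoef(std_dev.T)[0, 1])  # ≈ r**2 = 0.36
print(std_dev.std(axis=0))           # ≈ [1, 1]
```

So if you want the implied correlation to equal a target rho ≥ 0, the loading should be sqrt(rho).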

And a quick-and-dirty implementation of the one-centered/one-non-centered is:

model{
	...
	common_latent ~ std_normal() ;
	unique_latent[,1] ~ normal(mu[1],sigma[1]) ;
	unique_latent[,2] ~ std_normal() ;
	
	matrix[N,2] group_mean_std_deviate = (
		rep_matrix(common_latent,2) .* r
		+ (
			append_col(
				(unique_latent[,1] - mu[1]) / sigma[1] // transforming back to a standard deviate
				, unique_latent[,2]
			) 
			.* sqrt(1-square(r))
		)
	) ;
	matrix[N,2] group_mean = (
		group_mean_std_deviate
		.* rep_matrix(sigma,N)
		+ rep_matrix(mu,N)
	) ;
	...
}

Which frames things in a format that should be amenable to the approach used in my tutorial for tagging individual groups in each of the variables as needing centered or non-centered. (I.e., instead of monolithically expressing the prior for each column in unique_latent as above, you can express separate priors for the entries that need to be centered and those that need to be non-centered, with a consequent selective re-transform to standard deviates for the SEM thereafter.)

Here’s what that’d look like:

model{
	...
	common_latent ~ std_normal() ;
	unique_latent[var1_id_index_centered_and_scaled,1] ~ normal(mu[1],sigma[1]) ;
	unique_latent[var1_id_index_uncentered_and_unscaled,1] ~ std_normal() ;
	unique_latent[var2_id_index_centered_and_scaled,2] ~ normal(mu[2],sigma[2]) ;
	unique_latent[var2_id_index_uncentered_and_unscaled,2] ~ std_normal() ;
	matrix[N,2] unique_latent_for_sem ;
	unique_latent_for_sem[var1_id_index_centered_and_scaled,1] = (unique_latent[var1_id_index_centered_and_scaled,1] - mu[1]) / sigma[1] ;
	unique_latent_for_sem[var1_id_index_uncentered_and_unscaled,1] = unique_latent[var1_id_index_uncentered_and_unscaled,1] ;
	unique_latent_for_sem[var2_id_index_centered_and_scaled,2] = (unique_latent[var2_id_index_centered_and_scaled,2] - mu[2]) / sigma[2] ;
	unique_latent_for_sem[var2_id_index_uncentered_and_unscaled,2] = unique_latent[var2_id_index_uncentered_and_unscaled,2] ;
	matrix[N,2] group_mean_std_deviate = (
		rep_matrix(common_latent,2) .* r
		+ (
			unique_latent_for_sem
			.* sqrt(1-square(r))
		)
	) ;
	matrix[N,2] group_mean = (
		group_mean_std_deviate
		.* rep_matrix(sigma,N)
		+ rep_matrix(mu,N)
	) ;
	...
}
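The selective re-standardization step is just an index-masked application of (x − mu) / sigma to the centered-and-scaled entries, leaving the rest untouched. A numpy sketch of that round trip (the index sets and values are hypothetical, standing in for the Stan index arrays):

```python
import numpy as np

N = 6
mu, sigma = np.array([1.0, -2.0]), np.array([0.5, 3.0])
var1_centered = np.array([0, 2, 5])    # analogous to var1_id_index_centered_and_scaled
var1_uncentered = np.array([1, 3, 4])  # analogous to var1_id_index_uncentered_and_unscaled

rng = np.random.default_rng(3)
raw = rng.standard_normal((N, 2))

# Pretend the centered groups in column 1 were sampled on the data scale:
unique_latent = raw.copy()
unique_latent[var1_centered, 0] = raw[var1_centered, 0] * sigma[0] + mu[0]

# Selective re-standardization, mirroring unique_latent_for_sem in the Stan code:
unique_latent_for_sem = unique_latent.copy()
unique_latent_for_sem[var1_centered, 0] = (
    unique_latent[var1_centered, 0] - mu[0]
) / sigma[0]

# After the transform, every entry is back on the standard scale.
print(np.allclose(unique_latent_for_sem, raw))  # True
```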

Oh, I was wrong that one needs the SEM approach to handle the non-monolithic case; to handle it with multi_normal you’d loop over the N groupings to express the ~ multi_normal() prior, using either mu/sigma or 0/1 for the location/scale depending on the centering requirements of each column for each grouping (and then transforming to group_mean afterwards). Something like:

model{
	...
	for(i_n in 1:N){
		group_mean_helper[i_n,] ~ multi_normal( 
			[
				(var1_id_is_centered_and_scaled[i_n] ? mu[1] : 0) // note `var1_id_is_centered_and_scaled` is an array[N] int<lower=0,upper=1>
				, (var2_id_is_centered_and_scaled[i_n] ? mu[2] : 0) // note `var2_id_is_centered_and_scaled` is an array[N] int<lower=0,upper=1>
			]
			, quad_form_diag(
				r
				, [
					(var1_id_is_centered_and_scaled[i_n] ? sigma[1] : 1)
					, (var2_id_is_centered_and_scaled[i_n] ? sigma[2] : 1)
				]
			)
		) ;
	}
	matrix[N,2] group_mean ;
	group_mean[var1_id_index_centered_and_scaled,1] = group_mean_helper[var1_id_index_centered_and_scaled,1] ; 
	group_mean[var1_id_index_uncentered_and_unscaled,1] = group_mean_helper[var1_id_index_uncentered_and_unscaled,1]*sigma[1] + mu[1] ;
	group_mean[var2_id_index_centered_and_scaled,2] = group_mean_helper[var2_id_index_centered_and_scaled,2] ;
	group_mean[var2_id_index_uncentered_and_unscaled,2] = group_mean_helper[var2_id_index_uncentered_and_unscaled,2]*sigma[2] + mu[2] ;
	...
}

And it’d be faster to pre-segregate the groupings into 4 groups (“var1_centered_var2_centered”, “var1_uncentered_var2_centered”, “var1_centered_var2_uncentered”, “var1_uncentered_var2_uncentered”), with indexing variables to pick out the groupings that fall in each; then you can reduce from N loop iterations to 4 vectorized statements.
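Building those four index sets is a one-time pre-computation from the per-group centering flags. A sketch in Python (the flag vectors are hypothetical examples):

```python
import numpy as np

# Hypothetical per-group centering flags (1 = centered-and-scaled),
# analogous to var1_id_is_centered_and_scaled etc. in the Stan code.
var1_flag = np.array([1, 0, 1, 0, 1, 0])
var2_flag = np.array([1, 1, 0, 0, 1, 0])

# Pre-segregate the N groups into the four flag combinations, once, outside the model.
groups = {
    (c1, c2): np.where((var1_flag == c1) & (var2_flag == c2))[0]
    for c1 in (0, 1)
    for c2 in (0, 1)
}
print(groups[(1, 1)])  # groups where both variables are centered -> [0 4]
```

Passed in as data, each of the four index arrays lets you write one vectorized multi_normal statement per combination instead of looping over all N groups.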


I think you have to do it “on the fly” while warming up, which is relatively easy to do. If you are curious how to do that, send me a message.