Hi,
I’m working on a program that tries to infer the prior distribution (the distribution over means and covariance matrices) of a number of multivariate Normal categories, given the categorization responses provided for stimuli in the multi-dimensional space (program attached below). The program works just fine as long as the relevant covariance matrices are ‘sufficiently small’, but for covariance matrices with larger values I get exceptions saying that the covariance matrices are not symmetric, even though they should be (and are, except for numerical accuracy). I’d appreciate any pointers, e.g., on how I could use different types of operations to avoid these issues, which I suspect stem from limited numerical precision.
Here is an example error message (all errors involve large values):
Chain 3: Exception: model_mvg_conj_sufficient_stats_lapse_namespace::write_array: S_n is not symmetric. S_n[1,2] = -5.36654e+06, but S_n[2,1] = -5.36654e+06 (in 'mvg_conj_sufficient_stats_lapse', line 73, column 2 to column 25)
The Stan model uses the Normal-Inverse-Wishart (NIW) prior described in Murphy (2012) and updates it with the sufficient statistics (sample means, covariances, and counts of each category that subjects observed during ‘exposure’). The objective function is the likelihood of subjects’ categorization responses to test stimuli during test. This likelihood is a multinomial density based on the predicted probabilities of each category. Each predicted probability results from applying Bayes’ theorem to the prior category probabilities p(cat) [assumed to be uniform for now] and the category likelihood of the stimulus, \mathcal{T}(stimulus \mid m_{n,cat}, \frac{\kappa_{n,cat} + 1}{\kappa_{n,cat}(\nu_{n,cat} - D + 1)}S_{n,cat}, \nu_{n,cat} - D + 1), i.e., the posterior predictive of the updated NIW after having seen n observations of a category during ‘exposure’, NIW(\mu_{cat}, \Sigma_{cat} \mid m_n, S_n, \kappa_n, \nu_n) (following Murphy). D is the dimensionality of the data (and of the multivariate Normal categories).
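For reference, these are the conjugate update rules the model implements (and that the code below should match):

\kappa_n = \kappa_0 + n, \quad \nu_n = \nu_0 + n, \quad m_n = \frac{\kappa_0 m_0 + n\bar{x}}{\kappa_n}, \quad S_n = S_0 + \sum_{i=1}^{n} x_i x_i^\top + \kappa_0 m_0 m_0^\top - \kappa_n m_n m_n^\top

where \bar{x} is the sample mean of the n exposure observations of a category and \sum_i x_i x_i^\top is their sum of uncentered squares (x_ss in the code).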
Given that the error messages I’m getting (only sometimes, for some chains, and only for entries > 10^6) are about S_n, my best guess is that the culprit is this line:
S_n[cat,group] = S_0[cat] +
x_ss[cat,group] +
kappa_0 * m_0[cat] * m_0[cat]' -
kappa_n[cat,group] * m_n[cat,group] * m_n[cat,group]';
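For what it’s worth, one workaround I’ve been considering is to compute the update into a local matrix and force exact symmetry before assigning it to the cov_matrix (a minimal, untested sketch, assuming the asymmetry really is just floating-point noise):
// sketch: build S_n in a local matrix, then symmetrize it before the
// assignment to the cov_matrix triggers Stan's symmetry check
{
  matrix[K,K] S_tmp = S_0[cat] +
                      x_ss[cat,group] +
                      kappa_0 * m_0[cat] * m_0[cat]' -
                      kappa_n[cat,group] * m_n[cat,group] * m_n[cat,group]';
  S_n[cat,group] = 0.5 * (S_tmp + S_tmp');
}
But I’m not sure whether that would just be masking a deeper problem.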
I realize it is a big ask to wade through my code. But perhaps someone here sees something obvious or has some guesses / pointers I could explore. Thank you in advance!
The Stan program:
/*
* Fit multinomial response data using a belief-updating model to infer prior
* parameters. A Normal-Inverse-Wishart prior is used, and it is assumed
* that the subject knows the true labels of all the input stimuli (i.e., the
* model does not represent any uncertainty in classification).
*
* This version has a lapse rate parameter (probability of random guessing).
*
* Input is in the form of raw data points for observations.
*/
data {
int M; // number of categories
int L; // number of exposure groups (e.g. subjects)
int K; // number of features
/* Efficiency considerations: declaring N as a matrix rather than an array of ints is supposedly
faster (https://mc-stan.org/docs/2_18/stan-users-guide/indexing-efficiency-section.html). It also
means that standard arithmetic functions can be directly applied to that matrix (unlike to
more-than-1-dimensional arrays). */
matrix[M,L] N; // number of observations per category (M) and exposure group (L)
vector[K] x_mean[M,L]; // means for each category (M) and exposure group (L)
cov_matrix[K] x_ss[M,L]; // sum of uncentered squares matrix for each category (M) and group (L)
int N_test; // number of unique combinations of test locations & exposure groups
vector[K] x_test[N_test]; // locations (in cue space) of test trials
int y_test[N_test]; // exposure group indicator of test trials
int z_test_counts[N_test,M]; // responses for test trials (for each category M)
int<lower=0, upper=1> lapse_rate_known;
int<lower=0, upper=1> mu_0_known;
int<lower=0, upper=1> Sigma_0_known;
real lapse_rate_data[lapse_rate_known ? 1 : 0]; // optional: user provided lapse_rate
vector[mu_0_known ? K : 0] mu_0_data[mu_0_known ? M : 0]; // optional: user provided expected mu_0 (prior category means)
cov_matrix[Sigma_0_known ? K : 0] Sigma_0_data[Sigma_0_known ? M : 0]; // optional: user provided expected Sigma_0 (prior category covariance matrices)
/* For now, this script assumes that the observations (cue vectors) are centered. The prior
mean of m_0 is set to 0. In the future, one could automatically adjust the location based
on the input data (e.g., by placing the mean of the Normal prior in the center of the
exposure data).
*/
vector<lower=0>[mu_0_known ? 0 : K] tau_scale; // separate taus for each of the K features to capture that features can be on separate scales
// real<lower=0> tau_scale; // scale of cauchy prior for variances of m_0
real<lower=0> L_omega_scale; // scale of LKJ prior for correlation of variance of m_0
int<lower=0, upper=1> split_loglik_per_observation;
}
transformed data {
real sigma_kappanu;
/* Scale for the prior of kappa_0/nu_0. In order to deal with input that does not contain
observations (in which case max(N) == 0), we set the minimum value for the SD to 10.
*/
sigma_kappanu = max(N) > 0 ? max(N) * 4 : 10;
}
parameters {
// these are all shared across groups (same prior beliefs):
real<lower=K> kappa_0; // prior pseudocount for category mu
real<lower=K + 1> nu_0; // prior pseudocount for category Sigma
real<lower=0, upper=1> lapse_rate_param[lapse_rate_known ? 0 : 1];
vector[K] m_0_param[mu_0_known ? 0 : M]; // prior mean of means
vector<lower=0>[mu_0_known ? 0 : K] m_0_tau; // prior standard deviations of m_0
cholesky_factor_corr[mu_0_known ? 0 : K] m_0_L_omega; // prior correlations of m_0 (in cholesky form)
vector<lower=0>[K] tau_0_param[Sigma_0_known ? 0 : M]; // standard deviations of prior scatter matrix S_0
cholesky_factor_corr[K] L_omega_0_param[Sigma_0_known ? 0 : M]; // correlation matrix of prior scatter matrix S_0 (in cholesky form)
}
transformed parameters {
real lapse_rate;
vector[K] m_0[M]; // prior mean of means m_0
cov_matrix[K] S_0[M]; // prior scatter matrix S_0
// updated beliefs depend on input and group
real<lower=K> kappa_n[M,L]; // updated mean pseudocount
real<lower=K> nu_n[M,L]; // updated sd pseudocount
vector[K] m_n[M,L]; // updated expected mean
cov_matrix[K] S_n[M,L]; // updated expected scatter matrix
cov_matrix[K] t_scale[M,L]; // scale matrix of predictive t distribution
simplex[M] p_test_conj[N_test];
vector[M] log_p_test_conj[N_test];
if (lapse_rate_known) {
lapse_rate = lapse_rate_data[1];
} else {
lapse_rate = lapse_rate_param[1];
}
if (mu_0_known) {
m_0 = mu_0_data;
} else {
m_0 = m_0_param;
}
if (Sigma_0_known) {
// Get S_0 from expected Sigma given nu_0
for (cat in 1:M) {
S_0[cat] = Sigma_0_data[cat] * (nu_0 - K - 1);
}
}
// update the NIW parameters according to the conjugate updating rules from
// Murphy (2007, p. 136)
for (cat in 1:M) {
if (!Sigma_0_known) {
// Get S_0 from its components: correlation matrix and vector of standard deviations
S_0[cat] = quad_form_diag(multiply_lower_tri_self_transpose(L_omega_0_param[cat]), tau_0_param[cat]);
}
for (group in 1:L) {
if (N[cat,group] > 0) {
kappa_n[cat,group] = kappa_0 + N[cat,group];
nu_n[cat,group] = nu_0 + N[cat,group];
m_n[cat,group] = (kappa_0 * m_0[cat] + N[cat,group] * x_mean[cat,group]) /
kappa_n[cat,group];
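/* Suspected culprit (see the error message above): for large values, the subtraction
of the two rank-1 terms in the update below can apparently leave S_n asymmetric at
the level of floating-point rounding. */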
S_n[cat,group] = S_0[cat] +
x_ss[cat,group] +
kappa_0 * m_0[cat] * m_0[cat]' -
kappa_n[cat,group] * m_n[cat,group] * m_n[cat,group]';
} else {
kappa_n[cat,group] = kappa_0;
nu_n[cat,group] = nu_0;
m_n[cat,group] = m_0[cat];
S_n[cat,group] = S_0[cat];
}
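// scale matrix of the posterior predictive t-distribution (following Murphy):
// t_scale = (kappa_n + 1) / (kappa_n * (nu_n - K + 1)) * S_n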
t_scale[cat,group] = S_n[cat,group] * (kappa_n[cat,group] + 1) /
(kappa_n[cat,group] * (nu_n[cat,group] - K + 1));
}
}
// compute posterior probabilities of all categories for each of the test stimuli
for (j in 1:N_test) {
int group;
group = y_test[j];
/* Calculate the un-normalized log posterior probability for each category. Under the
assumption of uniform prior probabilities for each category, the log posterior probabilities
are identical to the normalized log likelihoods. If we ever were to change this assumption,
we'd have to add the log prior probabilities of the categories here. */
for (cat in 1:M) {
// log likelihood of the test stimulus given the category
log_p_test_conj[j,cat] = multi_student_t_lpdf(x_test[j] |
nu_n[cat,group] - K + 1,
m_n[cat,group],
t_scale[cat,group]);
}
// normalize to get actual posterior probs in simplex
p_test_conj[j] = exp(log_p_test_conj[j] - log_sum_exp(log_p_test_conj[j]));
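// (equivalent to: p_test_conj[j] = softmax(log_p_test_conj[j]);)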
}
}
model {
vector[M] lapsing_probs;
/* Assuming a uniform bias, so that on lapse trials each category is chosen with
probability 1/M, i.e., lapsing_probs = lapse_rate / M for each category */
lapsing_probs = rep_vector(lapse_rate / M, M);
kappa_0 ~ normal(0, sigma_kappanu);
nu_0 ~ normal(0, sigma_kappanu);
if (!mu_0_known) {
m_0_tau ~ cauchy(0, tau_scale);
m_0_L_omega ~ lkj_corr_cholesky(L_omega_scale);
m_0_param ~ multi_normal_cholesky(rep_vector(0, K), diag_pre_multiply(m_0_tau, m_0_L_omega));
}
// Specifying prior for components of S_0:
if (!Sigma_0_known) {
for (cat in 1:M) {
tau_0_param[cat] ~ cauchy(0, tau_scale);
L_omega_0_param[cat] ~ lkj_corr_cholesky(L_omega_scale);
}
}
for (i in 1:N_test) {
z_test_counts[i] ~ multinomial(p_test_conj[i] * (1 - lapse_rate) + lapsing_probs);
}
}