Model with multivariate normal latent variable, "numerical integration"

In 2013, I requested a multivariate normal integral feature, but it looks like this isn’t going to happen, at least not any time soon.

So, I was thinking it might make sense to try to get the same result by modeling a latent multivariate normal variable and summing it over different regions. I wrote a Stan model today, and I got it to compile after some debugging, but when I try to fit it, it seems to hang (or maybe it’s just taking a very long time to do anything). It’s been a while since I wrote a Stan model from scratch, so I’m sure there are lots of things about this that are suboptimal (for the past few years, I have been using PyMC 2 to fit the models that led me to request the multivariate normal integral feature - my PyMC models are probably also suboptimal, but they work, so…).

Any advice would be greatly appreciated. I would love to get this stuff working in Stan if possible. Ultimately, I want to scale this up to fit a fairly large number of matrices simultaneously (for fairly complicated multilevel structure in my data). Here’s the model I wrote:

data {
  int<lower=0> n_stim; // number of stimulus categories
  int<lower=0> n_resp; // number of response categories
  int<lower=0> max_N; // number of simulated latent perceptual effects
  int<lower=0> cm[n_stim, n_resp]; // confusion matrix
  real<lower=-1,upper=1> signs[n_resp, 2]; // assumes 2D
}
parameters {
  vector[2] mu[n_stim]; // perceptual distribution means
  matrix[2,2] L[n_stim]; // Cholesky factored perceptual correlation matrices
}
transformed parameters {
  vector[2] mu_p[n_stim]; // prior means for mu
  matrix[2,2] mu_S; // prior variance for mu
  real<lower=0> tm[n_stim, n_resp]; // tallied latent variables
  vector[n_resp] pm[n_stim]; // predicted response probabilities
  vector[2] pe[n_stim,max_N]; // latent perceptual effects

  mu_S[1,1] = 2;
  mu_S[1,2] = 0;
  mu_S[2,1] = 0;
  mu_S[2,2] = 2;

  for (i in 1:n_stim) {
    mu_p[i,1] = -1*signs[i,1];
    mu_p[i,2] = -1*signs[i,2];
    for (j in 1:n_resp) {
      for (k in 1:max_N) {
        tm[i,j] +=  (signs[j,1]*pe[i,k,1] < 0) && (signs[j,2]*pe[i,k,2] < 0);
      }
      pm[i,j] = tm[i,j]/sum(tm[i]);
    }
  }
}
model {
  for (i in 1:n_stim) {
    pe[i] ~ multi_normal_cholesky(mu[i], L[i]);
    cm[i] ~ multinomial(pm[i]);
    L[i] ~ lkj_corr_cholesky(2.0);
    mu[i] ~ multi_normal(mu_p, mu_S);
  }
}
generated quantities {
  matrix[2,2] S[n_stim];

  for (i in 1:n_stim) {
    S[i] = multiply_lower_tri_self_transpose(L[i]);
  }
}

And here is Python code to generate data and fit it with pystan:

import numpy as np
import pystan as ps
import pickle
from glob import glob

# response-region signs for each of the four quadrants (2D)
signs = np.array([[1,1],[1,-1],[-1,1],[-1,-1]])

# true perceptual distribution means and correlation matrices used to simulate data
mu = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])
R = np.array([[[1,.5],[.5,1]],
              [[1,-.5],[-.5,1]],
              [[1,-.5],[-.5,1]],
              [[1,.5],[.5,1]]])

N = 100  # simulated trials per stimulus

cm = np.zeros((4,4), dtype=int)

# tally how many simulated perceptual effects fall in each response region
for i in range(4):
  y = np.random.multivariate_normal(mu[i], R[i], size=N)
  for j in range(4):
    cm[i,j] = np.sum( (signs[j,0]*y[:,0] < 0) & (signs[j,1]*y[:,1] < 0) )

data_dict = {'n_stim':4, 'n_resp':4, 'max_N':N, 'cm':cm, 'signs':signs}

pkl_files = glob('*pkl')
model_pkl = 'simple_grt_mod.pkl'

if model_pkl not in pkl_files:
  model = ps.StanModel(file='simple_grt.stan')

  with open(model_pkl,'wb') as f:
    pickle.dump(model,f)

else:
  
  with open(model_pkl,'rb') as f:
    model = pickle.load(f)

fit_model = True
if fit_model:
  n_iter = 5000
  warmup = 2500
  thin = 1
  fit = model.sampling(data=data_dict, iter=n_iter, warmup=warmup, thin=thin, n_jobs=2)

  fit_pkl = 'simple_grt_fit.pkl'
  smp_pkl = 'simple_grt_smp.pkl'

  with open(fit_pkl,'wb') as f:
    pickle.dump(fit,f)
  with open(smp_pkl,'wb') as f:
    pickle.dump(fit.extract(),f)

I think Rob Trangucci said he was going to add the bivariate case. There isn’t a good way to go higher than that, as far as we know.

The first problem with your model is here:

matrix[2,2] L[n_stim]; // Cholesky factored perceptual correlation matrices

That should be declared as cholesky_factor_corr[2], not matrix[2, 2], if that’s what it is (which it seems to be, given the prior).
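For concreteness, a minimal sketch of that corrected declaration in the parameters block, keeping the names from the model above:

parameters {
  vector[2] mu[n_stim];               // perceptual distribution means
  cholesky_factor_corr[2] L[n_stim];  // Cholesky factors of 2x2 correlation matrices
}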

This is also wrong as written, I think:

for (i in 1:n_stim)
  mu[i] ~ multi_normal(mu_p, mu_S);

What I think you want instead is just

mu ~ multi_normal(mu_p, mu_S);

outside of the loop. But you don’t need any of that, because you have a diagonal covariance matrix; as written, you’re setting up all kinds of extra work factoring and taking derivatives that aren’t necessary. Instead, you want the prior contribution to be

for (i in 1:n_stim)
  mu[i] ~ normal(mu_p[i], sqrt(2)); // normal() takes a standard deviation; mu_S had variance 2 on the diagonal

but even that’s inefficient compared to just flattening everything; that involves some trickier indexing, though, so I’d wait to optimize it.

If you really did want to define your own constant covariance matrix, you’d want to do it in the transformed data block so as not to incur a lot of extra autodiff overhead. It could be done like this:

matrix[2, 2] A = [[2, 0], [0, 2]];

You need to move the definition of mu_p to transformed data for efficiency, and it can just be:

for (i in 1:n_stim) mu_p[i] = -signs[i];

You never need to multiply by negative one, and Stan will let you vectorize all this if you make signs an array of vectors. But then why not just define signs the way you want it on the outside rather than negating on the inside?
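Putting those last suggestions together, here’s a rough sketch of what a transformed data block could look like, assuming signs is passed in as an array of vectors (vector[2] signs[n_resp]) rather than a 2D array of reals:

transformed data {
  matrix[2, 2] mu_S = [[2, 0], [0, 2]]; // constant prior covariance; no autodiff overhead here
  vector[2] mu_p[n_stim];               // prior means for mu

  for (i in 1:n_stim)
    mu_p[i] = -signs[i];                // indexing signs by stimulus, as in the original model
}

If you adopt the independent normal(mu_p[i], sqrt(2)) prior above, mu_S isn’t needed at all; it’s shown here only for the case where you keep multi_normal.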

I’m sure there’s more, but that should get you started.

Thank you! This is very helpful. I will work on cleaning this model up (and reading more of the current Stan manual), and then I’ll probably be back with more questions later (I’m guessing).

Any idea when the bivariate normal CDF function will be ready to go?

No, but we can bug @rtrangucci. Seriously, though, knowing that people have a use for a function is one of our biggest motivators for implementing it.

There’s an example of how to implement the bivariate normal CDF directly in a Stan program in the manual chapter on custom probability functions (chapter 23 as of now). I’ll repeat it here since I just found it:

    real binormal_cdf(real z1, real z2, real rho) {
      if (z1 != 0 || z2 != 0) {
        real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
        real a1 = (z2 / z1 - rho) / denom;
        real a2 = (z1 / z2 - rho) / denom;
        real product = z1 * z2;
        real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
        return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
      }
      return 0.25 + asin(rho) / (2 * pi());
    }

Inputs need to be standardized or the function needs to be adjusted to handle non-zero locations and non-unit scales.
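To connect this back to the original model: if binormal_cdf is defined in the functions block and L[i] is declared as cholesky_factor_corr[2] (so each latent dimension has unit variance and the correlation is just L[i, 2, 1]), the predicted response probabilities can be computed from quadrant probabilities directly instead of from simulated tallies. A rough sketch, keeping the names from the model above:

transformed parameters {
  vector[n_resp] pm[n_stim]; // predicted response probabilities

  for (i in 1:n_stim) {
    real rho = L[i, 2, 1]; // correlation implied by a 2x2 Cholesky factor of a correlation matrix
    for (j in 1:n_resp) {
      // P(signs[j,1]*x1 < 0 and signs[j,2]*x2 < 0) for x ~ N(mu[i], L[i] * L[i]')
      pm[i, j] = binormal_cdf(-signs[j, 1] * mu[i, 1],
                              -signs[j, 2] * mu[i, 2],
                              signs[j, 1] * signs[j, 2] * rho);
    }
    pm[i] = pm[i] / sum(pm[i]); // quadrant probabilities already sum to 1; this just guards against rounding error
  }
}

With pm computed this way, cm[i] ~ multinomial(pm[i]) works without simulating any latent pe values at all.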