Hi, I’m looking for a multivariate normal CDF function for censored multivariate normal observations. That is, with observations $(x_1, \ldots, x_k)$ I have

$$(X_1,\ldots, X_k) \sim \text{multi\_normal}(\mu, \Sigma)$$

and I want to calculate

$$P(X_1 < x_1 \text{ and } \ldots \text{ and } X_k < x_k)$$

But I don’t see a `multi_normal_lcdf` in the docs. Is there a reason for that? I think I could write `multi_normal_lcdf` in terms of `normal_lcdf` (cf. this Stack Overflow answer for the bivariate case). Is that the best way? Thanks.

Hey @potash!

Unfortunately, `multi_normal_lcdf` doesn’t exist in Stan. I’m working (slowly) on getting a bivariate normal CDF and LCDF into the math library. The algorithm is more complex than the Stack Overflow answer you linked to; we’re implementing something like the algorithm presented in section 2.2 here
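For intuition in the bivariate case (this is not the section 2.2 algorithm, just a simple sketch using my own function names): the bivariate normal CDF reduces to a one-dimensional integral by conditioning, $P(Z_1 < x_1, Z_2 < x_2) = \int_{-\infty}^{x_1} \phi(z)\,\Phi\!\big((x_2 - \rho z)/\sqrt{1-\rho^2}\big)\,dz$, which you can evaluate with basic quadrature:

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binormal_cdf(x1, x2, rho, n=2000, lower=-8.0):
    """P(Z1 < x1, Z2 < x2) for a standard bivariate normal with
    correlation rho, via the conditioning identity
      P = integral_{-inf}^{x1} phi(z) * Phi((x2 - rho*z)/sqrt(1 - rho^2)) dz,
    approximated with the trapezoidal rule on [lower, x1]."""
    s = math.sqrt(1.0 - rho * rho)
    h = (x1 - lower) / n
    total = 0.0
    for i in range(n + 1):
        z = lower + i * h
        f = phi(z) * Phi((x2 - rho * z) / s)
        total += (0.5 if i in (0, n) else 1.0) * f
    return total * h
```

A useful sanity check is the closed form $P(Z_1 < 0, Z_2 < 0) = \tfrac14 + \arcsin(\rho)/(2\pi)$. The real difficulty (and what the math-library work addresses) is computing this accurately *and* getting stable gradients for autodiff, which naive quadrature doesn’t give you.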

A higher-dimensional `multi_normal_lcdf` (essentially d > 3) needs to be computed via Monte Carlo integration. If you need something usable now, I’d follow @bgoodri’s advice here:
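The brute-force version of that Monte Carlo integration is easy to sketch (the function name and setup here are my own illustration, not anything in Stan): draw from the untruncated distribution via the Cholesky factor and count how often the draw lands below all the bounds.

```python
import random

def mvn_cdf_mc(mu, L, x, n=200000, seed=1):
    """Crude Monte Carlo estimate of P(X_1 < x_1, ..., X_k < x_k)
    for X ~ N(mu, Sigma) with Sigma = L L' (L lower-triangular Cholesky).
    Draw X = mu + L z with z ~ N(0, I) and count how often all
    components fall below their bounds."""
    rng = random.Random(seed)
    k = len(mu)
    hits = 0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        ok = True
        for i in range(k):
            xi = mu[i] + sum(L[i][j] * z[j] for j in range(i + 1))
            if xi >= x[i]:
                ok = False
                break
        hits += ok
    return hits / n
```

This is fine for getting a number but converges slowly, which is why importance-sampling schemes like GHK (below) are preferred for the corner probabilities that show up in practice.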

Essentially, you’ll declare a parameter `vector<lower=0,upper=1>[k] u[N]` that you’ll connect to your observations `vector[k] x[N]` with some clever inverse transform sampling laid out in that Google Groups post.
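For later readers: here is a hedged Python sketch of that sequential inverse-transform construction (this is what a Stan `transformed parameters` block would do with the `u` above, and it is the core of the GHK algorithm; the function names and the bisection-based `Phi_inv` are my own illustration, not the code from the post).

```python
import math

def Phi(z):
    """Standard normal cdf via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p, lo=-12.0, hi=12.0):
    """Inverse standard normal cdf by bisection (adequate for a sketch)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ghk_transform(u, mu, L, x):
    """Map u in (0,1)^k to a draw y from N(mu, L L') truncated to
    {y_i < x_i for all i}. Returns (y, log_weight), where log_weight
    accumulates the log truncation probabilities along the way; averaging
    exp(log_weight) over uniform u is the GHK estimator of P(X < x)."""
    k = len(u)
    z = [0.0] * k
    y = [0.0] * k
    log_w = 0.0
    for i in range(k):
        cond_mean = mu[i] + sum(L[i][j] * z[j] for j in range(i))
        ub = (x[i] - cond_mean) / L[i][i]   # z_i must lie below this bound
        p = Phi(ub)
        z[i] = Phi_inv(u[i] * p)            # inverse transform onto (0, p)
        y[i] = cond_mean + L[i][i] * z[i]
        log_w += math.log(p)
    return y, log_w
```

Because each `z[i]` is forced below its conditional bound, every draw `y` satisfies the truncation by construction, and the transform is smooth in `u`, which is what makes it usable inside HMC.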


That technique is more commonly referred to as the GHK algorithm.

Different G.


Got it. Thanks @rtrangucci and @bgoodri!

Hey @rtrangucci,

I need a 16-dimensional normal CDF for my model. I’m very interested in the advice you linked, but that Google Groups thread is now private so I can’t see it. Could you please paste it here? Thanks.

The linked conversation was me asking for a truncated MVN, way back when. Ben provided the attached solutions.

tMVN.pdf (2.4 MB)
tMVN.stan (1.3 KB)


Thanks, @aornugent.

I hate to necro an old thread, but I’ve been playing with Ben’s solution. While it works great for sampling from a truncated normal, Ben suggests in the PDF that you can also use it to learn the parameters of the truncated normal, and that doesn’t work: you still need the normalizing constant of the truncation, which depends on those parameters.

You can prove this mathematically, but the easiest way to see it is to take the example and add priors to `mu` and `Sigma`, so that `make_stuff` specifies the conditional density of `y | mu, Sigma`. You’ll have to make the priors tight to prevent divergent transitions (I used K = 2 with mu ~ N(0, 0.1), diag(Sigma) ~ N(1, 0.1), and assumed Sigma was diagonal). The marginal of `mu` should then be just its prior, but Stan will subtly sample from a slightly different distribution. In my examples you can only detect this with a large sample size - it looks fine to the naked eye, but a KS test will catch it.
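To spell out the “prove this mathematically” step (the region $R$ and constant $Z$ are my notation):

```latex
% Truncated MVN density, with truncation region R = {y : y_i < x_i for all i}:
p(y \mid \mu, \Sigma)
  = \frac{\mathcal{N}(y;\mu,\Sigma)\,\mathbf{1}\{y \in R\}}{Z(\mu,\Sigma)},
\qquad
Z(\mu,\Sigma) = \int_R \mathcal{N}(y;\mu,\Sigma)\,dy.

% If the target density omits \log Z(\mu,\Sigma), the sampler explores
% p(\mu)\,\mathcal{N}(y;\mu,\Sigma)\,\mathbf{1}\{y \in R\} instead.
% Integrating out y then gives a marginal for \mu proportional to
\int_R p(\mu)\,\mathcal{N}(y;\mu,\Sigma)\,dy = p(\mu)\,Z(\mu,\Sigma),
% not the prior p(\mu): the marginal is tilted by the factor Z(\mu,\Sigma),
% which varies with \mu. With tight priors the tilt is small, which is why
% it only shows up in a KS test at large sample sizes.
```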

Hi @Matthew_Simpson, do you have a solution to correct for what you describe - some math or Stan code? Indeed, when I sample from `mu ~ std_normal()`, the marginal of `mu` should be its prior, but it isn’t. I don’t observe any problems in the HMC diagnostics.


There is no known solution, @samlevy. You need the normalizing constant of the truncated multivariate normal as a function of mu and Sigma, and this is simply not analytically tractable in 3+ dimensions - or at least no one knows how to use NUTS/HMC in this case, AFAIK. If you manage to figure it out, you could have a nice publication: this would have applications to multinomial probit discrete choice models, for example.

What you can do is use NUTS/HMC for the latents conditional on the parameters (y | mu, Sigma) and for the parameters conditional on the latents (mu, Sigma | y), and construct a NUTS-within-Gibbs sampler that way. But this is entering original-research territory: I’m not sure how well the tuning algorithms that Stan and other NUTS implementations use will behave in the context of a Gibbs sampler, and I’m not sure you can use Stan in this way.


Maybe @spinkney has something to add on this last point by @Matthew_Simpson?

I see. I was trying to use @bgoodri’s code to infer the mean and/or covariance of a truncated multivariate normal distribution. My ultimate goal was to use NUTS to sample from a constrained Gaussian process: I write the covariance function as a joint kernel over the GP and its derivative GP, and imposing a number of constraints on the derivative part (to obtain a shape-constrained GP) amounts to sampling from a large truncated multivariate normal. The mean mu and the hyperparameters of the covariance function are unknown and should be inferred. But as far as I understand what you said, that’s not feasible unless I treat mu and the covariance as data rather than parameters.