Non-centered parametrization with irregularly missing data

This is more or less a conceptual question — I will write it out using math instead of Stan code.

I have a multilevel model where group-level y_i \in \mathbb{R}^n are modeled as a multivariate normal distribution. The twist is that for each y_i, some elements are parameters and some are data, depending on i. Estimating a centered parametrization is straightforward, but I run into convergence issues.

I am trying to convert to a non-centered parametrization. Let L be a Cholesky correlation factor, and \sigma a vector of standard deviations. Then

y_i = L \cdot \mathrm{Diag}(\sigma) \cdot \varepsilon_i

would allow me to go back and forth between standard normal IID \varepsilon's and y. But given that some elements of each y_i are data, I am constrained in the elements of \varepsilon's.

This is how I conceptualize this. Let

A = L \cdot \mathrm{Diag}(\sigma)

for easier notation. Assume that y_i and \varepsilon_i can be partitioned as parameters and data, so

\begin{bmatrix} y_{i,p} \\ y_{i,d} \end{bmatrix} = \begin{bmatrix} A_{pp} & A_{pd} \\ A_{dp} & A_{dd} \end{bmatrix} \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,d} \end{bmatrix}

Then I parametrize in terms of \varepsilon_p for each i, and have

\varepsilon_{i,d} = A^{-1}_{dd}(y_{i,d} - A_{dp} \varepsilon_{i,p})

I calculate the log likelihood based on [\varepsilon_{i,p}, \varepsilon_{i,d}] as being standard normals, and my log Jacobian determinant correction is

\log |\det(A_{dp})| - \log|\det(A_{dd})|

which thankfully is the same for all i.

All I have to deal with is a heinous amount of bookeeping. I am asking for feedback on whether the above would be the recommended approach, or I missed an obvious clean solution.

I always have an issue with the group/level/factor terminology, so please excuse me if I botch the rest of this due to a misunderstanding. Is this just what we’d call a missing data problem or is y[n] a parameter in something else? If it’s just missing data, I don’t think you need to worry about centered/non-centered and I’d recommend just promoting the observations and proceeding as described in the User’s Guide.

There’s an issue of how to code the missingness in Stan, but sticking to math, the idea is that if you have a vector y[1:K] and some of the values are missing, the missing values get replaced with parameters and the non-missing values get promoted to parameters.

If y_missing[n] are parameters and y_observed[n] is data then you can just add them together to get the y[n] vector and proceed as usual (assuming they have zeroes in places that aren’t observed). Or you can be more careful and actually interleave them, which will be more efficient, but can be painful to code.

What I think you’re suggesting is a way to project the covariance matrix to the dimensions where we have observations, sample those marginally, then sample the parameters conditional on the data values? That should work, but I’m not sure it’s going to be worth it, especially if it’s inducing determinant calculations on an observation-by-observation basis.