Can Stan handle NaNs? Built-in Size Constraints?

bkaplowitz · October 3, 2018, 9:16pm

Hi, two quick questions. First, can Stan handle NaNs? If not, is there a preferred way to deal with them? I am having trouble initializing the sampler when I use 0s in place of NaN, and I am not sure of the correct way to handle them without accidentally distorting the log likelihood.

The relevant portion of my model looks like the following:

model {
    lambda ~ gamma(0.001, 0.001);
    for (t in 1:T_max) {
        P[t, 1:M_max] ~ multi_normal((S[t] * exp(r[t, 1:J_max])')', T);
    }
}

Here lambda is used to construct T, an M_max by M_max diagonal covariance matrix. Each S[t] is an M_max by J_max matrix, and wherever P[t,1:M_max] has a NaN entry, S[t] has NaN entries in the corresponding positions (r is always real-valued).
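Because T is described as diagonal, the multivariate normal density factorizes into independent univariate normals, one per element of P[t]. The following is a minimal numerical check of that identity in plain Python, not the poster's model; the function names and the example values are made up for illustration.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of a univariate normal with standard deviation sigma."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def diag_mvn_lpdf(y, mu, variances):
    """Log density of a multivariate normal with covariance diag(variances)."""
    k = len(y)
    log_det = sum(math.log(v) for v in variances)
    quad = sum((yi - mi) ** 2 / v for yi, mi, v in zip(y, mu, variances))
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)

# Hypothetical values, chosen only to exercise the two functions.
y = [0.3, -1.2, 2.5]
mu = [0.0, -1.0, 2.0]
variances = [0.5, 2.0, 1.5]

joint = diag_mvn_lpdf(y, mu, variances)
elementwise = sum(
    normal_lpdf(yi, mi, math.sqrt(v)) for yi, mi, v in zip(y, mu, variances)
)
print(joint, elementwise)  # the two values agree up to rounding
```

Under this assumption each element contributes its own term to the log likelihood, so a missing element can in principle be left out of the sum without touching the others.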

My second question concerns size: the matrix S ends up containing 295350000 entries. When I run this as a Stan model, it allocates 60 GB of RAM. If I try to use multiple cores, I get a pickling error about an ‘i’ type exceeding some threshold, which I think is because this array is too large. If I can replace the NaN entries with 0, I can use a sparse matrix, which will cut memory usage since many entries are NaN. Barring that solution, is there any way to get around this size constraint?
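For a sense of scale, here is a rough back-of-the-envelope calculation of the raw storage involved. It counts a single copy of the values as 64-bit floats and compares it with a coordinate (row, column, value) layout that keeps only the observed entries. The 4-byte index width and the 10% observed count are assumptions for illustration; the thread does not give the actual fraction of NaN entries, and this does not model whatever else the interface or sampler allocates.

```python
N_ENTRIES = 295_350_000   # size of S quoted in the post
BYTES_PER_DOUBLE = 8      # one 64-bit float per entry
BYTES_PER_INDEX = 4       # assumption: 32-bit row and column indices

def dense_bytes(n_entries):
    """Raw bytes for one dense copy of the matrix."""
    return n_entries * BYTES_PER_DOUBLE

def coo_bytes(n_observed):
    """Raw bytes for (row, col, value) triples of the observed entries only."""
    return n_observed * (BYTES_PER_DOUBLE + 2 * BYTES_PER_INDEX)

dense = dense_bytes(N_ENTRIES)
print(dense / 1e9)  # 2.3628 (GB for one dense copy)

# Hypothetical case: only one entry in ten is observed.
n_observed = N_ENTRIES // 10
print(coo_bytes(n_observed) / 1e9)  # 0.47256 (GB)

# Each stored triple costs twice a dense entry, so the coordinate layout
# only saves memory when fewer than half of the entries are observed.
break_even = dense // (BYTES_PER_DOUBLE + 2 * BYTES_PER_INDEX)
print(break_even / N_ENTRIES)  # 0.5
```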

bgoodri · October 3, 2018, 9:24pm

If you mean, can Stan handle the situation where some elements of a vector or matrix are known and others are missing, then the answer is “yes”, although it is pretty tedious. See the chapter on missing data in the Stan User Manual. You should probably try to get this to work on a smaller (subset of a big) problem before trying to tackle the memory issues.
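One way to avoid passing NaN placeholders at all is to hand the model only the observed values together with their positions. The sketch below does that preprocessing in plain Python. The names `pack_observed`, `n_obs`, `obs_idx`, and `p_obs` are hypothetical and do not come from the thread or the manual, and the output is flattened so that rows with different numbers of observed entries still fit in rectangular arrays.

```python
import math

def pack_observed(P):
    """Flatten the non-NaN entries of a list-of-rows matrix.

    Returns (n_obs, obs_idx, p_obs):
      n_obs[t]  number of observed entries in row t
      obs_idx   1-based column indices of observed entries, rows concatenated
      p_obs     the observed values, in the same order as obs_idx
    """
    n_obs, obs_idx, p_obs = [], [], []
    for row in P:
        cols = [j for j, v in enumerate(row) if not math.isnan(v)]
        n_obs.append(len(cols))
        obs_idx.extend(j + 1 for j in cols)  # Stan indexes from 1
        p_obs.extend(row[j] for j in cols)
    return n_obs, obs_idx, p_obs

nan = float("nan")
# Tiny made-up example: 2 time points, 3 columns.
P = [[1.5, nan, 0.2],
     [nan, nan, 3.0]]

n_obs, obs_idx, p_obs = pack_observed(P)
print(n_obs)    # [2, 1]
print(obs_idx)  # [1, 3, 3]
print(p_obs)    # [1.5, 0.2, 3.0]
```

Inside the model, a running offset over `n_obs` would recover each row's slice of `obs_idx` and `p_obs`. Trying this on a small subset first, as suggested above, makes it easy to compare against the full-matrix version.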

bkaplowitz · October 3, 2018, 9:29pm

Okay, thanks. I guess what I’m really wondering is whether I would be messing up the sampling procedure or any of the post-estimation checks by just dropping the missing observations ahead of time, rather than using the procedure for missing or partially observed data.

bgoodri · October 4, 2018, 12:43am

Dropping observations is not going to mess up the sampling, but it does mess up the posterior distribution unless they were missing completely at random.
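A small simulation can illustrate the distinction. It uses a plain sample mean as a stand-in for a posterior summary, so it is only an analogy for the point above; the drop probabilities and sample size are arbitrary choices.

```python
import random

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(20000)]  # true mean is 0

def mean(xs):
    return sum(xs) / len(xs)

# Missing completely at random: every value is dropped with probability 0.5.
kept_mcar = [v for v in y if random.random() < 0.5]

# Not at random: positive values are dropped with probability 0.8,
# non-positive values with probability 0.2.
kept_mnar = [v for v in y if random.random() < (0.2 if v > 0 else 0.8)]

mcar_mean = mean(kept_mcar)
mnar_mean = mean(kept_mnar)
print(f"MCAR mean: {mcar_mean:.3f}, non-random mean: {mnar_mean:.3f}")
```

The mean of the randomly thinned sample stays close to 0, while the sample thinned according to the values themselves is pulled clearly negative, even though both run without any error.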
