Dealing with ragged and partial data

Hey all,

I’m trying to build a model whose data are both ragged and partially observed, and I’m having trouble vectorizing it. My data is a matrix[K, J] in which each row has some columns filled in and others not. The tricky part (which doesn’t seem to be described in the manual) is that each row can have a different number of columns filled in: the first row might have data for columns 2 and 7, and the second row for columns 2, 3, 5, and 8. I would like to select only the columns that actually have data.

I think there are a few programming tricks from other languages that would make this possible, but I’m not sure whether any of them work in Stan:

  1. Construct a boolean index array (à la R or NumPy) where FALSE means an index is excluded and TRUE means it is included in the resulting vector.
  2. Dynamically resizable arrays.
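Stan has no logical (boolean) indexing, but it does support multiple indexing with integer arrays, so a common workaround for trick 1 is to turn the mask into index arrays before fitting and pass those in as data. A minimal pure-Python sketch, assuming missing cells are encoded as None (the function name observed_columns is hypothetical, and the values are made up for illustration):

```python
def observed_columns(data):
    """For each row, return the 1-based column indices that hold a value.

    Missing entries are assumed (hypothetically) to be encoded as None.
    Indices are 1-based to match Stan's indexing convention.
    """
    return [[j + 1 for j, x in enumerate(row) if x is not None] for row in data]


# The example from the question (J = 8): row 1 observes columns 2 and 7,
# row 2 observes columns 2, 3, 5, and 8. Values are illustrative only.
data = [
    [None, 0.4, None, None, None, None, 1.3, None],
    [None, 2.1, 0.7, None, 3.3, None, None, 0.9],
]
print(observed_columns(data))  # [[2, 7], [2, 3, 5, 8]]
```

The result is itself ragged, so it still has to be flattened before it can be declared as Stan data; that is what the replies in this thread get at.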

Does anyone know whether either of these is possible in Stan, or whether there is another way to handle this situation?


Since this sounds like a sparse matrix, I’d just use three parallel lists for something like this, with one entry per observed cell:

values, k_indices, j_indices
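A sketch of building those three lists from the partially filled matrix, again assuming None marks a missing cell (to_triplets is a hypothetical helper name). This is the coordinate or “long” layout: each observed value sits next to its 1-based row and column index.

```python
def to_triplets(data):
    """Flatten a partially observed K x J matrix into three parallel lists.

    Returns (values, k_indices, j_indices) with 1-based row and column
    indices, keeping only observed (non-None) entries in row-major order.
    """
    values, k_indices, j_indices = [], [], []
    for k, row in enumerate(data, start=1):
        for j, x in enumerate(row, start=1):
            if x is not None:
                values.append(x)
                k_indices.append(k)
                j_indices.append(j)
    return values, k_indices, j_indices


data = [
    [None, 0.4, None, None, None, None, 1.3, None],
    [None, 2.1, 0.7, None, 3.3, None, None, 0.9],
]
values, k_idx, j_idx = to_triplets(data)
print(values)  # [0.4, 1.3, 2.1, 0.7, 3.3, 0.9]
print(k_idx)   # [1, 1, 2, 2, 2, 2]
print(j_idx)   # [2, 7, 2, 3, 5, 8]
```

In Stan the three arrays would be declared as data (one vector of length N plus two integer arrays of length N), and multiple indexing such as mu[k_indices], for a hypothetical row-level parameter vector mu, keeps the likelihood statement vectorized without any dynamic resizing.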

Something like that. So I guess the csr_extract_* functions could be used for this?
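For reference, Stan’s csr_* functions use compressed sparse row (CSR) form: w holds the values, v the column index of each value, and u the position in w where each row starts (length K + 1), all 1-based. One caveat worth checking: csr_extract_w and friends extract the non-zero entries of a dense matrix, so if missing cells were stored as 0, genuine zeros would be dropped as well. Building w, v, and u outside Stan from the missingness pattern avoids that. A sketch, assuming None marks a missing cell (to_csr is a hypothetical name):

```python
def to_csr(data):
    """Compressed sparse row (CSR) form of a partially observed matrix.

    Uses the 1-based layout of Stan's csr_* functions:
      w: observed values in row-major order
      v: 1-based column index of each value in w
      u: 1-based position in w where each row starts; length K + 1,
         with the last entry equal to len(w) + 1
    Missing entries are assumed (hypothetically) to be encoded as None.
    """
    w, v, u = [], [], [1]
    for row in data:
        for j, x in enumerate(row, start=1):
            if x is not None:
                w.append(x)
                v.append(j)
        u.append(len(w) + 1)  # start of the next row
    return w, v, u


data = [
    [None, 0.4, None, None, None, None, 1.3, None],
    [None, 2.1, 0.7, None, 3.3, None, None, 0.9],
]
w, v, u = to_csr(data)
print(w)  # [0.4, 1.3, 2.1, 0.7, 3.3, 0.9]
print(v)  # [2, 7, 2, 3, 5, 8]
print(u)  # [1, 3, 7]
```

Row k then occupies positions u[k] through u[k + 1] - 1 of w, so the number of observed cells in row k is u[k + 1] - u[k].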

What are you trying to vectorize?

You can just use the “melted” (long) form for most operations. See Chapter 16 of the Stan Manual, on sparse and ragged data structures; its very first example is a matrix with a different number of elements in each row.
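That ragged-array example stores all observed values in one flat vector y together with an integer array s of per-row sizes, then walks through y with a running position and segment(). The sketch below mirrors that pattern in Python so the bookkeeping is easy to check; segment here is a stand-in for Stan’s built-in, and row_sizes and walk_rows are hypothetical helpers. It assumes y is already sorted by row.

```python
def segment(y, pos, n):
    """Python stand-in for Stan's segment(y, pos, n): n elements from 1-based pos."""
    return y[pos - 1 : pos - 1 + n]


def row_sizes(k_indices, K):
    """Number of observed entries in each of the K rows (row indices are 1-based)."""
    s = [0] * K
    for k in k_indices:
        s[k - 1] += 1
    return s


def walk_rows(y, s):
    """Return each row's values, advancing a running position as in the manual's loop."""
    rows = []
    pos = 1
    for n in s:
        rows.append(segment(y, pos, n))
        pos += n
    return rows


y = [0.4, 1.3, 2.1, 0.7, 3.3, 0.9]  # observed values, sorted by row
k_indices = [1, 1, 2, 2, 2, 2]      # row of each value
s = row_sizes(k_indices, K=2)
print(s)                # [2, 4]
print(walk_rows(y, s))  # [[0.4, 1.3], [2.1, 0.7, 3.3, 0.9]]
```

In the Stan version each segment(y, pos, s[k]) can receive its own vectorized sampling statement inside the loop; when the model only needs per-cell row and column effects, the flat index arrays alone are usually enough and the loop can be dropped.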