The Stan manual says that ragged arrays are not allowed, but I also found this github tutorial on ragged arrays spec (https://github.com/stan-dev/stan/wiki/Ragged-array-spec). The github tutorial seems to be more latest than the manual; does this mean that ragged array is now allowed in Stan?

# Is Ragged Array allowed in Stan?

**ermeel**#3

The Stan 2.18. user guide states in section 9:

Stan does not directly support either sparse or ragged data structures, though both can be accommodated with some programming effort.

Check under assets here:

**jjramsey**#4

Thatâ€™s not a tutorial. Thatâ€™s a proposal for a possible *future* feature of Stan.

**serenejiang**#5

I see. No wonder I ran into a lot of syntax errors when trying to implementing those codes.

**sakrejda**#6

OTOH there are very good ways to implement ragged arrays in Stan. For example:

```
int n_total_entries;
vector[n_total_entries] flat_array;
int n_l1_entries;
int starts[n_l1_entries];
int stops[n_l1_entries];
int lengths[n_l1_entries];
```

Then to get the `k`

'th vector:

```
vector[lengths[k]] v = flat_array[starts[k]:stops[k]];
```

**serenejiang**#7

@sakrejda Thank you! Do you have suggestions for a ragged array of matrices, where each matrix has different size? I am very new to Stan programming; any suggestion or reference would be appreciated!

**sakrejda**#8

What you want to do with them in Stan and what are the constraints on their sizes? Thereâ€™s no general answer but there are really good specific answers.

**serenejiang**#9

@sakrejda I want to input into Stan an array of matrices of size [num_subjects, num_basis, num_visits]. In other words, there is a corresponding matrix for each subject, and each matrix has number of rows = number of basis,

and number of columns = number of visits. Note that the number of basis is the same for all subjects, but the number of visits is different. I record the number of visits for each subject in a vector called V. I am wondering whether I could extract the value from the vector V when loading the array of matrices as well. After loading such data, then I would perform MCMC for each subject based on their matrices of basis information. Is this clear enough?

**sakrejda**#10

Yes thatâ€™s clear. Thatâ€™s only ragged in one dimension so in Stan I would stack subjects and visits:

```
int num_subjects;
int num_visits[num_subjects];
int subject_starts[n_subjects]; // row of x where each subject starts
int subject_stops[n_subjects]; // row of x where each subject ends
int num_basis;
matrix[sum(num_visits), num_basis] X;
```

The rest depends on how you use X. If itâ€™s a model matrix then you have a pretty

standard (G)LM setup with X as the model matrix.

**wds15**#11

My preference by now is to not have a starts and stops vector, but rather have a single one which has `n+1`

elements whenever you have `n`

items. In each entry `i`

you have the starting position of the `i`

th element and the `i+1`

th element is one past the end of the `i`

th one (which happens to be the start of the `i+1`

th one). This way you keep things tidy with a single index vector instead of twoâ€¦ which makes it less cluttered for me.

**rsmall**#12

I have a similar problem in which I need to access a distance matrix \mathbf{D}_j for each family to define a correlation structure via Gaussian Process Regression: \mathbf{K}_j=\exp\left[-\rho\mathbf{D}_j\right] (element-wise). Any ideas as to how to go about passing/building each \mathbf{D}_j in Stan using clever indexing? Iâ€™m sorry, but I donâ€™t follow the examples given above.

I can build \mathbf{K}_j â€śon demandâ€ť within the model or transformed parameters block, but I donâ€™t want to have to continuously rebuild each \mathbf{D}_j - as I have ~10,000 families this will likely be very slow.

**bgoodri**#13

Need more info here. Are the distance matrices fixed and so \rho is the only unknown? Do you mean that you have an array of J distance matrices, like `matrix[T, T] D[J];`

?

**rsmall**#14

Yes, my distance matrices are fixed - I am trying to learn the unknown tuning parameter \rho. If I understand your question correctly, then my T would essentially an array of ints giving the dimensionality of \mathbf{D}_j, e.g. T=c(1, 1, 2, 5, 7). To put it another way, I have (in R) a list D in which D[[j]] accesses \mathbf{D}_j with size n_j \times n_j.

**bgoodri**#15

OK, I understand now. That is going to be annoying currently. I think the least annoying option would be to compute the maximum value of `T`

and declare the thing in data like `matrix[max(T), max(T)] D[J];`

and then in R put all the distances for the j-th family in the top left of the j-th matrix, leaving any unused cells as \infty.

Then in your model block, do something like

```
model {
for (j in 1:J) {
matrix[T[j], T[j]] K = exp(-rho * block(D[j], 1, 1, T[j], T[j]));
// likelihood contribution of the j-th family
}
}
```

**rsmall**#16

Makes sense - thanks! Since the `block`

command just selects the appropriate entries of the â€śpaddedâ€ť `D[j]`

I could just as easily define the unused cells as zero, say?

**bgoodri**#17

You could, but if you make a mistake, it will show up more destructively if you make the unused cells infinity.

Actually negative infinity would be better due to the `exp(-rho * ...)`

thing.

**rsmall**#18

Great idea, thanks. I assume that `block`

also works on arrays of integers as well?

**rsmall**#20

Thanks for your help on this, Ben. Just wanted to let you know that the model compiles and runs correctly.