Rvars - distinguishing between a scalar and array of size 1

It appears that the rvar interface in posterior cannot distinguish between a scalar and an array of size 1, i.e.

scalar <- draws_matrix("y" = 1:10)
array1 <- draws_matrix("y[1]" = 1:10)
all.equal(scalar, array1) # Correctly unequal
# [1] "Attributes: < Component “dimnames”: Component “variable”: 1 string mismatch >"
all.equal(as_draws_rvars(scalar), as_draws_rvars(array1)) # Incorrectly equal
# [1] TRUE
all.equal(as_draws_matrix(as_draws_rvars(array1)), array1) # Incorrectly unequal
# [1] "Attributes: < Component “dimnames”: Component “variable”: 1 string mismatch >"

Is this the desired/expected behaviour? Or is this a bug?

Tagging @mjskay and @jonah, thanks for any ideas.


Hmmm good question. This comes down to the semantics of rvars, which are intended to mimic base R vectors, which also don’t have a distinction between a scalar and a vector of length 1 (so far as I am aware).
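For illustration, a quick base R check of that point (no posterior involved):

```r
# In base R there is no separate scalar type: a "scalar" is just a
# vector of length 1, so these two are indistinguishable.
x <- 1
y <- c(1)
identical(x, y)  # TRUE
```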

So on the one hand, we probably want as_draws_X(as_draws_Y(x)) to equal x if x is of type X. On the other hand, I'm not sure whether losing the analogy between rvars and base vectors/arrays is worth that change. I would be curious what pros/cons people see. Is the inconsistency @martinmodrak found a weird corner case that doesn't matter much in practice, or is it a bigger problem?

Whatever the decision, this should obviously be better documented.


The context: I am working on the SBC package, where we thought that draws_rvars would be a good format to store both prior and posterior draws. However, I need to convert each draw from the prior into data that can be passed to Stan as a list. Since Stan input data does distinguish between a scalar and an array of size 1, I need to keep that distinction. So draws_rvars is almost the right format for creating a Stan dataset, except for this little quirk. I can work around it by switching to draws_matrix as the internal format and writing my own conversion function (which could probably use as_draws_rvars for the array formatting and then use the draws_matrix value to decide between scalars and arrays of size 1).
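For concreteness, here is a hypothetical sketch of that workaround (not SBC's or posterior's actual API; the function name is made up): take one draw of a draws_matrix, which is just a named numeric vector, and rebuild a Stan-style data list, keeping "mu" a scalar while making "y[1]" an array. This simplified version handles only scalars and 1-D arrays; multi-index names like "z[1,2]" would need reshaping.

```r
draw_to_stan_list <- function(draw) {
  # Strip bracketed indices: "y[1]" -> "y"
  base <- sub("\\[.*$", "", names(draw))
  # Group values by base variable name
  out <- split(unname(draw), base)
  # A variable is a scalar iff its name carries no "[" anywhere
  scalars <- unique(base[!grepl("[", names(draw), fixed = TRUE)])
  for (v in names(out)) {
    if (!(v %in% scalars)) out[[v]] <- as.array(out[[v]])
  }
  out
}

d <- c("mu" = 0.3, "y[1]" = 1.2, "y[2]" = -0.5)
str(draw_to_stan_list(d))
```

The key point is simply that the bracketed names in draws_matrix preserve exactly the scalar/array distinction that the rvar representation loses.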

So it kind of is a weird corner case, but also not completely irrelevant?

Maybe there could be an optional attribute on an rvar to mark it as a scalar? (Not sure how many problems this would create for all the rvar operations.)


Hmm. One behavior of base R vectors that I did not mimic in rvars, because it creates many corner cases in code, is that vectors do not require a dim attribute (i.e. dim(x) can be NULL), whereas rvars always return a numeric vector for dim (sometimes it's just the length of the vector). I did this because otherwise a bunch of code has to check for NULL dims and use length instead, which at the time did not seem useful.
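A quick base R illustration of that dim() behaviour (plain base R, no posterior involved):

```r
v <- 1:3          # plain vector: carries no dim attribute
a <- array(1:3)   # 1-D array holding the same values
dim(v)            # NULL
dim(a)            # 3
length(v) == length(a)  # TRUE: same length either way
```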

That said, one way to distinguish between a "scalar" in base R and an array of size 1 is whether dim(x) is NULL or dim(x) == 1. So one option would be to have the conversion functions recognize this convention. It would be slightly annoying to implement, since we'd have to check all uses of dim() on rvars in code that assumes dim() is guaranteed non-NULL, as well as other functions that assume the internal array always has at least two dimensions (currently this is also guaranteed). This would also mean the value of draws_of() returned to a user would no longer be guaranteed to have at least two dimensions. I'm also not sure whether this would make some other operations on rvars annoying to use, if they return values that the user would expect to be treated as scalars when converted to other formats but which are actually 1-element arrays.

The other option would be a more focused solution that just uses an attribute, as you suggest. It might be an attribute for array-ness rather than scalar-ness; then 1-element arrays that get converted to rvars from another format could retain their array-ness through this attribute. Would be curious what folks think. (I’m not sure either solution sounds great to me at the moment, at least from the perspective of implementing it :) )
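A minimal sketch of what that attribute option might look like, using a plain R attribute (none of this is posterior's actual API; the attribute name "from_array" and both helper functions are invented for this sketch):

```r
# Hypothetical: record array-ness with an attribute so that an array of
# size 1 could survive a round trip through the rvar format.
mark_array <- function(x) { attr(x, "from_array") <- TRUE; x }
is_from_array <- function(x) isTRUE(attr(x, "from_array"))

a <- mark_array(1)   # value that came from an array of size 1
s <- 1               # plain scalar
is_from_array(a)     # TRUE
is_from_array(s)     # FALSE
```

The real implementation would of course need to decide how such an attribute propagates through arithmetic, subsetting, and the other rvar operations, which is where most of the complexity mentioned above would live.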
