I’m using {posterior} 1.6.0 and I’m seeing the cache for a column of rvars in a data.frame balloon to something like 400 GB, while the actual data is only a few MB. Is there any way to turn off caching? I tried setting the cache attribute to an empty environment, but that doesn’t fix it. BTW, I’m running posterior inside a function executed by {targets}. I’m using spread_rvars to get my data and then inner_join on that data.frame, if that explains things.
I check sizes using lobstr::obj_size().
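Roughly what the pipeline looks like (fit, theta, i, and meta are placeholder names, not my real ones):

library(dplyr)
library(tidybayes)

draws_df <- spread_rvars(fit, theta[i])           # one row per i, `theta` is the rvar column
result   <- inner_join(draws_df, meta, by = "i")  # join onto a small metadata table

lobstr::obj_size(result$theta)                    # this is the size that balloons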
I took a closer look at the column of rvars and attached its cache environment.
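This is roughly how I got at it (theta is the placeholder column name again; I’m assuming the cache attribute is the environment posterior 1.6.0 stores the vctrs proxy in, which is what I found by poking around):

cache <- attr(result$theta, "cache")  # the rvar column's cache environment
attach(cache)                         # so `vec_proxy` is visible directly in the console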
> vec_proxy[[1]] |> str()
List of 3
$ index : int 1
$ nchains: int 4
$ draws : num [1:8000, 1:252] -0.0027 0.01855 -0.00226 -0.01987 0.04545 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8000] "1" "2" "3" "4" ...
.. ..$ : NULL
> vec_proxy[[2]] |> str()
List of 3
$ index : int 2
$ nchains: int 4
$ draws : num [1:8000, 1:252] -0.0027 0.01855 -0.00226 -0.01987 0.04545 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8000] "1" "2" "3" "4" ...
.. ..$ : NULL
> vec_proxy[[3]] |> str()
List of 3
$ index : int 3
$ nchains: int 4
$ draws : num [1:8000, 1:252] -0.0027 0.01855 -0.00226 -0.01987 0.04545 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8000] "1" "2" "3" "4" ...
.. ..$ : NULL
This data frame has 252 rows, each holding an rvar with 8000 draws. Notice that vec_proxy repeats the entire data 252 times: each element carries the full 8000 × 252 draws matrix (roughly 16 MB at 8 bytes per double), so the cache for this one column alone comes to roughly 4 GB, even though the underlying draws are only ~16 MB.
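To put rough numbers on it (same placeholder names as above; posterior::draws_of() pulls out the underlying draws array):

lobstr::obj_size(posterior::draws_of(result$theta))  # the actual draws: one 8000 x 252 matrix
lobstr::obj_size(attr(result$theta, "cache"))        # the cache, where the repeated copies live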