I’m working on a series of spatial models with BART components. I currently specify the models in PyMC, but I’ve also used Stan a fair bit. The current state of my PyMC model is detailed here.
I looked at stan4bart and think it could serve my purpose. My question is: can it handle an ICAR prior? I reviewed some work by @mitzimorris and @imadmali from several years ago. My PyMC model is quite computationally and memory intensive. The ICAR specification in PyMC also doesn’t handle sparse matrices, which I need because my spatial adjacency matrix is very large (~235,000 units). I think brms can do ICAR, but it doesn’t integrate with the BART portion.
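For reference, here is a minimal sketch of the pairwise-difference form of the ICAR prior, which only needs the list of neighbor pairs and never builds the dense adjacency matrix. The function names and the toy three-node graph are hypothetical, the density is unnormalized, and the soft sum-to-zero term with scale 0.001 * N is an assumption borrowed from common Stan practice rather than something stan4bart or PyMC is confirmed to expose.

```python
import numpy as np

def icar_logp(phi, node1, node2):
    """Unnormalized ICAR log density.

    phi   : spatial effects, one per unit
    node1 : index of the first unit in each neighbor pair
    node2 : index of the second unit in each neighbor pair
    Each undirected edge should appear exactly once.
    """
    diff = phi[node1] - phi[node2]
    # -0.5 * sum over edges of (phi_i - phi_j)^2
    return -0.5 * np.dot(diff, diff)

def soft_sum_to_zero_logp(phi, scale_per_unit=0.001):
    """Soft centering penalty: sum(phi) ~ Normal(0, scale_per_unit * N), unnormalized."""
    sd = scale_per_unit * phi.size
    return -0.5 * (phi.sum() / sd) ** 2

# Toy chain graph 0 - 1 - 2 (hypothetical data for illustration only)
node1 = np.array([0, 1])
node2 = np.array([1, 2])
phi = np.array([1.0, 0.0, -1.0])

total_logp = icar_logp(phi, node1, node2) + soft_sum_to_zero_logp(phi)
```

Storage here grows with the number of edges rather than with the square of the number of units, which is the property that matters for a graph with roughly 235,000 nodes.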
According to the stan4bart paper, including variables in both the BART and non-BART components may help with convergence time (and possibly with the Pareto k issues detailed in my PyMC post). I was thinking a BYM2 model could work well. Perhaps @cmcd has some ideas? I know how it would be done in Stan, but I don’t think you can access raw Stan with stan4bart.
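To make the BYM2 idea concrete, here is a rough sketch of how the spatial (ICAR) and unstructured effects are usually combined. All names are hypothetical. The scaling factor is computed as the geometric mean of the diagonal of the pseudoinverse of the graph Laplacian, which assumes a single connected graph, and the dense pseudoinverse is only practical for small graphs, so this is for illustration and would not scale to a very large adjacency structure.

```python
import numpy as np

def bym2_scaling_factor(node1, node2, n):
    """Geometric mean of the marginal variances implied by the ICAR precision.

    Builds the dense Laplacian Q = D - W, so this is for small,
    fully connected graphs only.
    """
    Q = np.zeros((n, n))
    np.add.at(Q, (node1, node1), 1.0)   # degree contributions
    np.add.at(Q, (node2, node2), 1.0)
    np.add.at(Q, (node1, node2), -1.0)  # off-diagonal neighbor entries
    np.add.at(Q, (node2, node1), -1.0)
    Q_pinv = np.linalg.pinv(Q)
    return float(np.exp(np.mean(np.log(np.diag(Q_pinv)))))

def bym2_effect(phi, theta, sigma, rho, scaling_factor):
    """Combined random effect: sigma * (sqrt(rho / s) * phi + sqrt(1 - rho) * theta).

    phi   : ICAR (spatially structured) effects
    theta : unstructured effects, standard normal a priori
    sigma : overall standard deviation
    rho   : proportion of variance attributed to the spatial part, in [0, 1]
    """
    return sigma * (np.sqrt(rho / scaling_factor) * phi
                    + np.sqrt(1.0 - rho) * theta)

# Toy two-node graph with a single edge (hypothetical data)
node1 = np.array([0])
node2 = np.array([1])
s = bym2_scaling_factor(node1, node2, n=2)

phi = np.array([0.5, -0.5])
theta = np.array([0.2, -0.1])
effect = bym2_effect(phi, theta, sigma=2.0, rho=0.5, scaling_factor=s)
```

With rho = 0 the effect reduces to the unstructured part, and with rho = 1 it reduces to the scaled ICAR part, which is what makes sigma and rho easier to put priors on.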
The end goal is to get causal estimates for key continuous treatment variables using BART and adjust for spatial/non-spatial errors via random effects.
I don’t know what stan4bart does, but we can’t code BART in Stan because of the discrete parameters. PyMC or Pyro seem like better alternatives.
An alternative is to use Gaussian processes. Jennifer had a postdoc, Vince Dorie, who did a lot of work evaluating this. In the end, I think Jennifer stuck with BART. I didn’t follow the work closely. See, e.g., https://files.eric.ed.gov/fulltext/ED591944.pdf
@mitzimorris is off teaching spatial modeling at the GeoMed conference, so it might take her a couple days to respond.
Thanks, @Bob_Carpenter. stan4bart was written by Vince Dorie. It seems to integrate with Stan through rstanarm to include multilevel intercepts/slopes. From what I could find, brms handles CAR but rstanarm does not. I’ll look into Gaussian processes. I’d thought about that approach, but I wasn’t sure about the causal inference implications. I’ll read through the work you linked to and see if it clarifies things.
I was thinking a BYM2 model could work well. Perhaps, @cmcd has some ideas?
Sorry, I’m not so familiar with BART models. (Generally, whether the BYM model is a good choice depends mainly on the data type and sometimes on the number of observations per unit, for multilevel models with person-level observations.)