Indicator variables very slow

Greetings.

I have binary response data y, covariates x1, x2, x3, and a stratum variable. I want to fit quadratic models, but not necessarily the same one for all strata. First, I ran the full model:

brms_out <- brm(bf(y ~ 0 + stratum + stratum:(x1 + x2 + x3) +
                     stratum:(I(x1^2) + I(x2^2) + I(x3^2))),
                family = bernoulli(),
                data = dat, inits = "0", chains = 4, iter = 2000, warmup = 1000, thin = 1,
                cores = 4, control = list(max_treedepth = 12), sample_prior = "yes")

This takes 12 hours to run. I used Bayes factors to select which quadratic effects to keep, and I reran the model using indicator variables. Here is the code:

dat$stratum1 <- factor(1*(dat$stratum=="s1"))
dat$stratum2 <- factor(1*(dat$stratum=="s2"))
dat$stratum3 <- factor(1*(dat$stratum=="s3"))

brms_out <- brm(bf(y ~ 0 + stratum1 + stratum2 + stratum3 +
                     stratum1:(x1 + x2 + x3) + stratum2:(x1 + x2 + x3) + stratum3:(x1 + x2 + x3) +
                     stratum1:I(x2^2) + stratum2:I(x3^2) +
                     stratum3:(I(x1^2) + I(x2^2) + I(x3^2))),
                family = bernoulli(),
                data = dat, inits = "0", chains = 4, iter = 2000, warmup = 1000, thin = 1,
                cores = 4, control = list(max_treedepth = 12), sample_prior = "yes")

However, this code has now been running for 12 hours and the progress is still at 1/2000 iterations for all chains.

So my question is: why did the indicators slow the code down so much? How can I speed it up so that it finishes in about 12 hours again?

  • Operating System: Win 10
  • brms Version: 2.12.0

Update: I ran the code with the indicator variables as numeric instead of factor and this fixed the speed issue. However, it destroys a lot of the nice summary functionality in brms, since all of the post-processing functions treat the indicators as continuous variables. I am still looking for advice on what to do here.
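One workaround I am considering (just a sketch; the grid values below are illustrative, not from my actual analysis): fit with the numeric 0/1 indicators for speed, and then build stratum-specific summaries by hand with fitted() and an explicit newdata grid, instead of relying on the automatic plotting functions.

dat$stratum1 <- as.numeric(dat$stratum == "s1")
dat$stratum2 <- as.numeric(dat$stratum == "s2")
dat$stratum3 <- as.numeric(dat$stratum == "s3")

# e.g. predicted probabilities for stratum s2 over a grid of x1,
# holding x2 and x3 at their observed means
nd <- data.frame(stratum1 = 0, stratum2 = 1, stratum3 = 0,
                 x1 = seq(min(dat$x1), max(dat$x1), length.out = 50),
                 x2 = mean(dat$x2), x3 = mean(dat$x3))
pred <- fitted(brms_out, newdata = nd)  # posterior mean, error, and 95% interval per row of nd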

Edit, because I marked this as the solution: the factor version adds unnecessary columns and parameters, whereas the numeric version is the correct model.

I would suspect that the issue here is that the interactions didn’t do what you expected - since brms uses model.matrix under the hood for the fixed terms, I think it would be useful to see how the result of model.matrix changes between the factor and numeric indicator variables…
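For example, with a small made-up dataset (not your data), the difference looks roughly like this: a factor indicator gets both of its levels coded in the design matrix, so every term involving it picks up an extra column, while a numeric 0/1 indicator yields exactly one column per term.

d <- data.frame(stratum = rep(c("s1", "s2", "s3"), each = 2), x1 = rnorm(6))
d$ind_fct <- factor(1 * (d$stratum == "s1"))   # factor indicator, levels "0" and "1"
d$ind_num <- 1 * (d$stratum == "s1")           # numeric 0/1 indicator

# Factor version: both levels get coded, so the design matrix doubles up
colnames(model.matrix(~ 0 + ind_fct + ind_fct:x1, data = d))
# "ind_fct0" "ind_fct1" "ind_fct0:x1" "ind_fct1:x1"

# Numeric version: one column per term, as intended
colnames(model.matrix(~ 0 + ind_num + ind_num:x1, data = d))
# "ind_num"  "ind_num:x1"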

Using Bayes factors to filter predictors is almost certainly a problematic approach that can lead to overconfident inferences - for basically the same reason that stepwise variable selection with p-values is problematic.

Thanks for your replies.

You are correct, the model.matrix function gives different design matrices. The one with factors adds unnecessary columns of zeros, which Stan then assigns parameters to. This means I should use the code with the numeric indicators.

I am aware. If you have a better suggestion on how to do model selection in Stan when the candidate set of models is of size 3^14 with a runtime on the order of 12 hours per model, I am all ears.

If that’s the case, I would probably challenge whether model selection can be useful/relevant for your scientific/real-world question :-) If all of that huge model space is a-priori plausible, it seems unlikely that even an extremely large dataset can give you a reliable answer. And if you have enough data to learn something, why is it not sufficient to just examine the posterior of the full model and see whether there is evidence that some of the coefficients are close to zero (i.e. in the ROPE = region of practical equivalence)?
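As a rough sketch of what I mean by checking the ROPE (the half-width of 0.05 on the logit scale is an arbitrary illustration, and you would list the actual parameter names with parnames(brms_out)):

post <- posterior_samples(brms_out)                  # one column per parameter
b <- post[, grep("^b_", names(post)), drop = FALSE]  # just the regression coefficients
rope <- 0.05                                         # arbitrary half-width on the logit scale
sort(colMeans(abs(b) < rope))                        # P(|coefficient| < rope) for each coefficient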

Challenge on what grounds? I’ve explained nothing to you about the goals of this project beyond what is sufficient to address a software question.

“Seems unlikely”? Unless you have a citation or something concrete this isn’t helpful; it’s just you nitpicking from a position of next to no information about this project.

Because nobody knows what the ROPE is. And I find it hard to believe ROPE doesn’t fall prey to the exact same issues selecting on BF does.

Next time, please consider being curious and asking questions before offering unsolicited advice. This exchange has not been helpful and was more of an aggravation. I expect more from Stan developers.

Trying to get this on the right path.

Model selection between two models already requires sufficient data. Model selection between 3^14 models is definitely going to require a lot of information in the data.

The general advice from Andrew Gelman is often to estimate the full model, as long as that is computationally possible. See for instance this blog post and the references in it.

If your question requires you to make a decision about the model complexity, this paper by Dan Simpson et al. is something I think is underused. The basic idea is to have a parameter in the full model for each added layer of complexity (as defined by theory) and to put a prior on that parameter that penalises complexity. The final step is then to decide on your cut-off value below which those complexity parameters count as small. You could call that a ROPE, or you could just communicate to your audience what a low value means in your context.
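As a very loose sketch of that idea in brms (this is only shrinkage priors on the extra-complexity terms, not the full penalised-complexity construction from the paper; for brevity the quadratic terms here are not stratum-specific, and the pre-computed columns and prior scales are just for illustration):

dat$x1_sq <- dat$x1^2
dat$x2_sq <- dat$x2^2
dat$x3_sq <- dat$x3^2

priors <- c(
  set_prior("normal(0, 2.5)", class = "b"),                   # default for all coefficients
  set_prior("normal(0, 0.25)", class = "b", coef = "x1_sq"),  # pull the quadratic terms towards zero
  set_prior("normal(0, 0.25)", class = "b", coef = "x2_sq"),
  set_prior("normal(0, 0.25)", class = "b", coef = "x3_sq")
)

fit <- brm(y ~ 0 + stratum + stratum:(x1 + x2 + x3) + x1_sq + x2_sq + x3_sq,
           family = bernoulli(), prior = priors, data = dat,
           chains = 4, iter = 2000, warmup = 1000, cores = 4)

You then report which of the shrunk coefficients still end up clearly away from zero, rather than refitting a separate model for every combination of terms.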

Final comment: @martinmodrak is the nicest and most helpful person on this forum. It’s fine to be frustrated, but a more charitable reading of Martin’s answer, together with an understanding that there is some common knowledge in the community (always search Gelman’s blog) that is not always made explicit, would have given you the same information I have given you.

I admit I didn’t choose my words carefully. I didn’t intend to be snarky or show contempt for your approach, but I understand how my posts could be read that way. My response came from an honest wish to help you make the best inferences you can from your data, and I am sorry that this was not clear. As @stijn noted, I am just summarising some informal knowledge that’s circulating in the community; I don’t have citations, and I can’t say I am completely certain I am correct. All of the points were meant as suggestions for you to further reflect on the problem, and you are obviously free to disagree or ignore them.

I also agree that one of the responses was inappropriate and violated our Code of Conduct, and I have deleted it as well as the OP’s response to it (which was not problematic; it just doesn’t make sense to me to keep it after the problematic post was deleted).

I understood that you were well intentioned and just trying to help, and I do not hold any ill feelings. It just felt a little more judgmental than I was in the mood for dealing with. It was a mistake on my part not to simply let the advice be and move on. For that, I apologize.

Thank you for deleting the violating posts.
