Syntax for including a predictor when a dummy variable =1

mattwilliamson13 · August 30, 2017, 5:13pm

I am trying to fit a model estimating the effect of fire (burn) on the occurrence of a noxious weed. I am also interested in the effect of time since fire (tburn) on occurrence. tburn only makes sense when burn = 1 (because there is no time since fire if burn = 0). I have been trying to fit this model as an interaction:
y ~ binomial_logit(nsamp, X * beta + beta_b * burn + beta_tb * (burn .* tburn) + alpha_point[PtID]);

However, it seems to me that this creates a situation where the interaction term takes low values both when a site has never been burned and when a site has been recently burned. Ecologically these are almost polar opposite cases, so I’m wondering what the correct syntax is for estimating beta_tb only when burn = 1. I think this is similar to using step() in JAGS (and so I could presumably use step() in Stan as well), but I don’t have enough experience with that language to know whether that is actually the right way to approach the problem (I have seen posts suggesting that this creates problems in the posterior). Any advice would be most appreciated.
Thanks in advance,
M

dlakelan · August 30, 2017, 6:10pm

Perhaps what you mean is that time since last burn is a very large number when it’s “never” been burned (one imagines that in the Jurassic age or whatnot, it probably burned… but you don’t know how far back that really is).

I think what you want is a different model, one in which you imagine that the effect of a burn is to transiently cause one type of thing, and then over time to asymptote to another steady state. When tburn is small, you’re in the transient region, and when tburn is large, or “infinite” (i.e. unknown but at least larger than your historical data includes), you’re in the steady state.

mattwilliamson13 · August 30, 2017, 6:20pm

Thank you!
I think that makes sense. To do that I would set the value for tburn at something considerably higher than my data currently covers and then use a dummy variable of burn=1 for burned sites and burn=2 for unburned sites. This would retain the time since burned relationship for the burned sites and simply increase the time since burned values for unburned sites. Does this match what you’re suggesting?

kholsinger · August 30, 2017, 6:20pm

I assume that beta represents the coefficients on non-fire covariates that influence occurrence and don’t depend on burn status.
Here’s one way you might approach it.

Put the data from sites where you’ve observed a burn in y and the data from sites where you haven’t observed a burn in z:

y ~ binomial_logit(nsamp, X * beta + beta_tb*tburn + alpha_point[PtID])
z ~ binomial_logit(nsamp, X * beta + alpha_point[PtID])

This requires the (strong) assumption that the magnitudes of beta and alpha_point[] do not depend on whether or not a site has burned, but if you’re willing to make that assumption, estimates of beta_tb depend only on those cases where you have observed a burn.

Kent
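
The two sampling statements above could be embedded in a complete Stan program roughly as follows. This is only a sketch: the `_b` / `_u` suffixes, the dimension names K and P, sigma_point, and all of the priors are assumptions introduced here for illustration, and the array syntax assumes a recent Stan release.

```stan
data {
  int<lower=1> K;                           // number of non-fire covariates (assumed name)
  int<lower=1> P;                           // number of sampling points (assumed name)
  int<lower=0> N_b;                         // observations at sites with an observed burn
  int<lower=0> N_u;                         // observations at sites with no observed burn
  matrix[N_b, K] X_b;                       // covariates, burned sites
  matrix[N_u, K] X_u;                       // covariates, unburned sites
  vector<lower=0>[N_b] tburn;               // time since fire, burned sites only
  array[N_b] int<lower=1, upper=P> PtID_b;  // point index, burned sites
  array[N_u] int<lower=1, upper=P> PtID_u;  // point index, unburned sites
  array[N_b] int<lower=0> nsamp_b;          // number of samples, burned sites
  array[N_u] int<lower=0> nsamp_u;          // number of samples, unburned sites
  array[N_b] int<lower=0> y;                // occurrences, burned sites
  array[N_u] int<lower=0> z;                // occurrences, unburned sites
}
parameters {
  vector[K] beta;                           // shared across burned and unburned sites
  real beta_tb;                             // effect of time since fire
  vector[P] alpha_point;                    // point-level effects, shared across both groups
  real<lower=0> sigma_point;                // assumed hierarchical scale
}
model {
  // Placeholder priors; not taken from the thread.
  beta ~ normal(0, 2);
  beta_tb ~ normal(0, 2);
  sigma_point ~ normal(0, 1);
  alpha_point ~ normal(0, sigma_point);

  // beta_tb appears only in the likelihood for burned sites.
  y ~ binomial_logit(nsamp_b, X_b * beta + beta_tb * tburn + alpha_point[PtID_b]);
  z ~ binomial_logit(nsamp_u, X_u * beta + alpha_point[PtID_u]);
}
```

Because beta and alpha_point appear in both statements, the unburned sites still inform them, while beta_tb is informed only by rows with an observed burn.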

dlakelan · August 30, 2017, 6:27pm

No, that’s not it. Just assume everything burned, and put the model in terms of the time since burned. Then for areas where you haven’t observed the burn, you need to impute some time, but transform the effect through a function like 1-exp(-t/t0) or exp(-t/t0) or whatever’s appropriate, so that when t ~ 0 (i.e. just burned) things are changing rapidly, but when t gets large, they asymptote to a fixed effect. In this sense, what you impute doesn’t matter so much, because you’re at steady state. So you could just assign a large number, say t0 * 10, rather than making it a parameter.

Your choice of transient behavior will matter, so you might want to think about what the general transient looks like. Does it stay constant at “just burned” for a few years, and then decay to steady state? Or does it start decaying right away? Try to think of a function that has the general shape you want.
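
One way the “assume everything burned” idea might look in Stan, using exp(-t/t0) as the transient and assigning 10 * t0 to sites with no observed fire. Treat it as a sketch under assumptions: t0 is passed in as fixed data, beta_tr, sigma_point, K, P and the priors are names and choices made up for this example, and X is assumed to contain an intercept column that carries the steady-state level.

```stan
data {
  int<lower=1> N;                         // number of observations
  int<lower=1> K;                         // number of non-fire covariates (assumed name)
  int<lower=1> P;                         // number of sampling points (assumed name)
  matrix[N, K] X;                         // assumed to include an intercept column
  array[N] int<lower=1, upper=P> PtID;
  array[N] int<lower=0> nsamp;
  array[N] int<lower=0> y;
  array[N] int<lower=0, upper=1> burn;    // 1 if a fire was observed at the site
  vector<lower=0>[N] tburn;               // time since fire; ignored where burn == 0
  real<lower=0> t0;                       // fixed time scale, chosen by the analyst
}
transformed data {
  vector[N] t;                            // time since fire with unburned sites filled in
  vector[N] transient;                    // exp(-t / t0): 1 just after a fire, near 0 at steady state
  for (n in 1:N) {
    t[n] = burn[n] == 1 ? tburn[n] : 10 * t0;
  }
  transient = exp(-t / t0);
}
parameters {
  vector[K] beta;
  real beta_tr;                           // size of the transient right after a fire
  vector[P] alpha_point;
  real<lower=0> sigma_point;
}
model {
  // Placeholder priors; not taken from the thread.
  beta ~ normal(0, 2);
  beta_tr ~ normal(0, 2);
  sigma_point ~ normal(0, 1);
  alpha_point ~ normal(0, sigma_point);

  y ~ binomial_logit(nsamp, X * beta + beta_tr * transient + alpha_point[PtID]);
}
```

With t = 10 * t0 the transient term is exp(-10), about 4.5e-5, so the exact value assigned to unburned sites has almost no influence on the fit.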

mattwilliamson13 · August 30, 2017, 6:38pm

I’m not entirely sure I understand. I understand eliminating the burn dummy variable. Are you suggesting treating time since burn as a variable with missing data and having Stan impute that data based on a distribution? Or are you suggesting treating tburn as a parameter that should be modeled as a function of other covariates? Or, more simply, should I just assign each unburned site a value (say 1000) to ensure that I’ve pushed it beyond the asymptote?

Lastly can you clarify what t0 is?

Thanks for your help!!

dlakelan · August 30, 2017, 6:51pm

If you have prior information about the actual time to last burn even if you haven’t observed it, and if you think the effect of the burn may not necessarily be fully in the asymptote of the function for some unobserved sites, then I’d recommend creating a parameter and assigning a prior, and having Stan impute a time.

On the other hand, if your observations extend back far enough that you’re pretty sure the longest ones are in the asymptote, and the unobserved ones are even farther back, then just assign a number to the unobserved ones that is bigger than the longest observed one as an approximation. Whatever Stan does for the imputed parameter won’t really affect anything, because the function you’ll use is constant in that range of time anyway.
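
The first option (a parameter with a prior for each unobserved fire time) might be sketched as below. Everything specific here is an assumption made for illustration: the exp(-t/t0) transient, the record length t_max, the exponential prior on the time beyond the record with mean excess_scale, the `_b` / `_u` suffixes, and the remaining priors.

```stan
data {
  int<lower=1> K;                           // number of non-fire covariates (assumed name)
  int<lower=1> P;                           // number of sampling points (assumed name)
  int<lower=0> N_b;                         // observations with an observed fire
  int<lower=0> N_u;                         // observations with no observed fire
  matrix[N_b, K] X_b;
  matrix[N_u, K] X_u;
  array[N_b] int<lower=1, upper=P> PtID_b;
  array[N_u] int<lower=1, upper=P> PtID_u;
  array[N_b] int<lower=0> nsamp_b;
  array[N_u] int<lower=0> nsamp_u;
  array[N_b] int<lower=0> y_b;
  array[N_u] int<lower=0> y_u;
  vector<lower=0>[N_b] t_b;                 // observed time since fire
  real<lower=0> t_max;                      // length of the fire record
  real<lower=0> t0;                         // fixed time scale
  real<lower=0> excess_scale;               // assumed prior mean of (t_u - t_max), must be > 0
}
parameters {
  vector[K] beta;
  real beta_tr;                             // size of the transient right after a fire
  vector[P] alpha_point;
  real<lower=0> sigma_point;
  vector<lower=0>[N_u] excess;              // unobserved time beyond the record
}
transformed parameters {
  vector[N_u] t_u = t_max + excess;         // imputed time since fire, always >= t_max
}
model {
  // Placeholder priors; not taken from the thread.
  beta ~ normal(0, 2);
  beta_tr ~ normal(0, 2);
  sigma_point ~ normal(0, 1);
  alpha_point ~ normal(0, sigma_point);
  excess ~ exponential(1 / excess_scale);   // prior information on the unobserved times

  y_b ~ binomial_logit(nsamp_b, X_b * beta + beta_tr * exp(-t_b / t0) + alpha_point[PtID_b]);
  y_u ~ binomial_logit(nsamp_u, X_u * beta + beta_tr * exp(-t_u / t0) + alpha_point[PtID_u]);
}
```

If t_max is already several multiples of t0, the posterior for excess will mostly reproduce its prior, which is the situation where simply assigning a large fixed number is the simpler choice.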

In my example t/t0 is a ratio of two times, and its size describes how far out into the asymptote you are. You shouldn’t transform a time variable directly; transform a dimensionless ratio instead, and choose t0 from scientific principles. You could make t0 a parameter with a very informative prior, or you could just choose t0 to be something like the mean observed recurrence time of fires in this region.

In the end you want to choose some function f(t/t0,a,b,c) where a,b,c may be parameters that describe the shape, and then do something like:

y ~ binomial_logit(nsamp,… + f(t/t0,a,b,c) + …)

For the “unburned” things you’ll use t ~ 10 * t0 or the like.
For the recently burned you’ll use t ~ 0.
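
As one hypothetical instance of f(t/t0,a,b,c), the sketch below uses a logistic-shaped transient that stays near a for a while and then decays to 0. The function name transient_shape, the meaning given to a, b and c, the priors, and the other names (K, P, sigma_point) are all assumptions for this example; the thread does not prescribe a particular shape.

```stan
functions {
  // Hypothetical shape: close to a while s < b, then declining towards 0
  // over a width of roughly 1 / c, where s = t / t0 is dimensionless.
  vector transient_shape(vector s, real a, real b, real c) {
    return a * inv_logit(-c * (s - b));
  }
}
data {
  int<lower=1> N;                         // number of observations
  int<lower=1> K;                         // number of non-fire covariates (assumed name)
  int<lower=1> P;                         // number of sampling points (assumed name)
  matrix[N, K] X;                         // assumed to include an intercept column
  array[N] int<lower=1, upper=P> PtID;
  array[N] int<lower=0> nsamp;
  array[N] int<lower=0> y;
  vector<lower=0>[N] t;                   // time since fire; unburned sites already set to e.g. 10 * t0
  real<lower=0> t0;                       // fixed time scale
}
parameters {
  vector[K] beta;
  vector[P] alpha_point;
  real<lower=0> sigma_point;
  real a;                                 // size of the effect just after a fire
  real<lower=0> b;                        // where the decline is centred, in units of t0
  real<lower=0> c;                        // steepness of the decline
}
model {
  // Placeholder priors; not taken from the thread.
  beta ~ normal(0, 2);
  sigma_point ~ normal(0, 1);
  alpha_point ~ normal(0, sigma_point);
  a ~ normal(0, 2);
  b ~ normal(1, 1);
  c ~ normal(0, 5);

  y ~ binomial_logit(nsamp, X * beta + transient_shape(t / t0, a, b, c) + alpha_point[PtID]);
}
```

Shape parameters like b and c can be weakly identified when few sites burned recently, so informative priors based on what is known about recovery after fire are likely to matter here.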

mattwilliamson13 · August 30, 2017, 7:19pm

This is great. Thank you very much!
