How can I find the appropriate family to fit my data using brms?

Hello, everyone. I have a data set, but I don’t know which family is appropriate for fitting it.

The results look like this: neither family=gaussian nor family=Gamma fits the data well.

My code is here:

fit <- brm(formula = node_length ~ type * node_num2 + (1 | location),
           family = gaussian(link = "identity"),
           data = data_ck_IAA,
           seed = 1,
           prior = c(set_prior("", class = "Intercept"),
                     set_prior("", class = "sigma")),
           chains = 4,
           iter = 2000,
           warmup = 500,
           thin = 1,
           control = list(adapt_delta = 0.99, max_treedepth = 15, stepsize = 0.001))

And my raw data look like this:

How can I fit my data?

Thank you guys!

Hi - I’d recommend you do prior predictive checks first and then build your model step by step.

Why do you set your priors to infinity? I think many of the problems with adapt_delta etc. can be due to no priors.
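A prior predictive check along these lines could look like the sketch below. It reuses the formula and data name from the original post; the prior distributions and their scales are illustrative assumptions, not values suggested in this thread, and should be adapted to the scale of node_length.

```r
library(brms)

# Weakly informative priors. The distributions and scales are assumptions
# chosen for illustration only; adjust them to the scale of your response.
priors <- c(
  prior(normal(0, 10), class = "Intercept"),
  prior(normal(0, 5), class = "b"),
  prior(exponential(1), class = "sigma")
)

# sample_prior = "only" ignores the likelihood and draws from the priors alone.
fit_prior <- brm(
  node_length ~ type * node_num2 + (1 | location),
  family = gaussian(),
  data = data_ck_IAA,
  prior = priors,
  sample_prior = "only",
  seed = 1
)

# Overlay data sets simulated from the priors on the observed node_length.
pp_check(fit_prior, ndraws = 100)
```

If the simulated data sets cover wildly implausible values of node_length, the priors are probably too wide; if they cannot reach the observed values, they are too narrow.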

Is the problem the choice of family or the linear model? Have you tried something like a smooth term, for example?

fit <- brm(node_length ~ type + s(node_num2, by=type, k=10) + (1|location), data=data_ck_IAA)

From your plot (which looks like a conditional_effects plot from brms) it looks like you are trying to fit a line for each “type” through data where “node_length” and “node_num2” are not related in that way.

Your problem is that you are modeling clearly nonlinear relationships between the response and node_num2 using linear terms. The issue isn’t primarily the family (though it looks like the data might not be homoskedastically Gaussian), but rather the form of the linear predictor. Try fitting a linear predictor that is quadratic in node_num2 or perhaps a linear predictor with still greater flexibility.
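As a rough sketch of the quadratic suggestion, the linear predictor can be extended with a squared term that is allowed to differ by type. The formula and data name come from the original post; the Gaussian family and the object name fit_quad are assumptions for illustration.

```r
library(brms)

# Quadratic in node_num2, with separate curves for each type.
# I() protects the squaring so it is treated as arithmetic in the formula.
fit_quad <- brm(
  node_length ~ type * (node_num2 + I(node_num2^2)) + (1 | location),
  family = gaussian(),
  data = data_ck_IAA,
  seed = 1
)

# Check whether data simulated from the fitted model resemble the observations.
pp_check(fit_quad, ndraws = 100)

# Plot the fitted curve for each type.
conditional_effects(fit_quad, effects = "node_num2:type")
```

If the quadratic curves still miss the pattern in the data, a more flexible predictor such as a smooth term is the next thing to try.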

Thank you for your answer, Mr. torkar.

I used noninformative priors because some books say that they are a safe choice when I don’t know how to set them.

I will learn about prior predictive checks and try them.

Thank you very much.

Thanks for your answer, Mr. jsocolar.

Just as you said, I should use a nonlinear predictor, but I don’t know how to set that up in brms, or in Bayesian analysis in general.

Can you give me some advice?

Thank you very much.

Thanks for your answer, Mr. jd_c.

I tried your code and got a nice fitted curve like this.

This is my first time using a term like + s(node_num2, by=type, k=10). Where can I learn about it?

And how should I interpret these results?

Population-Level Effects:
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                3.05      0.21     2.63     3.51 1.00     1417     1557
typeIAA                 -0.03      0.01    -0.04    -0.01 1.00     4751     2166
snode_num2:typeCK_1      6.31      1.01     4.36     8.28 1.00     1917     2277
snode_num2:typeIAA_1     3.36      0.76     1.88     4.84 1.00     2079     2163

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape    24.07      0.44    23.23    24.95 1.00     4276     2479

**I think snode_num2:typeCK_1 and snode_num2:typeIAA_1 are interaction terms. But the linear model had just one interaction term, which described the interaction effect when comparing CK and IAA.**

How should I understand this result?

Thank you very much.

This is a smooth term. Specifically a penalized thin plate regression spline (if I remember correctly that is the default). You can learn a lot about splines from Gavin Simpson’s blog and this excellent post by Tristan Mahr about splines as they are implemented in brms.
The code that you ran fit a smooth term to each ‘type’.
The coefficients of the smooth terms are not very interpretable. See here and here.
You can make predictions for different values of ‘node_num2’ and different levels of ‘type’ and ‘location’. You can also estimate the first derivative of the spline via finite differences if you would like to find the slope of the spline, see here for the concept and here for what you will need from brms, posterior_smooths.
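The prediction and finite-difference ideas above can be sketched as follows. This version uses posterior_epred rather than posterior_smooths, so the slope is on the scale of the expected response and includes the type effect. It assumes `fit` is the smooth-term model from earlier in the thread; the step size `eps` and the grid length are assumed values.

```r
library(brms)

# Assumption: `fit` is the model with s(node_num2, by = type, k = 10).
eps <- 0.01  # finite-difference step; an assumed value, tune to your scale

# Grid over the observed range of node_num2 for each level of type.
grid <- expand.grid(
  node_num2 = seq(min(data_ck_IAA$node_num2),
                  max(data_ck_IAA$node_num2),
                  length.out = 50),
  type = unique(data_ck_IAA$type)
)
grid_shifted <- transform(grid, node_num2 = node_num2 + eps)

# Posterior draws of the expected node_length (rows = draws, columns = grid rows).
# re_formula = NA leaves out the location effects.
mu_lo <- posterior_epred(fit, newdata = grid, re_formula = NA)
mu_hi <- posterior_epred(fit, newdata = grid_shifted, re_formula = NA)

# Forward finite difference: approximate slope for every draw and grid point.
slope <- (mu_hi - mu_lo) / eps

# Summarise the slope draws at each grid point.
grid$slope_mean  <- colMeans(slope)
grid$slope_lower <- apply(slope, 2, quantile, probs = 0.025)
grid$slope_upper <- apply(slope, 2, quantile, probs = 0.975)
head(grid)
```

Intervals for the slope that exclude zero at a given node_num2 suggest that node_length is clearly increasing or decreasing there for that type.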

If this isn’t acceptable, then maybe you should think about the relationship between ‘node_length’ and ‘node_num2’ and come up with a more generative explanation that you could fit using the non-linear syntax in brms.
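For orientation, the non-linear syntax in brms looks roughly like the sketch below. The functional form (a curve rising toward an asymptote), the parameter names `a` and `b`, and the priors are all hypothetical and chosen only to show the syntax; the actual form should come from what is known about how node_length depends on node_num2.

```r
library(brms)

# Hypothetical model: node_length approaches an asymptote `a` at rate `b`.
# Both non-linear parameters get their own linear formula; here each is
# estimated per type, and `a` also varies by location.
nl_formula <- bf(
  node_length ~ a * (1 - exp(-b * node_num2)),
  a ~ 0 + type + (1 | location),
  b ~ 0 + type,
  nl = TRUE
)

# Non-linear parameters need explicit priors. These values are assumptions.
nl_priors <- c(
  prior(normal(5, 5), nlpar = "a"),
  prior(normal(0.5, 0.5), nlpar = "b", lb = 0)
)

fit_nl <- brm(
  nl_formula,
  data = data_ck_IAA,
  family = gaussian(),
  prior = nl_priors,
  seed = 1
)
```

Unlike the spline coefficients, the parameters of such a model have a direct meaning (here an asymptote and a rate), which is the main reason to prefer it when a plausible functional form is available.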

Mr. jd_c

This is a great answer, which is very helpful to me.

Thank you very much.

I will try them all.
