Modeling factors with categorical levels

This is for hypothesis testing, where my hypothesis states that the dependent variable (integer-valued, approximately normally distributed) has a higher value in the treatment condition.
I’m doing it in both brms and lme4 as a comparative study between the two schools of statistics (Bayesian and frequentist).
Each participant undergoes 50 trials in the experimental condition they are assigned to (again, it’s a between-subject setup).
My regression model:


D.V ~ Condition + (1|Participant) + (1|Trial)


  1. Should I use any coding scheme for my factor (Condition)?
  2. Should I consider Trial as a factor?

It’s up to you. By default, R uses dummy coding (treatment contrasts), but you can switch to, for example, deviation coding (sum contrasts) if you prefer:

options(contrasts = c("contr.sum", "contr.poly"))

or your own coding method.
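
To make the difference concrete, here is a minimal base-R sketch of what the two coding schemes do to a two-level Condition factor. The data frame d, its column names, and the four DV values are made up for illustration and are not from the original question; the contrasts are set per model through the contrasts argument of lm rather than through the global option.

```r
# Hypothetical toy data: two conditions, two observations each (assumed values).
d <- data.frame(
  Condition = factor(c("control", "treatment", "control", "treatment")),
  DV = c(10, 14, 12, 16)
)

# The contrast matrices themselves.
dummy_coding <- contr.treatment(levels(d$Condition))  # control = 0, treatment = 1
deviation_coding <- contr.sum(levels(d$Condition))    # control = 1, treatment = -1

# Same model, two codings. Only the meaning of the coefficients changes.
fit_dummy <- lm(DV ~ Condition, data = d,
                contrasts = list(Condition = "contr.treatment"))
fit_deviation <- lm(DV ~ Condition, data = d,
                    contrasts = list(Condition = "contr.sum"))

# Dummy coding: intercept = control mean (11), slope = treatment - control (4).
coef(fit_dummy)
# Deviation coding: intercept = mean of the two condition means (13),
# slope = control mean - that grand mean (-2).
coef(fit_deviation)
```

Both fits give identical predictions; the choice mainly matters for how you read the intercept and for how priors on the coefficients are interpreted in brms.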

Without knowing the details of your data (e.g., number of trials), it’s hard to make specific recommendations. You may have to try different models and compare them with tools such as posterior predictive checks.


@Gang I am interested in seeing how the DV changes over time, which here means across trials. Trial (50 trials) is the only within-subject variable in this between-subject experiment.
Also, I am currently using trials as a factor: as.factor(d$trails). Is this okay, or should I use it as an integer, i.e. as.integer(d$trails)?
And thank you for the previous input!

Yes, trials should be treated as the levels of a factor in your implementation.

Chiming in to comment on the (1|Trial) part. If you expect different levels of a variable to consistently behave differently from one another (e.g. if you had lots of data at each level of that variable, you’d be confident of discerning differences in their mean outcomes), then (1|my_variable_name) is one rather blunt approach to letting the model “see” that structure. For truly categorical variables (e.g. participant), this is as far as you can go, but for numeric variables like Trial, you can explore a little deeper by keeping the numeric information in the variable (i.e. don’t convert it to a factor) and modelling it with explicit functions. A linear effect of Trial is achieved by simply adding + Trial, possibly with nuance like:

D.V ~ Condition + Trial + (1 + Trial  | Participant)

to express a model where Trial has a linear effect but participants manifest this effect with variability from one participant to the next. You could even add an interaction with Condition, participant variability in the manifestation of said interaction, etc. (maybe take a look at this explanation of the meaning/models behind the lme4/brms formula syntax: https://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet/13173#13173). If Trial truly has a linear effect, modelling it explicitly this way will yield more accurate/powerful inference than the (1|Trial) approach (treating Trial as a “random effect”).
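
As a rough sketch of the formulas just described, the following fits the linear-Trial model and the version with a Condition interaction in lme4. Everything about the data is an assumption: the simulated data frame d, the 20 participants, the effect sizes, and the centered column Trial_c (centering is my addition to ease interpretation and fitting, not something the original model requires).

```r
library(lme4)

set.seed(1)
# Hypothetical design: 20 participants x 50 trials, Condition between subjects.
d <- expand.grid(Trial = 1:50, Participant = factor(1:20))
p <- as.integer(d$Participant)
d$Condition <- factor(ifelse(p <= 10, "control", "treatment"))

# Assumed participant-level intercepts and Trial slopes.
p_int <- rnorm(20, mean = 0, sd = 1)
p_slope <- rnorm(20, mean = 0.05, sd = 0.05)
d$DV <- 10 + 2 * (d$Condition == "treatment") +
  p_int[p] + p_slope[p] * d$Trial + rnorm(nrow(d))

# Keep Trial numeric; center it so the intercept refers to the middle trial.
d$Trial_c <- d$Trial - mean(d$Trial)

# Linear effect of Trial that varies by participant.
m_linear <- lmer(DV ~ Condition + Trial_c + (1 + Trial_c | Participant), data = d)

# Same, plus a Condition-by-Trial interaction.
m_inter <- lmer(DV ~ Condition * Trial_c + (1 + Trial_c | Participant), data = d)

fixef(m_linear)
```

The same formulas can be passed to brms::brm(), which is convenient for the lme4 versus brms comparison. Note that Trial cannot vary by Condition within a participant here, so no random slope for Condition by participant is included.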

But what if you don’t feel comfortable assuming linearity in the effect of Trial? If you have a specific non-linear function in mind, obviously use that; if not, check out GAMs (generalized additive models) and GPs (Gaussian processes), which will find possibly-wiggly/possibly-linear functions that best reflect the effect of interest.
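
One way the GAM route might look, sketched with mgcv on simulated data. The data frame d, the logarithmic learning curve, and all effect sizes are assumptions made for illustration; s(Participant, bs = "re") is mgcv's way of expressing a random intercept per participant.

```r
library(mgcv)

set.seed(2)
# Hypothetical design: 20 participants x 50 trials, Condition between subjects.
d <- expand.grid(Trial = 1:50, Participant = factor(1:20))
p <- as.integer(d$Participant)
d$Condition <- factor(ifelse(p <= 10, "control", "treatment"))

# Assumed non-linear effect of Trial (fast early change that levels off).
p_int <- rnorm(20, mean = 0, sd = 1)
d$DV <- 10 + 2 * (d$Condition == "treatment") +
  1.5 * log(d$Trial) + p_int[p] + rnorm(nrow(d))

# Smooth function of Trial plus a random intercept for each participant.
m_gam <- gam(DV ~ Condition + s(Trial) + s(Participant, bs = "re"),
             data = d, method = "REML")

summary(m_gam)

# Rough brms counterparts (not run here):
#   brm(DV ~ Condition + s(Trial) + (1 | Participant), data = d)   # spline
#   brm(DV ~ Condition + gp(Trial) + (1 | Participant), data = d)  # Gaussian process
```

If the estimated smooth for Trial comes out close to a straight line, that is a hint the simpler linear model may be adequate.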
