Modeling factors with categorical levels

amini · October 26, 2019, 12:24pm

This is for hypothesis testing: my hypothesis is that the dependent variable (integer values, approximately normally distributed) is higher in the treatment condition.
I’m fitting the model in both brms and lme4 to compare the two schools of statistics (Bayesian and frequentist).
Each participant completes 50 trials in the condition they are assigned to (again, it’s a between-subject setup).
My regression model:

D.V ~ Condition + (1|Participant) + (1|Trial)

Should I use a particular coding scheme for my factor (Condition)?
Should I treat Trial as a factor?

Gang · October 27, 2019, 1:41am

It’s up to you. By default, R uses dummy (treatment) coding, but you can switch to, for example, deviation (sum) coding with

options(contrasts = c("contr.sum", "contr.poly"))

or your own coding method.
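As a minimal sketch of what the two schemes do to a two-level factor, the contrast can also be set on the factor itself rather than globally. The level names `control` and `treatment` are assumed for illustration and are not taken from the original data.

```r
# Hypothetical two-level factor; the level names are assumptions for illustration.
Condition <- factor(c("control", "treatment", "control", "treatment"))

# Dummy (treatment) coding, R's default: control = 0, treatment = 1.
contrasts(Condition) <- contr.treatment(2)
X_dummy <- model.matrix(~ Condition)

# Deviation (sum) coding on this factor only: control = +1, treatment = -1.
contrasts(Condition) <- contr.sum(2)
X_sum <- model.matrix(~ Condition)

print(X_dummy)
print(X_sum)
```

Under dummy coding the Condition coefficient is the treatment-minus-control difference and the intercept is the control mean. Under sum coding the intercept sits midway between the two condition means and the coefficient is half the control-minus-treatment difference, so the sign and scale of the estimate change even though the fitted model is the same.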

Without knowing the details of your data (e.g., the number of trials), it’s hard to make specific recommendations. You may have to try different models and compare them with tools such as posterior predictive checks.

amini · October 27, 2019, 6:43am

@Gang I am interested in seeing how the DV changes over time, which here means over trials. Trial (50 trials per participant) is the only within-subject variable in this between-subject experiment.
Also, I am currently treating trials as a factor: as.factor(d$trails). Is this okay, or should I treat it as an integer, i.e. as.integer(d$trails)?
And thank you for the previous input!

Gang · October 28, 2019, 2:41pm

Yes, trials should be treated as the levels of a factor in your implementation.

mike-lawrence · October 28, 2019, 3:08pm

Chiming in to comment on the (1|Trial) part. If you expect different levels of a variable to consistently behave differently from one another (e.g., if you had lots of data at each level of that variable, you’d expect to be able to discern differences in their mean outcomes), then (1|my_variable_name) is one rather blunt approach to allowing the model to “see” that structure. For truly categorical variables (e.g., participant), this is as far as you can go, but for numeric variables like Trial, you can explore a little deeper by keeping the numeric information in the variable (i.e., don’t convert it to a factor) and modelling it with explicit functions. A linear effect of Trial would be achieved by simply adding + Trial, possibly with nuance like:

D.V ~ Condition + Trial + (1 + Trial | Participant)

to express a model where Trial has a linear effect but participants manifest this effect with variability from one participant to the next. You could even add an interaction with Condition, participant variability in the manifestation of said interaction, etc. (maybe take a look at this explanation of the meaning/models behind the lme4/brms formula syntax: https://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet/13173#13173). If Trial truly has a linear effect, modelling it explicitly this way will yield more accurate/powerful inference than the (1|Trial) approach (treating Trial as a “random effect”).
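To make the factor-versus-numeric difference concrete, here is a small base-R sketch of the design matrices each choice produces. The data layout (4 participants, 50 trials) and the column name `DV` are hypothetical stand-ins, and nothing is fitted.

```r
# Simulated layout only: 4 hypothetical participants x 50 trials, no real data.
d <- expand.grid(Participant = factor(1:4), Trial = 1:50)

# Trial as a factor: one indicator column per trial beyond the first.
X_factor <- model.matrix(~ factor(Trial), data = d)

# Trial as a number: a single slope column.
X_numeric <- model.matrix(~ Trial, data = d)

print(ncol(X_factor))   # intercept + 49 trial indicators
print(ncol(X_numeric))  # intercept + 1 slope

# Pitfall when converting a factor back to numbers: as.integer() returns the
# level codes, so go through as.character() to recover the original values.
f <- factor(c(10, 20, 50))
codes  <- as.integer(f)                # level codes
values <- as.integer(as.character(f))  # original values

# The formula from the post, plus a Condition-by-Trial interaction, written as
# plain formula objects (not fitted); `DV` is a hypothetical column name.
f_linear      <- DV ~ Condition + Trial + (1 + Trial | Participant)
f_interaction <- DV ~ Condition * Trial + (1 + Trial | Participant)
```

If the trial column has already been converted with as.factor(), the as.character() step matters: calling as.integer() directly happens to give the right numbers only when the levels are exactly 1, 2, 3, and so on.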

But what if you don’t feel comfortable assuming linearity in the effect of Trial? If you have a specific non-linear function in mind, obviously use that; if not, check out GAMs and GPs, which will find possibly-wiggly/possibly-linear functions that best reflect the effect of interest.
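As a rough sketch of the GAM option, the following compares a straight-line fit with a smooth of Trial on simulated data. The data, the column name `DV`, and the use of the mgcv package are assumptions for illustration; the smooth is only fitted if mgcv is installed, and participant-level effects are left out to keep the example short.

```r
# Simulated data only: a hypothetical learning curve that rises and levels off.
set.seed(1)
d <- data.frame(Trial = rep(1:50, times = 4))
d$DV <- 10 * (1 - exp(-d$Trial / 10)) + rnorm(nrow(d), sd = 1)

# Straight-line fit for comparison.
fit_lm <- lm(DV ~ Trial, data = d)

if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  # s(Trial) lets the data decide how wiggly the Trial effect is.
  fit_gam <- gam(DV ~ s(Trial), data = d, method = "REML")
  # Effective degrees of freedom of the smooth; a value near 1 means the
  # estimated effect is roughly linear.
  print(summary(fit_gam)$edf)
}
```

brms formulas also accept smooth terms such as s(Trial) and Gaussian-process terms such as gp(Trial), so the same idea can be carried over to the Bayesian fit alongside the (1 | Participant) term.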
