Hi everyone,
As will probably be quite obvious from this post, I’m very new to Stan and still wrapping my head around a lot of Bayesian modelling concepts, so please forgive me if my question is better suited to a more general stats forum.
Essentially, in my field of research, we use a lot of intermittent experience sampling probes to measure people’s states of attention throughout different tasks. These probes are usually categorical, forcing people to select whichever of N possible options best represents their state of focus just before the probe (e.g. “Yes, I was focused on the task”, “No, my mind had wandered from the task inadvertently”, “No, I deliberately decided to think about something else”, “No, I was distracted by something else around me”). We’re usually interested in how different variables (some categorical, some continuous) affect how people respond to these thought probes, and we typically calculate the proportion of each response type for each participant (e.g. 50% on-task, 20% distracted, etc.) and run various frequentist tests on those proportions. Of course, this approach ignores the fact that the proportions of all the response types must sum to 1, so an increase in the probability of one response type necessarily affects the probabilities of the others. That’s why I’m turning to Stan and Bayesian modelling: I’m hoping to come up with a more holistic and sane approach to probe response analysis, one that better reflects what we know about the data.
Based on different tutorials and examples, I’ve pieced together a simple model that I think does this properly, but it works a little differently from many of the example models I’ve seen and I want to make sure I’m not doing something patently wrong or bad so far. Here’s my current model code:
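```stan
data {
  int<lower=0> N;                       // number of trials
  int<lower=1> K;                       // number of response categories
  array[N] int<lower=1, upper=2> cond;  // condition identifier
  array[N] int<lower=1, upper=K> resp;  // probe response (1..K)
}
parameters {
  array[2] simplex[K] resp_p;           // response probabilities, one simplex per condition
}
model {
  for (i in 1:2) {
    resp_p[i] ~ dirichlet(rep_vector(1, K));  // uniform prior on each simplex
  }
  for (i in 1:N) {
    resp[i] ~ categorical(resp_p[cond[i]]);   // each trial uses its condition's simplex
  }
}
```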
Basically, for this specific data, there are two between-subjects conditions and three possible responses that people can make. To model the effect of condition on the probabilities of the response categories, I create a simplex for each condition and estimate each one from the trials in that condition. This gives me a posterior distribution for each response probability in each condition, and I then subtract the draws in R to get the differences. Although this seems to work fine, I realize I’m not actually modelling the effect of condition on response types: I’m just estimating the parameters for each condition separately and then comparing them. However, I’m not sure how I could model the differences in response proportions directly, given that I’d have to account for the fact that an increase in one response type means a decrease in the others. Is what I’m currently doing valid? If not, what would a better approach be?
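For comparison, one alternative I’ve come across is a multinomial (softmax) logistic regression, where condition enters as a shift in log-odds relative to a reference response. Below is an untested sketch of how I think that would look with Stan’s `categorical_logit`, assuming the same data layout as my model (`N`, `K`, `cond`, `resp`). Fixing response 1 as the reference category and the particular normal priors are my own assumptions. If I’ve understood it correctly, `beta_raw` would then be the direct effect of condition, and the `generated quantities` block would give the per-condition probabilities and their difference.

```stan
data {
  int<lower=0> N;                       // number of trials
  int<lower=2> K;                       // number of response categories
  array[N] int<lower=1, upper=2> cond;  // condition identifier
  array[N] int<lower=1, upper=K> resp;  // probe response (1..K)
}
parameters {
  vector[K - 1] alpha_raw;  // log-odds of responses 2..K vs. response 1, condition 1
  vector[K - 1] beta_raw;   // change in those log-odds in condition 2 (the condition effect)
}
transformed parameters {
  // Assumption: response 1 is the reference category, so its log-odds are pinned at 0.
  vector[K] alpha = append_row(0.0, alpha_raw);
  vector[K] beta = append_row(0.0, beta_raw);
}
model {
  alpha_raw ~ normal(0, 2);  // assumed weakly informative priors
  beta_raw ~ normal(0, 1);
  for (i in 1:N) {
    // cond[i] - 1 is 0 for condition 1 and 1 for condition 2
    resp[i] ~ categorical_logit(alpha + beta * (cond[i] - 1));
  }
}
generated quantities {
  vector[K] p1 = softmax(alpha);         // response probabilities, condition 1
  vector[K] p2 = softmax(alpha + beta);  // response probabilities, condition 2
  vector[K] p_diff = p2 - p1;            // condition difference; entries sum to 0
}
```

Since softmax always returns probabilities that sum to 1, the trade-off between categories is built in, and `p_diff` always sums to zero. As far as I can tell, with only two conditions this has the same likelihood as my two-simplex version, just reparameterized, so the practical differences would be the priors and how easily continuous predictors could be added.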
Also, I’ve been trying to figure out how I might go about implementing a mixed-effects model for data of this type, i.e. accounting for both within-subject variance and between-subject variance in the model. I’ve looked over the helpful “Hierarchical Partial Pooling for Repeated Binary Trials” tutorial from Bob Carpenter several times, but it cautions that low observation counts per participant can create major problems with hierarchical pooling in models for binary outcomes, and in the case of these experiments we often only end up with 8-20 thought probes per participant (since you end up altering people’s attention during a task if you query them too frequently). Does this mean that hierarchical models might be a worse fit for this type of data than completely pooled ones, or are there just certain things I need to be careful about if I’m going to account for within-subject variance?
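For the mixed-effects question, here’s an equally untested sketch of what I imagine a hierarchical version might look like: a softmax regression with participant-level intercepts for each non-reference response. Because condition is between-subjects, only the intercepts vary by participant. It uses a non-centered parameterization, which as I understand it tends to sample better when each participant contributes only a few observations. `J` and `subj` are hypothetical names for the number of participants and the participant index. The priors and the decision to treat the per-category intercepts as uncorrelated are my own simplifying assumptions.

```stan
data {
  int<lower=0> N;                       // number of probe responses
  int<lower=2> K;                       // number of response categories
  int<lower=1> J;                       // hypothetical: number of participants
  array[N] int<lower=1, upper=J> subj;  // hypothetical: participant index per probe
  array[N] int<lower=1, upper=2> cond;  // between-subjects condition
  array[N] int<lower=1, upper=K> resp;  // probe response (1..K)
}
parameters {
  vector[K - 1] alpha_raw;     // population log-odds vs. response 1, condition 1
  vector[K - 1] beta_raw;      // shift in those log-odds for condition 2
  vector<lower=0>[K - 1] tau;  // between-participant SD for each non-reference response
  matrix[K - 1, J] z;          // standardized participant offsets (non-centered)
}
transformed parameters {
  // Participant offsets: row k of z is scaled by tau[k]
  matrix[K - 1, J] u = diag_pre_multiply(tau, z);
}
model {
  alpha_raw ~ normal(0, 2);  // assumed priors
  beta_raw ~ normal(0, 1);
  tau ~ normal(0, 1);        // half-normal because of the lower=0 bound
  to_vector(z) ~ std_normal();
  for (i in 1:N) {
    vector[K - 1] eta = alpha_raw + beta_raw * (cond[i] - 1) + col(u, subj[i]);
    resp[i] ~ categorical_logit(append_row(0.0, eta));  // response 1 is the reference
  }
}
```

With only 8-20 probes per participant, the `tau` parameters are what would control how strongly each participant is pulled toward the population estimates. A correlated version would presumably need a covariance prior such as LKJ, which I’ve left out here.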
Thanks in advance for any advice you have to offer! I’m really excited to delve deeper into Stan.
Best,
- Austin