Hi all,

I am trying to figure out the best way to code and perform inference on a linear regression model with two crossed categorical factors: disease group (two groups) and medication condition (on vs. off medication), giving four group/condition cells. More specifically, I am working with a reinforcement learning model where each RL model parameter is the dependent variable of this linear regression. I’m attempting to model the effect of each group and condition on the RL model parameters without splitting the data into four group/condition-specific subsets.

There are 60 datasets corresponding to 30 participants, each of whom completed a task twice (once per medication condition); there are 18 participants in group 1 and 12 participants in group 2. My current draft of this model uses one indicator (dummy) column per group/condition cell and estimates a separate coefficient for each cell. The columns of the variable ‘ICD_Med_idx’ correspond to (i) group1/on meds (slope captured by beta1), (ii) group1/off meds (beta2), (iii) group2/on meds (beta3), and (iv) group2/off meds (beta4).

My main questions are:

- Is this a valid way of modeling the influence of each group/condition on the RL model parameters?

- Is embedding the linear regressions inside each RL model parameter’s non-centered parameterization (as in the transformed parameters block below) consistent with best practices? This is stretching the limits of my understanding of how to code this model.

- My intuition about performing inference on the beta coefficients (separately for each RL model parameter) would be:

- contrast (beta1-beta2) to get the effect of medication state within group1
- contrast (beta3-beta4) to get the effect of medication state within group2
- contrast (beta1-beta3) to get the effect of diagnosis while on medication
- contrast (beta2-beta4) to get the effect of diagnosis while off medication

Is this intuition on-target? or is there a more proper way to perform these contrasts?
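For reference, my plan was to compute these pairwise contrasts in a generated quantities block so that each contrast gets a full posterior distribution; a sketch assuming the cell coding above (column 1 = group1/on, 2 = group1/off, 3 = group2/on, 4 = group2/off), with p indexing the three RL parameters:

```
generated quantities {
  // pairwise cell contrasts, one per RL parameter (1 = learnrate, 2 = discount, 3 = tau),
  // on the untransformed (probit) scale
  vector[3] med_effect_group1;   // on vs. off medication within group 1
  vector[3] med_effect_group2;   // on vs. off medication within group 2
  vector[3] group_effect_on_med;  // group 1 vs. group 2, on medication
  vector[3] group_effect_off_med; // group 1 vs. group 2, off medication
  for (p in 1:3) {
    med_effect_group1[p]    = beta1[p] - beta2[p];
    med_effect_group2[p]    = beta3[p] - beta4[p];
    group_effect_on_med[p]  = beta1[p] - beta3[p];
    group_effect_off_med[p] = beta2[p] - beta4[p];
  }
}
```

These contrasts are on the scale of the linear predictor, before the Phi_approx transform.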

- What would be the best way of contrasting the beta coefficients to get the overall (marginal) effect of medication (i.e., ‘ignoring’ diagnosis) or of diagnosis (i.e., ‘ignoring’ medication state)?

- Here, my intuition about isolating the effect of medication state across groups (for each RL parameter separately) would be (beta1+beta3)-(beta2+beta4), and isolating the effect of group diagnosis across medication states would be (beta1+beta2)-(beta3+beta4).
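As with the pairwise contrasts, these marginal contrasts could go in generated quantities; a sketch assuming the cell coding above (on-medication cells are 1 and 3, off-medication cells are 2 and 4; group 1 cells are 1 and 2, group 2 cells are 3 and 4), averaging rather than summing so the result stays on the same scale as the cell coefficients:

```
generated quantities {
  vector[3] med_main_effect;   // on vs. off medication, averaged over groups
  vector[3] group_main_effect; // group 1 vs. group 2, averaged over medication states
  for (p in 1:3) {
    med_main_effect[p]   = (beta1[p] + beta3[p]) / 2 - (beta2[p] + beta4[p]) / 2;
    group_main_effect[p] = (beta1[p] + beta2[p]) / 2 - (beta3[p] + beta4[p]) / 2;
  }
}
```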

- Is there a more proper way to code the categorical variables to model their interactions?
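One alternative coding I’ve been considering is replacing the four cell indicators with an intercept, two contrast-coded main effects, and their product, so that each coefficient is directly a main effect or the interaction. A hypothetical sketch of the linear predictor for learnrate under that coding (group_c, med_c, b_group, b_med, and b_inter are names I made up; group_c and med_c would be +0.5/-0.5 data columns supplied alongside the existing data):

```
// hypothetical effect-coded alternative:
//   group_c[i] = +0.5 for group 1, -0.5 for group 2
//   med_c[i]   = +0.5 on medication, -0.5 off medication
// b_group = main effect of group, b_med = main effect of medication,
// b_inter = group-by-medication interaction
learnrate[i] = Phi_approx(mu_pr[1]
                          + b_group[1] * group_c[i]
                          + b_med[1]   * med_c[i]
                          + b_inter[1] * group_c[i] * med_c[i]
                          + sigma_pr[1] * learnrate_pr[i]);
```

Under this coding b_med and b_group would correspond directly to the marginal contrasts above, which seems like it might make the inference cleaner.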

Fitting this model as-is yields good convergence diagnostics and effective sample sizes for all model parameters (3 chains of 12k iterations each, 2k of which are warmup). I’d greatly appreciate any clarification on something I’ve missed or gotten wrong in the model specification or the inferential intuition, or confirmation that this modeling approach makes sense.

```
data {
  int<lower=1> N;                            // number of datasets (participant x condition)
  int<lower=1> T_max;                        // maximum number of trials
  int<lower=1, upper=T_max> T_subjs[N];      // trials per dataset
  int<lower=-1, upper=8> option1[N, T_max];  // first presented option
  int<lower=-1, upper=8> option2[N, T_max];  // second presented option
  int<lower=-1, upper=2> choice[N, T_max];   // 1 = option1 chosen, 2 = option2 chosen
  real outcome[N, T_max];                    // trial outcomes
  int<lower=0, upper=1> ICD_Med_idx[N, 4];   // group/condition cell indicators
}
transformed data {
  row_vector[3] initV = rep_row_vector(0.0, 3);  // initial expected values
}
parameters {
  vector[3] mu_pr;              // group-level means, one per RL parameter
  vector<lower=0>[3] sigma_pr;  // group-level SDs
  vector[N] learnrate_pr;       // subject-level raw parameters (non-centered)
  vector[N] discount_pr;
  vector[N] tau_pr;
  vector[3] beta1;              // group1 / on meds
  vector[3] beta2;              // group1 / off meds
  vector[3] beta3;              // group2 / on meds
  vector[3] beta4;              // group2 / off meds
}
transformed parameters {
  // subject-level parameters
  vector<lower=0, upper=1>[N] learnrate;
  vector<lower=0, upper=1>[N] discount;
  vector<lower=0, upper=20>[N] tau;
  for (i in 1:N) {
    learnrate[i] = Phi_approx(mu_pr[1] + beta1[1]*ICD_Med_idx[i,1] + beta2[1]*ICD_Med_idx[i,2]
                              + beta3[1]*ICD_Med_idx[i,3] + beta4[1]*ICD_Med_idx[i,4]
                              + sigma_pr[1]*learnrate_pr[i]);
    discount[i] = Phi_approx(mu_pr[2] + beta1[2]*ICD_Med_idx[i,1] + beta2[2]*ICD_Med_idx[i,2]
                             + beta3[2]*ICD_Med_idx[i,3] + beta4[2]*ICD_Med_idx[i,4]
                             + sigma_pr[2]*discount_pr[i]);
    tau[i] = Phi_approx(mu_pr[3] + beta1[3]*ICD_Med_idx[i,1] + beta2[3]*ICD_Med_idx[i,2]
                        + beta3[3]*ICD_Med_idx[i,3] + beta4[3]*ICD_Med_idx[i,4]
                        + sigma_pr[3]*tau_pr[i]) * 20;
  }
}
model {
  // priors
  mu_pr ~ normal(0, 1);
  sigma_pr ~ normal(0, 1);
  learnrate_pr ~ normal(0, 1);
  discount_pr ~ normal(0, 1);
  tau_pr ~ normal(0, 1);
  beta1 ~ std_normal();
  beta2 ~ std_normal();
  beta3 ~ std_normal();
  beta4 ~ std_normal();
  // subject loop and trial loop
  for (i in 1:N) {
    matrix[6, 3] ev;
    int decision;
    real PE;
    vector[2] option_values = [0, 0]';
    for (idx in 1:6) {
      ev[idx] = initV;
    }
    for (t in 1:T_subjs[i]) {
      decision = (choice[i, t] > 1) ? option2[i, t] : option1[i, t];
      option_values = [ev[option1[i, t], 1], ev[option2[i, t], 1]]';
      // compute action probabilities
      target += categorical_lpmf(choice[i, t] | softmax(option_values * tau[i]));
      // update the chosen option's values; elements 1-2 chain toward element 3,
      // which is updated from the observed outcome
      for (e in 1:3) {
        if (e < 3) {
          PE = discount[i] * ev[decision, e + 1] - ev[decision, e];
        } else {
          PE = outcome[i, t] - ev[decision, e];
        }
        ev[decision, e] += learnrate[i] * PE;
      }
    }
  }
}
```