Interpret parameters after inv_logit transform in hierarchical non-centered parameterization


I’m fairly new to Bayesian modeling in general and Stan in particular so I apologize if I even use incorrect phrasing of what I’m trying to ask…!
I’m working on a hierarchical model of reinforcement learning in a repeated measures design. I’ve posted the model code (close to my latest version) in this post.
In the given experiment, a number of subjects nS performed a learning task under nC different (drug) conditions (i.e. each subject repeats the learning paradigm nC times). The reinforcement model then uses a set of parameters to model the subjects' behavior in the learning task. Of interest for my current question is the parameter Arew, which models the learning rate. It needs to be constrained to the range [0,1].
To account for the hierarchical structure (subjects repeated across conditions), each parameter has a non-centered parameterization. The baseline condition is parameterized as follows:

Arew_normal[s,] = Arew_m + Arew_cond_s * Arew_cond_raw[s,] + Arew_vars[s,1]; 

That is, there is a group mean Arew_m, a subject specific offset Arew_cond_raw[s,] with sd Arew_cond_s and a random part for each subject that is correlated across conditions.
Each non-baseline condition (that is, each drug condition) has an additional offset Arew_cond_grp_m plus a subject specific random part:

Arew_normal[s,v] += cond_vars[s,v,kk] * (Arew_cond_grp_m[kk] + Arew_vars[s,kk+1]);

where cond_vars[s,v,kk] is a dummy variable coding the condition.
For the reinforcement model, the parameter Arew needs to be in the range [0,1]. This is achieved by an inv_logit() transform:

Arew[s,] = inv_logit(Arew_normal[s,]);

That means I can interpret the subject-wise parameters Arew on my [0,1] scale. However, the estimates for the group-level overall mean (Arew_m) and for the group-level condition effect (Arew_cond_grp_m) are only available on the unconstrained scale, before the transform happens.

What would be the way to interpret these parameters?
That is, how can I tell (and report) what effect my drug manipulation has on my parameter Arew?

I’m extending a model that is used in the hBayesDM package. That simpler model, which does not implement repeated measures, uses a non-centered parameterization for the learning rate parameter as follows (using Phi_approx() instead of inv_logit() to transform the parameter to the range [0,1]):

A[s] = Phi_approx(A_m + sigma * A_raw[s]);

To get the group-level parameter transformed back to an interpretable scale it then uses

mu_A   = Phi_approx(A_m);

in the generated quantities block.
Is there a way to obtain interpretable values for Arew_m and Arew_cond_grp_m using this kind of back-transformation? I’d assume

mu_Arew = inv_logit(Arew_m);
mu_Arew_cond_grp = inv_logit(Arew_cond_grp_m);

to be misleading here, since the parameter Arew is composed of a sum of the two parameters.

Any ideas on this, or suggestions for how to better deal with such a case, are highly appreciated!
Thanks a lot!

Sorry for not getting to you earlier; the question is relevant and well written. I admit I didn’t try to understand the whole model, but I hope I can provide some hints nevertheless.

If Arew can be interpreted as a probability or something sufficiently similar, then coefficients before the inv_logit transform are changes in log-odds (or you can exponentiate them to get odds ratios). This is very commonly done for logistic regression, so you should be able to find a lot of examples of this.
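
As a small illustration of the log-odds reading, here is a sketch in Python (standard library only). The values standing in for Arew_m and Arew_cond_grp_m are made up for illustration and are not estimates from the model; with a real fit you would repeat the calculation for each posterior draw.

```python
import math

def inv_logit(x):
    """Logistic function: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values on the unconstrained (log-odds) scale, NOT real estimates.
arew_m = -0.5     # stands in for Arew_m (baseline group mean)
arew_cond = 0.8   # stands in for Arew_cond_grp_m[kk] (drug offset)

# The offset is a change in log-odds; exponentiating it gives an odds ratio.
odds_ratio = math.exp(arew_cond)

# On the [0, 1] scale the effect is a difference of transformed sums,
# so it depends on the baseline and is not simply inv_logit(arew_cond).
baseline = inv_logit(arew_m)
drug = inv_logit(arew_m + arew_cond)
effect = drug - baseline

print(f"odds ratio: {odds_ratio:.3f}")
print(f"baseline: {baseline:.3f}, drug: {drug:.3f}, difference: {effect:.3f}")
```

The odds ratio is the same whatever the baseline is, whereas the difference on the [0,1] scale changes with Arew_m. That is why transforming the offset on its own does not give an interpretable quantity.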

If that’s not enough, or if the Arew coefficient further interacts with other parts of the model and you need to account for those interactions, then I think it is most useful to try to interpret the model via its predictions. Looking at individual coefficients can be seen as a special case of such a prediction. If I regress outcome ~ treatment (where treatment has levels A and B), the coefficient for treatmentB represents how big a difference between the averages of outcome for the two groups I would see in a hypothetical new set of measurements with no noise/unlimited data.

So for a complicated model I can make predictions for the two treatments and then subtract them to get posterior samples of the difference. This usually forces me to be a bit more precise about the question I am asking: Do I want to predict using the non-treatment covariates of the participants I observed? Or some special covariate values? Should I take the fitted random effects for the participants I observed, or do I want to predict for a hypothetical new participant (by drawing a new random effect from the fitted hyperprior)? And so on. I think this is actually an advantage.
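
The prediction-based approach could be sketched roughly as follows in Python (standard library only). The posterior draws here are simulated placeholders, and the names `draws`, `sd_subj`, and `predict_arew` are hypothetical; with a real fit you would plug in the extracted draws of Arew_m, Arew_cond_grp_m, and the relevant subject-level standard deviation. The sketch predicts for a hypothetical new participant and, as a simplifying assumption, uses a single subject-level offset shared by both conditions, which ignores the condition-specific random parts of the original model.

```python
import math
import random
import statistics

def inv_logit(x):
    """Logistic function: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(1)

# Placeholder "posterior draws" (simulated, NOT from a real fit).
n_draws = 4000
draws = [
    {
        "arew_m": rng.gauss(-0.5, 0.2),       # stands in for Arew_m
        "arew_cond": rng.gauss(0.8, 0.3),     # stands in for Arew_cond_grp_m[kk]
        "sd_subj": abs(rng.gauss(0.5, 0.1)),  # stands in for a subject-level sd
    }
    for _ in range(n_draws)
]

def predict_arew(draw, drug, subj_offset):
    """Learning rate on the [0, 1] scale for one posterior draw."""
    eta = draw["arew_m"] + subj_offset + (draw["arew_cond"] if drug else 0.0)
    return inv_logit(eta)

# For each draw: sample a new participant's offset, predict both conditions,
# and keep the difference. The result is a set of posterior draws of the effect.
diffs = []
for d in draws:
    offset = rng.gauss(0.0, d["sd_subj"])
    diffs.append(predict_arew(d, True, offset) - predict_arew(d, False, offset))

# Summarize with a median and an approximate 95% interval from sorted draws.
diffs.sort()
lo = diffs[int(0.025 * n_draws)]
hi = diffs[int(0.975 * n_draws)]
print(f"median difference: {statistics.median(diffs):.3f}")
print(f"95% interval: [{lo:.3f}, {hi:.3f}]")
```

Setting `offset` to zero instead would answer a different question (the effect for a "typical" participant), and using the fitted subject-level offsets would answer a third one (the effect in the observed sample). These generally give different numbers on the [0,1] scale.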

Best of luck with your model and feel free to ask for clarifications!