# Model with two linked outcome variables

Hey everyone!

I would appreciate your help in a somewhat conceptual question.
I am planning to model the data of a (psychological) reinforcement learning experiment.
Participants can always choose between three options (left, middle, or right), which either pay them nothing or a pre-specified reward (e.g. 5 points if you choose left and win, 10 points if you choose the middle and win, etc.).
Through their choices they gradually learn the winning probability of each option.

Now, usually one would model the “choice value” of each option and derive the choice probabilities from that.
However, in each round we also asked our participants to estimate the winning probability of each option.
That means as outcome variables we have the participants’ probability estimates as well as their choices.
I would like to incorporate both of these datapoints into a model, and this is what I had in mind:

Use reinforcement learning to model their reported probability estimates Q, estimating a learning rate \eta:

Q \sim RL(\eta)

In parallel I would fit their choices.
In the simplest case they would just calculate the expected value, EV, of each option by multiplying the winning value of the option with the corresponding estimated winning probability Q.
Using a softmax function I translate EV into actual choice probabilities (for now leaving out any sensitivity parameter for simplicity's sake).
So their choice would be modeled as

X \sim \text{softmax}(EV)
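To make this concrete (an assumption on my part, since I haven't committed to a specific RL rule yet): with a simple Rescorla–Wagner delta rule, after choosing option c_t and observing win feedback r_t \in \{0, 1\}, the update and choice rule would be

Q_{t+1, c_t} = Q_{t, c_t} + \eta \, (r_t - Q_{t, c_t})

EV_{t,k} = v_k \, Q_{t,k}

P(X_t = k) = \frac{\exp(EV_{t,k})}{\sum_{j=1}^{3} \exp(EV_{t,j})}

where v_k is the pre-specified winning value of option k.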

Now it would be easy to just plop two target += statements (one for the modeled probability estimates and one for the modeled choices) into my model block in Stan and call it a day.
In my head this also makes sense, since adding the log densities means multiplying the densities, which seems akin to asking “given these parameters, how probable are this win-probability estimate and this choice together?”.
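For illustration, here is a minimal Stan sketch of what I mean. The data names, the Rescorla–Wagner update, and the Beta observation model for the reported estimates are all placeholders I made up; they are not meant as the definitive model:

```stan
data {
  int<lower=1> T;                                  // number of trials
  array[T] int<lower=1, upper=3> choice;           // chosen option per trial
  array[T] int<lower=0, upper=1> win;              // 1 if the chosen option paid out
  vector<lower=0>[3] reward;                       // pre-specified winning value per option
  array[T] vector<lower=0, upper=1>[3] q_report;   // reported win-probability estimates
}
parameters {
  real<lower=0, upper=1> eta;   // learning rate
  real<lower=0> kappa;          // precision of the reports (assumed Beta model)
}
model {
  vector[3] Q = rep_vector(0.5, 3);   // initial win-probability belief

  eta ~ beta(1, 1);
  kappa ~ gamma(2, 0.1);

  for (t in 1:T) {
    vector[3] EV = reward .* Q;

    // target += #1: reported estimates, Beta with mean Q and precision kappa
    // (note: reports of exactly 0 or 1 would need special handling here)
    for (k in 1:3)
      target += beta_lpdf(q_report[t, k] | Q[k] * kappa, (1 - Q[k]) * kappa);

    // target += #2: choice via softmax over expected value
    target += categorical_logit_lpmf(choice[t] | EV);

    // Rescorla–Wagner update on the chosen option only
    Q[choice[t]] += eta * (win[t] - Q[choice[t]]);
  }
}
```

The Beta mean-precision parameterisation is just one way to turn Q into a likelihood for the reports; the main point is the two target += lines living in the same model block, with the same Q feeding both.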
However, I’m not sure whether there is something I’m not seeing.
The two outcome variables are clearly linked, since the predicted choices would be based on the predicted probability estimates.
Would I have to link the two outcomes via a mixed-outcome copula?
I must admit, I’d have to wrap my head around that topic first, which is why I’m asking.

I’m thankful for any comments or help!
Cheers

---
I’m afraid reinforcement learning is so general as to not mean a whole lot on its own. Is there a specific density you have in mind here?

Yes, that’s right, because

\log p(b) + \log p(a \mid b) = \log p(a, b).

The unknown unknowns are what get you :-).