I just came across this thread. Let’s say the experiment is that a person sees a series of values (e.g. 5, 7, 10, 3,…) and after every example they tell me what they think the mean of this series is. So if I made Stan code for a ‘Bayesian observer’ model I would put:

```
data{
  int<lower=1> nsamp;
  vector[nsamp] y;   // the observed series ("data" is a reserved word in Stan)
}
parameters{
  real mu;
  real<lower=0> stdev;
}
model{
  mu ~ normal(0, 10);
  stdev ~ normal(0, 10);
  y ~ normal(mu, stdev);
}
```

And to get at the internal state of the observer after each sample (i.e. their estimates of mu and stdev), one could re-run that model on growing subsets of the data: sample 1 only, then samples 1–2, then samples 1–3, and so on. (Alternatively, one could give mu as many entries as samples seen and adapt the model code accordingly):

```
data{
  int<lower=1> nsamp;
  vector[nsamp] y;
}
parameters{
  vector[nsamp] mu;
  vector<lower=0>[nsamp] stdev;
}
model{
  mu ~ normal(0, 10);
  stdev ~ normal(0, 10);
  for (imaxSamp in 1:nsamp){
    for (isamp in 1:imaxSamp){
      y[isamp] ~ normal(mu[imaxSamp], stdev[imaxSamp]);
    }
  }
}
```

In contrast, another common model (a ‘reinforcement learning’ model) would express how the information from all the samples is combined as:

```
mu[1] = 0.5;
for (isamp in 2:nsamp){
  // prediction error uses the previous estimate mu[isamp-1]
  mu[isamp] = mu[isamp-1] + learning_rate * (y[isamp] - mu[isamp-1]);
}
```

So the conceptual difference between the models here is that learning_rate stays constant no matter how many samples have already been observed, whereas in the Bayesian model later samples produce a smaller shift in the estimated mu than earlier samples.
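To make that contrast concrete, here is a small Python sketch (my own illustration, not from the models above): for a normal likelihood with known variance and a normal prior on the mean, the Bayesian posterior mean after each sample can be written as a delta rule whose effective learning rate shrinks as 1/(n + n0), where n0 is an assumed prior "pseudo-sample" strength.

```python
# Sketch: the conjugate posterior mean update is a delta rule with a
# learning rate that decays as more samples come in.
data = [5.0, 7.0, 10.0, 3.0]

mu0, n0 = 0.0, 1.0  # prior mean and prior pseudo-sample strength (assumptions)
mu, n = mu0, n0
bayes_means = []
for x in data:
    n += 1.0
    lr = 1.0 / n              # effective learning rate shrinks with sample count
    mu = mu + lr * (x - mu)   # delta-rule form of the conjugate update
    bayes_means.append(mu)

# Same trajectory computed directly as the running mean (prior counts as
# n0 pseudo-observations at mu0):
direct = []
total, count = mu0 * n0, n0
for x in data:
    total += x
    count += 1.0
    direct.append(total / count)

print(bayes_means)  # element-by-element equal to `direct`
```

The equivalence makes the conceptual difference explicit: the Bayesian observer is a delta-rule learner whose learning rate is forced to decay, while the RL model lets it stay fixed.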

For the ‘reinforcement learning’ model I would know how to fit it in Stan: learning_rate and some kind of ‘noise’ in the rating ability would be the only free parameters:

```
data{
  int<lower=1> nsamp;
  vector[nsamp] y;
  vector[nsamp] participant_rating;
}
parameters{
  real<lower=0, upper=1> learning_rate;
  real<lower=0> rating_noise; // participants report their true belief with some noise
}
model{
  vector[nsamp] mu;
  mu[1] = 0.5;
  for (isamp in 2:nsamp){
    mu[isamp] = mu[isamp-1] + learning_rate * (y[isamp] - mu[isamp-1]);
  }
  participant_rating ~ normal(mu, rating_noise);
}
```
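Before fitting, it can help to simulate the RL observer's trajectory in plain Python to sanity-check the update rule (the function name and defaults here are my own; the indexing mirrors the Stan loop, where the prediction error uses the previous estimate):

```python
# Simulate the RL observer's running estimate for a fixed learning rate.
# Mirrors the Stan loop: mu[1] = 0.5, then delta-rule updates from sample 2 on
# (the first sample never enters the update, matching the loop starting at isamp = 2).
def rl_trajectory(y, learning_rate, mu1=0.5):
    mu = [mu1]
    for x in y[1:]:
        mu.append(mu[-1] + learning_rate * (x - mu[-1]))
    return mu

y = [5.0, 7.0, 10.0, 3.0]
print(rl_trajectory(y, learning_rate=0.3))
```

Unlike the Bayesian observer, the step size here never shrinks: the last sample moves the estimate just as strongly as the second one did.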

But how would I fit the ‘Bayesian observer’ model? What would actually be a parameter that could differ between participants and play a role similar to learning_rate? I think it would be the prior belief about stdev (focusing on this as an example; it could also be one of the other parameters). Here is an attempt at a model, but I think it’s not right (basically, I don’t think I should have stdev as a parameter, and I’m not sure participant_rating is in the right place):

```
data{
  int<lower=1> nsamp;
  vector[nsamp] y;
  vector[nsamp] participant_rating;
}
parameters{
  vector<lower=0>[nsamp] stdev;
  real<lower=0> stdev_prior;
}
model{
  stdev ~ normal(0, stdev_prior);
  for (imaxSamp in 1:nsamp){
    for (isamp in 1:imaxSamp){
      y[isamp] ~ normal(participant_rating[isamp], stdev[imaxSamp]);
    }
  }
}
```
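For intuition about why a prior parameter can act like a participant-specific learning rate, here is a Python sketch (my own conjugate-update illustration under assumed names, not a fix of the Stan model above). With a normal prior N(mu0, prior_sd²) on the mean and a normal likelihood with known observation noise obs_sd, the posterior-mean update is again a delta rule, and the per-trial gain is set by the ratio of prior to observation precision: a wide prior means early samples move the estimate a lot, a narrow prior makes the observer conservative.

```python
# Sketch: the prior width plays the role of an (decaying) learning rate.
def posterior_means(y, mu0, prior_sd, obs_sd):
    prec = 1.0 / prior_sd**2           # current precision of the belief about mu
    obs_prec = 1.0 / obs_sd**2         # precision of a single observation
    mean = mu0
    out = []
    for x in y:
        gain = obs_prec / (prec + obs_prec)  # effective learning rate this trial
        mean = mean + gain * (x - mean)
        prec = prec + obs_prec               # belief sharpens; future gains shrink
        out.append(mean)
    return out

y = [5.0, 7.0, 10.0, 3.0]
wide   = posterior_means(y, mu0=0.0, prior_sd=10.0, obs_sd=2.0)
narrow = posterior_means(y, mu0=0.0, prior_sd=1.0,  obs_sd=2.0)
# `wide` jumps toward the data on the first trial; `narrow` moves cautiously.
```

If something like this is the right framing, then prior_sd (or equivalently the prior precision) is the kind of free parameter that could differ between participants, with the predicted ratings being the running posterior means plus reporting noise.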