Using estimated variables as index

We are trying to fit a model which requires using estimated variables as indices. We fit our model to ratings of different stimuli where the rating for each stimulus depends on the inferred value of the stimulus and some noise (sigma_U). As stimuli are very similar, we include a degree of uncertainty about which stimulus was actually seen. This uncertainty is known and defined as a categorical distribution over all stimuli, with highest probability for the stimulus that was actually shown. The inferred value on a given trial depends on the stimulus the participant believes he/she is seeing (pidx) and not the one shown (sidx).

From the STAN documentation and forum we understand that “parameters or transformed parameters cannot be integer or integer array” and therefore resorted to approximate the categorical distribution with a truncated normal distribution, N(sidx, sds).
The sampled value however needs to be used to index the stimulus value, for which we would have to transform the sampled parameter to an integer. According to the forum and manual this is not possible for parameters and in general not advisable.

To allow for indexing we use if else statements as a workaround to approximate an integer i.e. if estimated parameter >1.5 and <2.5 then select the value corresponding to stimulus 2. This allows us to run the model but results in serious fitting issues. Is there another way to implement this?

Here a minimal example

data {
    int<lower=1> nStim; // number of stimuli
    int<lower=1> nTrial; // number of trials
    real<lower=0,upper=1> rating[nTrial]; // participant ratings
    int<lower=1, upper=9> sidx[nTrial]; //stimulus shown; 
    real sds; //approximated SD of normal distr.
}


parameters {
  real<lower=0> sigma_U; //updating noise 
  real<lower=0> sigma_C; //choice noise
  real<lower=0, upper=1> value[nTrial, nStim]; // estimated value for each trial based on inferred stimulus
  real<lower=1,upper=9> pidx[nTrial]; // stimulus participant thinks he/she sees 
  real<lower=0> vPred[nTrial]; //predicted rating on each trial 
}


model {
        for (n in 1:nTrial-1) {
            pidx[n] ~ normal(sidx[n],sds) T[1,9]; // draw from possible stimuli
            if (pidx[n]<= 1.5) {
		value[n, 1] ~ normal((value[n, 1], sigma_U)
                vPred[n] ~ normal(value[n,1],sigma_C)
            } else if (pidx[n] <= 2.5) {
                value[n, 2] ~ normal((value[n, 2], sigma_U)
		vPred[n] ~ normal(value[n,2],sigma_C)
		… // continue for all stimuli 
            }
          }
  } 

1 Like

Bob Carpenter has provided a workaround for promoting reals to integers:

functions {
int real_to_int(p) {
 int k;
 k = 1;
 while ((k+1) < floor(p)) {
  k = k+1;
 }
 return k
 }
}

But as stated in the thread, this may cause incorrect gradient calculations (the reason Stan doesn’t promote reals to integers), impacting sampling and run times

And, although I don’t know how data only containers restrict user-defined functions, you can use to_int inside of user-defined functions as long as the container type is declared as data only.

How many possible stimuli are there?

There are 9 possible stimuli i.e nStim = 9

1 Like

With 9 possible stimuli, you should estimate this model by marginalizing over the possible stimuli for each observation.