Model not recovering simulated parameters

I have been futzing around with a model for a while using simulated data, and I cannot recover the exact parameters I set when simulating. However, I am recovering the correct differences between treatments. I am not sure whether I am messing up the model, or whether I am doing everything correctly and my expectations for this model were incorrect.

Essentially, I want to use a model to determine how a behavior changes with the addition of a treatment. The behavioral response is a continuous measurement between 0 and 1 (with exact 1s replaced by 0.99 so the beta likelihood doesn't blow up at the boundary), which I've simulated using the rbeta2 function from the rethinking package. In the absence of a treatment, the behavioral response should center around 0.4. With the addition of a treatment, the response should shift up by 0.1 (low quality treatments) or 0.3 (high quality treatments). This is how I've simulated the data in R:

#load packages: tidyverse for data wrangling, rethinking for rbeta2()
library(tidyverse)
library(rethinking)

#create df with 4 treatments, 6 replicates per treatment
treat <- c("A", "B", "C", "D")
ID <- 1:6
df <- expand_grid(treat, ID)

#expand each trial to 10 rows, representing 10 measurements per replicate
df <- df %>%
  slice(rep(1:n(), 10))

#simulate behavioral response in absence of treatment
#simulating from a beta distribution to only get values between 0 and 1
df$absent <- rbeta2(n = 240, 0.4, 10)

#simulate behavioral response with treatment:
#mean shifts up by 0.3 for high quality treatments (A, B) and by 0.1 for low quality (C, D)
df <- df %>%
  mutate(m_s = case_when(
    treat == "A" ~ 0.3,
    treat == "B" ~ 0.3,
    treat == "C" ~ 0.1,
    treat == "D" ~ 0.1)
  )
df$present <- rbeta2(n = 240, 0.4 + df$m_s, 10)

#replace exact 1s with 0.99 in absent and present to avoid math errors
df$absent[df$absent == 1] <- 0.99
df$present[df$present == 1] <- 0.99
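
For reference, the data go to Stan as a list along these lines (a sketch; the names just need to match the model's data block, and treat is converted to an integer index):

#assemble a data list for Stan (names assumed to match the data block below)
dat <- list(
  N       = nrow(df),                      #240 observations
  present = df$present,
  absent  = df$absent,
  treat   = as.integer(factor(df$treat))   #"A".."D" -> 1..4
)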

Then I tried to recover the m_s values using this model:

data{
    int N;
    vector<lower=0, upper=1>[N] present;
    vector<lower=0, upper=1>[N] absent;
    array[N] int treat;
}
parameters{
    real a;
    vector[4] t;
    real<lower=0> scale;
}
model{
    real mu;
    vector[240] mu2;
    scale ~ gamma( 15/1 , 1/1 );
    t ~ normal( 1 , 1 );
    a ~ normal( -0.4 , 0.5 );
    for ( i in 1:240 ) {
        mu2[i] = a + t[treat[i]];
        mu2[i] = inv_logit(mu2[i]);
    }
    mu = a;
    mu = inv_logit(mu);
    present ~ beta( mu2*scale , (1-mu2)*scale );
    absent ~ beta( mu*scale , (1-mu)*scale );
}
generated quantities{
    vector[240] log_lik;
    real mu;
    vector[240] mu2;
    for ( i in 1:240 ) {
        mu2[i] = a + t[treat[i]];
        mu2[i] = inv_logit(mu2[i]);
    }
    mu = a;
    mu = inv_logit(mu);
    for ( i in 1:240 ) log_lik[i] = beta_lpdf( absent[i] | mu*scale , (1-mu)*scale );
}

However, every time I run this, the output values for t, which I expected to correspond to the m_s values from the simulation, are off by ~0.4 (after accounting for the logit link). The difference between treatments is correct, though, at ~0.2. I think my model's estimates of t do not separate out the intercept a, and I have no idea how to get the model to do so, or even whether it is supposed to. I always assumed that models would estimate t given the intercept a, but maybe I'm wrong? I'm concerned that if I can't figure this out, I won't understand my results when I apply the model to real data where I don't know the true parameters. Does anyone know what's going on here? Is the model the problem, or my expectations of the model?


It may help to simplify the Stan code. You introduce the mean parameters mu and mu2 in the model block and then recalculate them in the generated quantities block. It's easier to define mu and mu2 in the transformed parameters block instead.

And I don’t think it’s true that the values for t "should correspond to my m_s values when I simulated the data". The m_s parameters are defined on the original scale, in the interval (0, 1):

df$present <- rbeta2(n = 240, 0.4 + df$m_s, 10)  # the second argument of rbeta2 is a mean between 0 and 1

While the t parameters are defined on the link (logit) scale:

for ( i in 1:240 ) {
    mu2[i] = a + t[treat[i]];
    mu2[i] = inv_logit(mu2[i]);
}

It’s easier to recover the m_s parameters on the original scale in the generated quantities block: m_s = mu2 - mu. I’m not sure you can simplify m_s = inv_logit(a + t) - inv_logit(a) into a function of t only.

data{
  int N;
  vector<lower=0, upper=1>[N] present;
  vector<lower=0, upper=1>[N] absent;
  array[N] int treat;
}
parameters{
    real a;
    vector[4] t;
    real<lower=0> scale;
}
transformed parameters{
    real<lower=0, upper=1> mu;
    vector<lower=0, upper=1>[4] mu2;
    mu = inv_logit( a );
    mu2 = inv_logit( a + t );
}
model{
    // priors
    scale ~ gamma( 15/1 , 1/1 );
    t ~ normal( 1 , 1 );
    a ~ normal( -0.4 , 0.5 );
    // likelihood
    present ~ beta( mu2[treat]*scale , (1-mu2[treat])*scale );
    absent ~ beta( mu*scale , (1-mu)*scale );
}
generated quantities{
  vector[4] m_s;
  m_s = mu2 - mu;
}
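
To see the scale issue concretely, here is a quick check in base R (qlogis and plogis are the logit and inverse-logit; 0.4, 0.1, and 0.3 are the true values from your simulation):

a <- qlogis(0.4)                    # intercept on the logit scale, about -0.41

# logit-scale offsets that reproduce the simulated shifts of +0.1 and +0.3
t_low  <- qlogis(0.4 + 0.1) - a     # about 0.41, not 0.1
t_high <- qlogis(0.4 + 0.3) - a     # about 1.25, not 0.3

# back on the original scale, the simulated shifts are recovered exactly
plogis(a + t_low)  - plogis(a)      # 0.1
plogis(a + t_high) - plogis(a)      # 0.3

# but the original-scale shift implied by a fixed t changes with a,
# so inv_logit(a + t) - inv_logit(a) is not a function of t alone
plogis(qlogis(0.6) + t_low) - plogis(qlogis(0.6))   # about 0.09, not 0.1

With the model above, the posterior for m_s should sit near the simulated values (0.3, 0.3, 0.1, 0.1) even though t itself does not.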