Memory retention case study: modeling full individual differences

Hi All,

Looking at the nature and quality of the discussions here, I was initially very hesitant to post my questions, as I am a newbie to Stan. I would like to thank the Stan community for the quick responses posted to questions raised here; they are of immense help.

I am currently working through the case studies in the Bayesian Cognitive Modeling book, specifically the memory retention chapter. I am trying to rework the https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/CaseStudies/MemoryRetention/Retention_2_Stan.R example. The first Stan model, in which all participants share the same information decay rate and baseline decay rate, runs without any issue and I am able to draw inferences from it. However, when I introduce person-level parameters, the model does not converge at all.
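
For reference, the individual-differences model (as implemented in the Stan code quoted below) gives each subject i their own decay rate and baseline:

\theta_{ij} = \min\left(1,\; e^{-\alpha_i t_j} + \beta_i\right), \qquad k_{ij} \sim \mathrm{Binomial}(n,\, \theta_{ij})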

There were 9721 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Examine the pairs() plot to diagnose sampling problems
The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

I see a huge number of divergent transitions. If the number had been much smaller, I would have tried to play with control parameters like max_treedepth or adapt_delta. However, with divergences this serious, could there be a deeper issue with the code? Setting a higher adapt_delta does not help reduce the number of divergent transitions either. Please advise.
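
For reference, this is the kind of call I mean; the specific control values here are just examples, using the standard control argument of rstan's sampling():

samples <- sampling(object = model, data = data,
                    iter = 20000, chains = 1,
                    control = list(adapt_delta = 0.99, max_treedepth = 12))

# The warning also suggests inspecting the pairs() plot:
pairs(samples, pars = c("alpha", "beta"))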

I tried as well, but I could not remove the divergent transitions either.

In your code, the parameter vectors alpha and beta have length ns, so their elements run up to alpha[ns] and beta[ns]. However, alpha[ns] and beta[ns] never appear in the model block, because the loops there only go up to ns - 1. That seemed strange to me, so I tried removing the unused elements alpha[ns] and beta[ns] by declaring the vectors with length ns - 1.

I am a beginner at Stan and Bayesian inference myself, so please wait for an expert to give a definitive answer.

parameters {
  vector<lower=0,upper=1>[ns] alpha;  // alpha[ns] is never used in the model block
  vector<lower=0,upper=1>[ns] beta;   // beta[ns] is never used in the model block
} 
transformed parameters {
  matrix<lower=0,upper=1>[ns,nt] theta;
  
  // Retention Rate At Each Lag For Each Subject Decays Exponentially
  for (i in 1:ns)
    for (j in 1:nt)
      theta[i,j] = fmin(1.0, exp(-alpha[i] * t[j]) + beta[i]);
}
model {
  // Priors
  alpha ~ beta(1, 1);  // can be removed
  beta ~ beta(1, 1);  // can be removed
  
  // Observed Data
  for (i in 1:(ns - 1))
    for (j in 1:(nt - 1))
      k[i,j] ~ binomial(n, theta[i,j]);
}

My attempt

rm(list=ls()) 

library(rstan)

model <- rstan::stan_model(model_code = "
// Retention With Full Individual Differences
data { 
  int ns;
  int nt;
  int k[ns - 1,nt - 1];  // excluding missing values
  int t[nt];
  int n;
}
parameters {
  vector<lower=0,upper=1>[ns-1] alpha;
  vector<lower=0,upper=1>[ns-1] beta;
} 
transformed parameters {
  matrix<lower=0,upper=1>[ns-1,nt-1] theta;
  
  // Retention Rate At Each Lag For Each Subject Decays Exponentially
  for (i in 1:(ns - 1))
    for (j in 1:(nt - 1))
      theta[i,j] = fmin(1.0, exp(-alpha[i] * t[j]) + beta[i]);
}
model {
  // Priors
  alpha ~ beta(1, 1);  // can be removed
  beta ~ beta(1, 1);  // can be removed
  
  // Observed Data
  for (i in 1:(ns - 1))
    for (j in 1:(nt - 1))
      k[i,j] ~ binomial(n, theta[i,j]);
}
generated quantities {
  // int<lower=0,upper=n> predk[ns - 1, nt - 1];

  // Predicted Data
  // for (i in 1:(ns - 1))
  //   for (j in 1:(nt - 1))
  //     predk[i,j] = binomial_rng(n, theta[i,j]);
}"

)

t     <- c(1, 2, 4, 7, 12, 21, 35, 59, 99, 200)
nt    <- length(t)
slist <- 1:4
ns    <- length(slist)

k1 <- matrix(c(18, 18, 16, 13, 9, 6, 4, 4, 4, NA,
               17, 13,  9,  6, 4, 4, 4, 4, 4, NA,
               14, 10,  6,  4, 4, 4, 4, 4, 4, NA,
               NA, NA, NA, NA,NA,NA,NA,NA,NA, NA), nrow=ns, ncol=nt, byrow=T)

k <- k1[1:(ns - 1), 1:(nt - 1)]   # Excluding NAs (for Stan solution)

n <- 18

data <- list(k=k, n=n, t=t, ns=ns, nt=nt) # To be passed on to Stan

# myinits <- list(
#   list(alpha=rep(.5, ns-1), beta=rep(.1, ns-1)))

# parameters <- c("alpha", "beta", "predk")  # Parameters to be monitored

# For a detailed description type "?rstan".
samples <- sampling(object=model,
                data=data,
                # init=myinits,  # If not specified, gives random inits
                # pars=parameters,
                iter=20000,
                chains=1,
                thin=1
                # warmup = 100,  # Stands for burn-in; Default = iter/2
                # seed = 123  # Setting seed; Default is random seed
)

samples
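
As a sanity check, the divergences can also be counted directly with rstan's built-in diagnostic helper:

print(samples, pars = c("alpha", "beta"))  # posterior summaries with Rhat and ESS
rstan::check_hmc_diagnostics(samples)      # reports divergences, treedepth, and E-BFMI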

Warning messages:
1: There were 9732 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems
 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess