Reparameterization ("Matt trick") failing to produce a speed improvement in hierarchical model

techniques

#1

Hi all. I’m trying to reparameterize my model in order to speed it up, as reported in Betancourt & Girolami (2015), the technique often referred to as “Matt’s trick”.

I’m failing to notice any speed improvement in my reparameterization, so I am wondering whether I have applied it correctly. In both cases, I get something in the order of:

1000 transitions using 10 leapfrog steps per transition would take 15.41 seconds.

If anything, after reparameterization that speed quote seems to approximately double, rather than decrease as I’d expect.

The original, centered parameterization, code, before I reparameterized, is:

parameters {
  ////////////////////
  //GROUP LEVEL
  real subj_mu[3];
  real<lower=0> subj_sigma[3];
  
  ////////////////////
  //SUBJECT LEVEL
  real alpha_pr[N];
  real k_pr[N];
  real tau_pr[N];
}

transformed parameters {
  real <lower=0,upper=1> alpha[N];
  real<lower=0> k[N];
  real<lower=0> tau[N];
  
  alpha = inv_logit(alpha_pr);
  k = exp(k_pr);
  tau = exp(tau_pr);
}

model {
  real exp_val[N,max(cue),NUM_CHOICES];
  real pred_err;
  real outcome;
  
  ////////////////////
  //GROUP LEVEL
  
  //priors for mean of subject params
  subj_mu[1] ~ normal(-3,3);
  subj_mu[2] ~ normal(log(.5),1);
  subj_mu[3] ~ normal(log(.5),0.5);
  
  subj_sigma[1] ~ cauchy(0,5); 
  subj_sigma[2] ~ cauchy(0,3); 
  subj_sigma[3] ~ cauchy(0,2);
  
  
  ////////////////////
  //SUBJECT LEVEL
  exp_val = rep_array(0,N,max(cue),NUM_CHOICES);
  
  alpha_pr ~ normal(subj_mu[1],subj_sigma[1]);
  k_pr ~ normal(subj_mu[2],subj_sigma[2]);
  tau_pr ~ normal(subj_mu[3],subj_sigma[3]);
  
  for (t in 1:NUM_TRIALS){//loop through timesteps.
    for(j in 1:NUM_CHOICES){
      #reinforcement learning
      pred_err=choice_outcomes[t,j]-exp_val[trial_runid[t],cue[t],j]; 
      exp_val[trial_runid[t],cue[t],j] = exp_val[trial_runid[t],cue[t],j] + alpha[trial_runid[t]]*pred_err;
    }
    #user-defined linear ballistic accumulator function based on that written by Annis, Miller, & Palmeri (2017)
    response_time[t] ~ lba(response[t],k[trial_runid[t]],A,to_vector(exp_val[trial_runid[t],cue[t],]),s,tau[trial_runid[t]]);
    
  }
  
    
}

My reparameterization is:

parameters {
  ////////////////////
  //GROUP LEVEL
  real subj_mu[3];
  real<lower=0> subj_sigma[3];
  
  ////////////////////
  //SUBJECT LEVEL
  real alpha_pr_var[N];
  real k_pr_var[N];
  real tau_pr_var[N];
}

transformed parameters {
  
  real alpha_pr[N];
  real k_pr[N];
  real tau_pr[N];
  
  real<lower=0,upper=1> alpha[N];
  real<lower=0> k[N];
  real<lower=0> tau[N];
  
  for (r in 1:N){
    alpha_pr[r] = subj_sigma[1]*  alpha_pr_var[r] + subj_mu[1];
    k_pr[r] =     subj_sigma[2]*  k_pr_var[r]     + subj_mu[2];
    tau_pr[r] =   subj_sigma[3]*  tau_pr_var[r]   + subj_mu[3];
  }
  
  alpha = inv_logit(alpha_pr);
  k = exp(k_pr);
  tau = exp(tau_pr);
}

model {
  real exp_val[N,max(cue),NUM_CHOICES];
  real pred_err;
  real outcome;

  ////////////////////
  //GROUP LEVEL
  
  //priors for mean of subject params
  subj_mu[1] ~ normal(-3,3);
  subj_mu[2] ~ normal(log(.5),1);
  subj_mu[3] ~ normal(log(.5),0.5);
  
  //priors for deviation of subject params from their mean.
  subj_sigma[1] ~ cauchy(0,5); 
  //these have lower prior SDs because our priors for them originally, from Palmeri et al., were lower.
  subj_sigma[2] ~ cauchy(0,3); 
  subj_sigma[3] ~ cauchy(0,2);
  
  
  ////////////////////
  //SUBJECT LEVEL
  exp_val = rep_array(0,N,max(cue),NUM_CHOICES);
  
  alpha_pr_var ~ normal(0,1);
  k_pr_var ~ normal(0,1);
  tau_pr_var ~ normal(0,1);

  for (t in 1:NUM_TRIALS){//loop through timesteps.
    for(j in 1:NUM_CHOICES){
      #reinforcement learning
      pred_err=choice_outcomes[t,j]-exp_val[trial_runid[t],cue[t],j]; 
      exp_val[trial_runid[t],cue[t],j] = exp_val[trial_runid[t],cue[t],j] + alpha[trial_runid[t]]*pred_err;
    }
    #user-defined linear ballistic accumulator function based on that written by Annis, Miller, & Palmeri (2017)
    response_time[t] ~ lba(response[t],k[trial_runid[t]],A,to_vector(exp_val[trial_runid[t],cue[t],]),s,tau[trial_runid[t]]);
  }
}

Any ideas?


#2

A couple of things:

  1. The speed report from sampling is one shot, and pretty noisy, so it’s possible that the speed change isn’t that much. Check out this post if you want more precise measures.
  2. You can vectorize the assignment in the transformed parameter block which may help.
  3. The speed improvement from the transformed parameter should be measured in terms of effective samples per second, not in terms of the time it takes to evaluate a single leapfrog step. If your steps are slower, but results in an increased step size and fewer leapfrog steps, you’ll still be coming out ahead.

#3

To emphasize – the performance warning is simply a way to quantify the speed of the gradient which will not be appreciably different before and after, and the only metric that matters is effective sample size per time and that matters only if there are no diagnostic problems such as divergences. See https://betanalpha.github.io/assets/case_studies/rstan_workflow.html and https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html for further discussion.

The improvement in effective sample size per time is reported in the Betancourt and Girolami paper.