Omit transformed parameters from output?

skolenik · November 5, 2019, 4:19pm

I am reproducing the tutorial rstanarm multilevel example rewriting it in “pure” ®Stan with some twists. I wanted to produce the mean predictions per school, and after about fifteen trials and errors, I ended with the combination of transformed parameters and generated quantities as follows:

//
// This Stan program defines a multilevel model
// expressed in lmer as
// 
// lmer(formula = course ~ 1 + female + (1 | school), data = GCSE)
//

// The input data
data {
  // number of observations
  int<lower=0> N;
  // number of schools
  int<lower=0> M;
  //  outcome variable: course
  vector[N] y;
  // predictor: female
  vector[N] female;
  // cluster variable
  int school[N];
  // school sizes, to be computed in R
  int s[M];
}

parameters {
  // intercept
  real beta0;
  // fixed effects slope
  real beta_f;
  // random effects
  vector[M] u_j;
  // residual error
  real<lower=0> sigma;
  // random effect variance
  real<lower=0> sigma_u;
}

transformed parameters {
  // level-1 unit specific means
  vector[N] mu_i;
  
  mu_i = beta0 + beta_f * female + u_j[school];
}

model {
  y ~ normal(mu_i, sigma);
  u_j ~ normal(0, sigma_u);
}

generated quantities {
  // running counter  
  int pos;

  // school empirical mean
  vector[M] m_j;

  pos = 1;
  for (j in 1:M) {
    m_j[j] = mean( segment(mu_i,pos,s[j]) );
    pos = pos + s[j];
  }
}

The R code that gets the data is

### pure Rstan multilevel example
library(tidyverse)
library(rstan)
library(here)
library(mlmRev)

data(Gcsemv, package = "mlmRev")
summary(Gcsemv)

# massage the data a bit
Gcsemv$female <- relevel(Gcsemv$gender, "M")
Gcsemv %>% 
  select(school, student, female, course) %>% 
  filter( !is.na(course) ) %>% 
  arrange(school, student) %>%               # sort by school and student within school
  group_by(school) %>% 
  mutate(s=n()) %>% ungroup() ->             # compute school sizes
  GCSE_nmy

# convert to lists for Stan
GCSE_stan_data <- list(
  # number of observations
  N = nrow(GCSE_nmy),
  # number of schools
  M = length(unique(GCSE_nmy$school)),
  #  outcome variable: course
  y = GCSE_nmy$course,
  # predictor: female
  female = as.integer(GCSE_nmy$female)-1,
  # cluster variable
  school = as.integer(GCSE_nmy$school),
  # school sizes for segmenting
  s = (GCSE_nmy %>% group_by(school) %>% slice(1) %>% ungroup() %>% select(s) %>% unlist() )
)

So when I run

multilevel_stan_fit <- stan(
  file = here('multilevel-stan.stan'), # program
  data = GCSE_stan_data,             # data
  chains = 4,             # number of Markov chains
  warmup = 2000,          # number of warmup iterations per chain
  iter = 10000,           # total number of iterations per chain
  cores = 4,              # number of cores to use
  refresh = 200,          # show progress every 'refresh' iterations
  seed = 10203
)

I end up with

> print( object.size(multilevel_stan_fit), units="Mb")
576.1 Mb

which takes a fairly long while to create and access.

The blame, of course, is with mu_i, an object of size 1725 x 4000, which I don’t really care that much about. Is there any way to omit this vector of transformed parameters from being written into R? (It has to be there in memory for the generated quantities to be created, but the write step of outputting it is a different one, as far as I understand from https://mc-stan.org/docs/2_21/reference-manual/output.html.)

mcol · November 5, 2019, 4:58pm

You can use the pars option along with the include option to list the names of parameters you want to keep (if include=TRUE) or remove (if include=FALSE).

mike-lawrence · November 5, 2019, 5:19pm

I’ll also note that unless you’re trying to get stable inference about some pretty extreme quantiles in the posterior distributions, having 1e4 iterations per chain isn’t really helping you much.

skolenik · November 5, 2019, 6:11pm

OK so @mcol’s suggestion amounted to

multilevel_stan_fit <- stan(
  file = here('multilevel-stan.stan'), # program
  data = GCSE_stan_data,             # data
  chains = 4,             # number of Markov chains
  warmup = 2000,          # number of warmup iterations per chain
  iter = 10000,           # total number of iterations per chain
  cores = 4,              # number of cores to use
  refresh = 200,          # show progress every 'refresh' iterations
  seed = 10203,
  include = FALSE,        # actually, this means 'exclude' the parameters listed below
  pars = "mu_i"
)

which produces

> names(extract(multilevel_stan_fit))
[1] "beta0"   "beta_f"  "u_j"     "sigma"   "sigma_u" "pos"     "m_j"     "lp__"   
> print(object.size(multilevel_stan_fit), units="Mb")
48.7 Mb

Much nicer. Problem solved, case closed.

@mike-lawrence, old BUGS habits die hard :).

mathDR · May 8, 2024, 2:29pm

Has this behavior changed? What is the process to only include specific parameters in the writing of the files?

Topic		Replies	Views
Transformed parameters omitted in output if input parameters omitted? RStan	2	490	November 6, 2019
Are transformed parameters included in the search space for optimizations? Algorithms optimization	1	651	January 7, 2019
Multiple Regression in Rstan with factors Modeling	12	3118	February 6, 2018
Help with understanding my results Modeling fitting-issues , specification , performance	5	1184	February 8, 2019
How remove transformed parameters draws in cmdstanr CmdStan	2	390	August 25, 2023

Omit transformed parameters from output?

Related topics