Partial pooling in generated quantities?

Hi,

I want to set up partial pooling in the generated quantities block for two normal models. I can't do it in the model block itself, or rather I don't want to, because that would rule out a small computational trick that speeds up the model considerably. My question is whether this approach actually makes sense and/or whether there are better alternatives. The model below works in the sense that it gives the right result, but there might be a better way.


library(cmdstanr)

# Quick sanity check: per-draw precision-weighted average of two normals
a <- rnorm(1000, 2, 10)
b <- rnorm(1000, 4, 20)
c <- (1/var(a) * a + 1/var(b) * b)/(1/var(a) + 1/var(b))
mean(c)
sd(c)
# Analytic pooled mean and sd for comparison
(1/10^2 * 2 + 1/20^2 * 4)/(1/10^2 + 1/20^2)
sqrt(1/(1/10^2 + 1/20^2))

pp_m <- cmdstan_model("code/R/computation/partial_pooling/partial_pooling_test.stan")
data <- list(
  N = 100000,
  y1 = rnorm(100000, 2, 10),
  y2 = rnorm(100000, 4, 20)
)
fit <- pp_m$sample(
  data = data,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500
)
fit$summary("y_pooled")

The Stan model (partial_pooling_test.stan):
data {
  int N;
  vector[N] y1;
  vector[N] y2;
}
parameters {
  real mu1;
  real mu2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}
model {
  sigma1 ~ normal(0, 20);
  sigma2 ~ normal(0, 20);
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  y1 ~ normal(mu1, sigma1);
  y2 ~ normal(mu2, sigma2);
}
generated quantities {
  // Precision-weighted combination of the two posterior means
  // (note: this implies a flat prior on the pooled quantity)
  real y_pooled = normal_rng(
    (mu1 / square(sigma1) + mu2 / square(sigma2)) / (1 / square(sigma1) + 1 / square(sigma2)),
    sqrt(1 / (1 / square(sigma1) + 1 / square(sigma2)))
  );
}

What is y_pooled intended to be as a quantity? The mean you would get if you made the model hierarchical, with mu1 and mu2 as samples from a common normal?

Yes, that’s one interpretation. More specifically, mu1 and mu2 are two forecasts that I want to produce a weighted average of.
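
For reference, the weights come from the standard precision-weighting result: assuming a flat prior on the pooled quantity and treating mu1 and mu2 as independent noisy estimates of it with scales sigma1 and sigma2, the pooled posterior is

$$
\mu_{\text{pooled}} = \frac{\mu_1/\sigma_1^2 + \mu_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},
\qquad
\sigma_{\text{pooled}}^2 = \frac{1}{1/\sigma_1^2 + 1/\sigma_2^2},
$$

which is what the generated quantities block above draws from.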

I’m not very math-savvy, so I’ll assume your formulae accurately reflect the analytic solution for the mean and variance of the weighted mean. In that case, I’d simply mention that I think this approach implies a prior that’s uniform from -\infty to +\infty, which may not be ideal?

Presumably the model you’ve posted is a minimal toy and your concern about performance comes from knowing you’ll be doing a much more complicated model?

Well, in principle you could say the ex ante prediction is something like normal(0.5, 0.15) for the vote share. So yes, there is a uniform prior assumption. For the full model I essentially have two random walks that I only compute in the model block up to the last day of polling (i.e., as long as there is data), and any future prediction happens in the generated quantities block. If I were to do the partial pooling for the final outcome in the model block, I couldn’t keep the future prediction in the generated quantities block and would have to compute all of that in the model block, which I don’t want. The gain from the current approach is a speed-up by a factor of 2–3+ on a model that originally takes about 35 minutes.
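
To make that concrete, here is a minimal sketch of how that ex ante prediction could be folded into the pooling in generated quantities as a third precision-weighted term (the 0.5 and 0.15 are just the placeholder values from above; this is not in the model I run):

generated quantities {
  real y_pooled;
  {
    // Treat the ex ante prediction normal(0.5, 0.15) as a third component
    // instead of relying on an implicit flat prior (placeholder values).
    real w0 = 1 / square(0.15);   // precision of the ex ante prediction
    real w1 = 1 / square(sigma1); // precision of forecast 1
    real w2 = 1 / square(sigma2); // precision of forecast 2
    y_pooled = normal_rng((w0 * 0.5 + w1 * mu1 + w2 * mu2) / (w0 + w1 + w2),
                          sqrt(1 / (w0 + w1 + w2)));
  }
}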


Current model code here if you are curious (and for anyone else who has pent-up anger over confusing model formulations)

data {
  int N_state;
  int N_national;
  int S;
  int T;
  int T_plus;
  int t_pos[T - 1];
  int s[N_state];
  int t_state[N_state];
  int t_national[N_national];
  cov_matrix[S] cov_mat_mu_b;
  int y_state[N_state];
  int n_state[N_state];
  int y_national[N_national];
  int n_national[N_national];
  // prior data
  vector[S] prev_ele;
  int prev_ele_days;
  // weights
  row_vector[S] weights;
  // past election data
  int T_past;
  vector[T_past * S] y_past;
  matrix[T_past, S] x_past;
  int s_past[T_past * S];
  int t_past[T_past * S];
  int t_between_ele[T_past - 1];
  // Fundamentals prediction
  int t_to_election;
  vector[S] x_last;
}
transformed data {
  cholesky_factor_cov[S] cholesky_cov_mu_b;
  real sqrt_t_pos[T - 1];
  vector[S] prev_ele_logit;
  real sqrt_prev_ele_days;
  vector[T_past - 1] sqrt_past_t_bet_ele;
  matrix[T_past - 1, S] diff_x_past;
  real sqrt_t_to_election;
  cholesky_cov_mu_b = cholesky_decompose(cov_mat_mu_b);
  sqrt_t_pos = sqrt(t_pos);
  prev_ele_logit = logit(prev_ele);
  sqrt_prev_ele_days = sqrt(prev_ele_days);
  sqrt_past_t_bet_ele = to_vector(sqrt(t_between_ele));
  diff_x_past = x_past[2:T_past] - x_past[1:(T_past - 1)];
  sqrt_t_to_election = sqrt(t_to_election);
}
parameters {
  // Poll model
  matrix[S, T] raw_mu_b;
  // Fundamentals
  //matrix[2, S] raw_beta;
  matrix[S, 2] raw_beta;
  //cholesky_factor_corr[2] L_omega;
  vector<lower = 0>[2] tau;
  real<lower = 0> sigma_past;
  vector<lower = 0>[S] sigma_x; // Variation of economic factors
  // Pooled
  vector[S] pred_p;
  vector[S] pred_f;
}
transformed parameters {
  // Poll model
  matrix[S, T] mu_b;
  vector[N_state] theta_state;
  vector[N_national] theta_national;
  vector[T] mu_weighted;
  matrix[S, 2] beta;
  mu_b[, 1] = sqrt_prev_ele_days * cholesky_cov_mu_b * raw_mu_b[, 1] + prev_ele_logit;
  for (i in 2:T) mu_b[:, i] = sqrt_t_pos[i - 1] * cholesky_cov_mu_b * raw_mu_b[:, i] + mu_b[:, i - 1];
  // National weighted
  mu_weighted = logit(weights * inv_logit(mu_b))';
  // Assign
  for (i in 1:N_state){
    theta_state[i] = mu_b[s[i], t_state[i]];
  }
  theta_national = mu_weighted[t_national];
  // Fundamentals
  //beta = (diag_pre_multiply(tau, L_omega) * raw_beta)';
  for (i in 1:2) beta[,i] = raw_beta[,i] * tau[i];
}
model {
  // Priors
  // Poll model
  to_vector(raw_mu_b) ~ std_normal();
  // Fundamentals
  sigma_past ~ normal(0, 0.1);
  to_vector(raw_beta) ~ std_normal();
  tau ~ normal(0, 1);
  //L_omega ~ lkj_corr_cholesky(1);
  sigma_x ~ normal(0, 0.2);
  // Likelihood
  // Poll model
  y_state ~ binomial_logit(n_state, theta_state);
  y_national ~ binomial_logit(n_national, theta_national);
  // Fundamentals
  for (i in 1:S){
    diff_x_past[,i] ~ normal(0, sigma_x[i] * sqrt_past_t_bet_ele);
  }
  for (i in 1:(T_past * S)){
    y_past[i] ~ normal(beta[s_past[i], 1] +
                       beta[s_past[i], 2] * x_past[t_past[i], s_past[i]],
                        sigma_past);
  }
  // Pooled
  pred_f ~ normal(beta[,1] + beta[,2] .* x_last, sqrt(square(sigma_x * sqrt_t_to_election) + square(sigma_past)));
  pred_p ~ multi_normal_cholesky(mu_b[:,T], cholesky_cov_mu_b * sqrt(T_plus));
}
generated quantities {
  matrix[T + T_plus, S] predicted_score;
  vector[T + T_plus] predicted_score_national;
  int y_rep_national[N_national];
  int y_rep_state[N_state];
  for (i in 1:S){
    predicted_score[1:T,i] = inv_logit(to_vector(mu_b[i]));
  }
  for (i in 1:T_plus){
    predicted_score[T + i,] = inv_logit(multi_normal_cholesky_rng(mu_b[,T], cholesky_cov_mu_b * sqrt(i)))';
  }
  predicted_score_national = predicted_score * weights';
  y_rep_state = binomial_rng(n_state, inv_logit(theta_state));
  y_rep_national = binomial_rng(n_national, inv_logit(theta_national));
}
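
For completeness, the pooled final-outcome prediction would then be computed in generated quantities the same way as in the toy example, roughly along these lines (a sketch only, not taken from the model above; sd_p and sd_f are hypothetical stand-ins for the per-state scales used for pred_p and pred_f in the model block):

  // Sketch: precision-weighted pooling of the poll-based and
  // fundamentals-based forecasts per state, added inside the
  // generated quantities block above.
  vector[S] pred_pooled;
  for (i in 1:S) {
    real w_p = 1 / square(sd_p[i]);  // hypothetical poll-forecast precision
    real w_f = 1 / square(sd_f[i]);  // hypothetical fundamentals precision
    pred_pooled[i] = normal_rng((w_p * pred_p[i] + w_f * pred_f[i]) / (w_p + w_f),
                                sqrt(1 / (w_p + w_f)));
  }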