Updating model with new data

Hi everyone,

I was just wondering if it’s possible to update a model when new data arrives. Suppose I have some data X and fit a model to obtain posterior estimates for some parameters, and then I get some new data X’. Rather than running the whole model again on all the available data X+X’, is it possible to just update the model with the new data X’ to get updated posterior estimates for the parameters?

Just thinking that this could save time if I didn’t need to re-run the whole model.

Might just be a stupid question, but I’m curious.

Thanks!
Ryan

Yes, if you can obtain a parametric representation for the posterior conditional on X.

Thanks for the reply Ben!

So if I had a stanfit object called stanfit_1 from fitting the model to X, how do I obtain the updated model stanfit_2 with data X+X’ from this?

You look at the posterior distribution of the parameters conditional on X, see what parametric distribution it is closest to, and use that as your prior when conditioning on X’ to obtain the new posterior distribution.

Can you explain how I can determine, in Stan and RStan, which distribution the posterior distribution of the parameters is closest to, please?

If it is anything other than multivariate normal, it is pretty difficult. But if it is multivariate normal, then you just need to estimate the mean vector and the covariance matrix (presumably with some regularization).
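
For example, something along these lines in R, using the draws from stanfit_1 (just a sketch; the parameter names alpha and beta and the small ridge term added for regularization are placeholders):

library(rstan)

# draws from the first fit, as an iterations x parameters matrix
draws <- as.matrix(stanfit_1, pars = c("alpha", "beta"))

post_mean <- colMeans(draws)   # estimated posterior mean vector
post_cov  <- cov(draws)        # estimated posterior covariance matrix

# a little regularization so the precision matrix is well conditioned
post_cov  <- post_cov + diag(1e-6, nrow(post_cov))
post_prec <- solve(post_cov)   # estimated posterior precision matrix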

The task is usually to find a good transformation that makes the parameters approximately multivariate normal (ideally uncorrelated). This is a good exercise to do for a given model anyway (if possible), and as a bonus you usually get a model that samples better.
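
As a rough sketch of that idea in R, with a hypothetical positive parameter sigma (positive parameters are usually closer to normal on the log scale):

draws <- as.matrix(stanfit_1, pars = c("alpha", "sigma"))
draws[, "sigma"] <- log(draws[, "sigma"])  # transform before fitting the normal

post_mean <- colMeans(draws)
post_prec <- solve(cov(draws))
# the multivariate normal prior in the next fit then applies to log(sigma),
# not to sigma itself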

This makes sense, but I’m just not 100% clear on how to implement it in practice.

I’ve implemented this football/soccer model by Baio and Blangiardo (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.8659&rep=rep1&type=pdf) and my current Stan code is:

data {
  int nteams;
  int ngames;
  int home_team[ngames];
  int away_team[ngames];
  int<lower=0> home_goals[ngames];
  int<lower=0> away_goals[ngames];
}
parameters {
  real home;
  real mu_att;
  real mu_def;
  real tau_att;
  real tau_def;

  vector[nteams-1] att_free;
  vector[nteams-1] def_free;
}
transformed parameters {
  vector[nteams] att;
  vector[nteams] def;
  vector[ngames] log_theta_home;
  vector[ngames] log_theta_away;

  // need to make sum(att)=sum(def)=0
  for (k in 1:(nteams-1)) {
    att[k] = att_free[k];
    def[k] = def_free[k];
  }
  att[nteams] = -sum(att_free);
  def[nteams] = -sum(def_free);

  log_theta_home = home + att[home_team] + def[away_team];
  log_theta_away = att[away_team] + def[home_team];
}
model {
  home ~ normal(0, 10000);
  mu_att ~ normal(0, 10000);
  mu_def ~ normal(0, 10000);
  tau_att ~ gamma(0.1, 0.1);
  tau_def ~ gamma(0.1, 0.1);

  att_free ~ normal(mu_att, 1/tau_att);
  def_free ~ normal(mu_def, 1/tau_def);

  home_goals ~ poisson_log(log_theta_home);
  away_goals ~ poisson_log(log_theta_away);
}

The reason I want to update the model is that, when testing its performance, I’ve just kept adding to the data and re-running this whole model on the full dataset as more games come in.

The parameters I’m specifically interested in are att, def and home.

If the posterior distribution of the parameters after seeing a first set of games X is approximately multivariate normal, how do I adjust this Stan code so that I can obtain the new posterior distribution by running it only on the next set of games X’? So far, I’ve just been re-running the code on X+X’ when new games come in.

Pass in the posterior mean vector and the posterior precision matrix as data. Eliminate all the old priors and use the multivariate normal instead.

Except your original posterior is going to be messed up because you did not constrain tau_att and tau_def to be positive. It would probably be better to declare them in log form in the parameters block and then exponentiate them in the transformed parameters block. And use sensible priors.

data {
  int nteams;
  int ngames;
  int home_team[ngames];
  int away_team[ngames];
  int<lower=0> home_goals[ngames];
  int<lower=0> away_goals[ngames];
  
  vector[5 * nteams - 2] mu;
  cov_matrix[rows(mu)] precision;
}
parameters {
  real home;
  real mu_att;
  real mu_def;
  real tau_att;
  real tau_def;

  vector[nteams-1] att_free;
  vector[nteams-1] def_free;
}
transformed parameters {
  vector[nteams] att;
  vector[nteams] def;
  vector[ngames] log_theta_home;
  vector[ngames] log_theta_away;

  // need to make sum(att)=sum(def)=0
  for (k in 1:(nteams-1)) {
    att[k] = att_free[k];
    def[k] = def_free[k];
  }
  att[nteams] = -sum(att_free);
  def[nteams] = -sum(def_free);

  log_theta_home = home + att[home_team] + def[away_team];
  log_theta_away = att[away_team] + def[home_team];
}
model {
  vector[5 + 2 * nteams - 2] theta = append_row([home, mu_att, mu_def, tau_att, tau_def]',
                                                append_row(att_free, def_free));
  theta ~ multi_normal_prec(mu, precision);
  home_goals ~ poisson_log(log_theta_home);
  away_goals ~ poisson_log(log_theta_away);
}
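
On the R side, the whole workflow could look roughly like this (an untested sketch; the file names, the data lists data_X and data_Xprime, and the small ridge term are placeholders, and it assumes mu is declared with the same length as theta, as discussed below):

library(rstan)

# 1. fit the original model to the first batch of games X
stanfit_1 <- stan("original_model.stan", data = data_X)

# 2. summarize its posterior as a multivariate normal
pars  <- c("home", "mu_att", "mu_def", "tau_att", "tau_def", "att_free", "def_free")
draws <- as.matrix(stanfit_1, pars = pars)
# (if tau_att / tau_def are re-declared on the log scale as suggested above,
#  log the corresponding columns here first)

mu        <- colMeans(draws)
Sigma     <- cov(draws) + diag(1e-6, ncol(draws))  # small ridge for stability
precision <- solve(Sigma)

# 3. fit the updated model to the new games X' only, using the old posterior as prior
data_Xprime$mu        <- mu
data_Xprime$precision <- precision
stanfit_2 <- stan("updated_model.stan", data = data_Xprime)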

Thanks Ben, I appreciate your patience. This is a great help!

Shouldn’t the dimensions of these two vectors be the same? Is it meant to be vector[5 + 2 * nteams - 2] for both mu and theta?

Yes, it should be vector[5 + 2 * nteams - 2] for both mu and theta.