How many previous months to use to predict current month?

Hi all,

I’m trying to write a binomial model in which the number of successes in the current month depends on the number of trials not only in the current month but also in some of the previous months, because each trial takes a while to resolve (plausibly anywhere from one to six months). Think of a realtor: showings are the trials, and houses sold and closed in a month are the successes. (Also: would anyone recommend another distribution here?)

I’d like to infer a parameter that says how much of the trials from the previous month (or 2 months back, etc.) to count toward the current month. I tried modeling it as a fixed-length simplex, but I get tons of tree depth warnings. Anyone have ideas for this? Here’s a sketch of the current model:

data {
  int N;
  int<lower=1, upper=12> months[N];
  int<lower=2000, upper=2019> years[N];
  vector<lower=0>[N] showings;
  int<lower=0> closings[N];
  int lag_months; // number of previous months to use
  simplex[lag_months] showing_effect;
}
transformed data {
  int Nc = N - lag_months;
}
parameters {
  real month_mean;
  real<lower=0> month_stdev;
  vector<lower=0, upper=1>[12] month_effects;
  //simplex[lag_months] showing_effect;
}
transformed parameters {
  vector<lower=0>[Nc] num_showings = rep_vector(0, Nc);
  for (n in 1:Nc) {
      for (m in 0:(lag_months-1)) {
          num_showings[n] += showings[n-m+lag_months] * showing_effect[m+1];
      }
  }
}
model {
  // hierarchical priors
  month_mean ~ normal(0, 2);
  month_stdev ~ normal(0, 2);
  month_effects ~ normal(month_mean, month_stdev);
  
  for (n in 1:Nc) {
      // count up to floor(num_showings[n]) to get an integer trial count
      int nf = 0;
      while (nf < floor(num_showings[n])) nf += 1;
      closings[n+lag_months] ~ binomial(nf, month_effects[months[n]]);
  }
}

Here I’ve commented out the simplex showing_effect parameter and am just reading it in as data for now. That works fine, but the model seems sensitive to it, so I’d like to estimate it if possible.

If the volume of sales does not swing much from month to month, you may not have much information to sort out the length of the lag, so the sampler is free to roam the simplex, and that leads to tree depth problems. You could probably get a lot of information from looking at data on offers and closings. Any sane real estate company would have records of that, so it should (?) be available. In the past I have used survival distributions to model the time-to-closing piece in similar models, and that might be another way to include enough structure to make this model work.

Urk, I meant to say the timing of individual offers and closings.
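
A rough sketch of the survival-distribution idea in Stan might look like the following. Several things here are assumptions, not something stated in the posts above: the time from showing to closing is taken to be Weibull, the helper name lag_weights is made up, the priors on alpha and sigma are placeholders, and the binomial is swapped for a Poisson on the expected count (the binomial needs an integer number of trials, and rounding a parameter-dependent quantity leaves the likelihood flat in the lag weights almost everywhere). It uses the current array[N] int syntax rather than the older int x[N] form.

```stan
functions {
  // Hypothetical helper (not from the original model): share of showings
  // that close during the k-th month after the showing, k = 1..K, from a
  // Weibull time-to-close with shape alpha and scale sigma (in months),
  // renormalized so the K weights sum to 1.
  vector lag_weights(int K, real alpha, real sigma) {
    vector[K] cdf;
    vector[K] w;
    for (k in 1:K) {
      cdf[k] = -expm1(-pow(k / sigma, alpha)); // Weibull CDF at k months
    }
    w[1] = cdf[1];
    for (k in 2:K) {
      w[k] = cdf[k] - cdf[k - 1];
    }
    return w / cdf[K];
  }
}
data {
  int<lower=1> N;
  int<lower=1> lag_months;
  array[N] int<lower=1, upper=12> months;
  vector<lower=0>[N] showings;
  array[N] int<lower=0> closings;
}
transformed data {
  int Nc = N - lag_months;
}
parameters {
  real month_mean;
  real<lower=0> month_stdev;
  vector<lower=0, upper=1>[12] month_effects;
  real<lower=0> alpha; // Weibull shape (assumed)
  real<lower=0> sigma; // Weibull scale in months (assumed)
}
transformed parameters {
  // Two parameters now determine all the lag weights.
  vector[lag_months] showing_effect = lag_weights(lag_months, alpha, sigma);
  vector[Nc] num_showings = rep_vector(0, Nc);
  for (n in 1:Nc) {
    for (m in 0:(lag_months - 1)) {
      num_showings[n] += showings[n - m + lag_months] * showing_effect[m + 1];
    }
  }
}
model {
  // hierarchical priors, as in the original sketch
  month_mean ~ normal(0, 2);
  month_stdev ~ normal(0, 2);
  month_effects ~ normal(month_mean, month_stdev);
  // placeholder priors on the time-to-close distribution (assumed)
  alpha ~ lognormal(0, 0.5);
  sigma ~ lognormal(log(3), 0.5);

  for (n in 1:Nc) {
    // Poisson stand-in for the binomial: continuous in showing_effect.
    // Indexed here by the month of the closing, which is an assumption;
    // the original sketch uses months[n].
    closings[n + lag_months]
      ~ poisson(month_effects[months[n + lag_months]] * num_showings[n]);
  }
}
```

With only two parameters behind the weights, the sampler has far less room to wander than with a free simplex, which is the point of adding structure. Whether a Weibull is a reasonable shape is something to check against the timing of individual offers and closings, if that data is available.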