Random walk prior for a marketing mix model

Hi all,
I am working on an marketing mix model as described in this
paper. The spend in each media channel is transformed using the adstock transformation. Each media channel has a coefficient from which we can compute the marginal effect of increasing our media spend.

I’d like this channel coefficient to evolve smoothly over time, so that we can report on the impact of some marketing initiatives. I planned on implementing this with a random walk prior - snippet shown below (notice the lower bound on the random walk values). The full model and code at the bottom of this post for reference.

parameters {
// the random walk coefficients for media variables
matrix<lower = 0>[N, num_media] beta_medias_raw;
// variance of the media coefficients
real<lower = 0> tau;
}
transformed parameters {
// the coefficients for media variables
matrix<lower=0>[N, num_media] beta_medias;

beta_medias[1] = beta_medias_raw[1];
for (n in 2:N) {
  beta_medias[n] = beta_medias[n - 1] + beta_medias_raw[n]*tau;
}
}
model {
  for (nn in 1:N){
      beta_medias_raw[nn] ~ normal(0, 1);
  }
  tau ~ cauchy(0, 1);
}

This runs fine and the results look sensible.

                                          mean   se_mean   sd       2.5%   25%     50%     75%    97.5% n_eff  Rhat
beta_medias_raw[1,1]      0.00    0.00           0.00     0.00     0.00     0.00     0.00     0.00   5427 1.00
beta_medias_raw[1,2]      0.01    0.00           0.00     0.01     0.01     0.01     0.01     0.01   2453 1.00
beta_medias_raw[2,1]      0.14    0.00           0.14     0.00     0.04     0.10     0.20     0.53   5102 1.00
beta_medias_raw[2,2]      0.56    0.01           0.47     0.02     0.20     0.45     0.82     1.76   3485 1.00

However, if I remove the lower bound on the random walk coefficient the sampler gets stuck in the initial condition.

parameters {
// the random walk coefficients for media variables
matrix[N, num_media] beta_medias_raw;
}


I am struggling to understand why this change would cause such an issue with sampling the model. This may be due to my unfamiliarity with random walk priors, so a good reference here would be appreciated.

My hypothesis is that because the coefficients have typically increased over time (this analysis is for a young startup) the model is having a hard time identifying when the coefficients are allowed to move up or down, but the model will sample fine when the coefficients are allowed to move in one direction. Again, I’m not fully confident on my understanding of random walk priors so I’m not sure if this is the case or there is something I’m missing.

Thanks

The full model code is

functions {
// the adstock transformation with a vector of weights
real Adstock(row_vector t, row_vector weights) {
return dot_product(t, weights) / sum(weights);
}
}
data {
// the total number of observations
int<lower=1> N;
// the vector of sales
real<lower=0> Y[N];
// the maximum duration of lag effect, in weeks
int<lower=1> max_lag;
// the number of media channels
int<lower=1> num_media;
// a vector of 0 to max_lag - 1
row_vector[max_lag] lag_vec;
// 3D array of media variables
row_vector[max_lag] X_media[N, num_media];
// the number of other control variables
int<lower=1> num_ctrl;
// a matrix of control variables
row_vector[num_ctrl] X_ctrl[N];
}
parameters {
// residual variance
real<lower=0> noise_var;
// the intercept
real alpha;
// the random walk coefficients for media variables
matrix[N, num_media] beta_medias_raw;
// variance of the media coefficients
real<lower = 0> tau;
// coefficients for other control variables
vector[num_ctrl] gamma_ctrl;
// parameter for geometric decay
vector<lower = 0, upper = 1>[num_media] retain_rate;
// hyperparameters for geometric decay rate
}
transformed parameters {
// a vector of the mean response
real mu[N];
// the coefficients for media variables
matrix<lower=0>[N, num_media] beta_medias;
// the cumulative media effect after adstock
// the cumulative media effect after adstock
vector[num_media] cum_effect;
row_vector[max_lag] lag_weights1;
row_vector[max_lag] lag_weights2;

beta_medias[1] = beta_medias_raw[1];
for (n in 2:N) {
  beta_medias[n] = beta_medias[n - 1] + beta_medias_raw[n]*tau;
}

for (nn in 1:N) {
  for (media in 1 : num_media) {
    for (lag in 1 : max_lag) {
      if (media == 1)
        lag_weights1[lag] = pow(retain_rate[media], (lag - 1));
      else if (media == 2)
        lag_weights2[lag] = pow(retain_rate[media], (lag - 1));
}
if (media == 1)
  cum_effect[media] = Adstock(X_media[nn, media], lag_weights1);
else if (media == 2)
  cum_effect[media] = Adstock(X_media[nn, media], lag_weights2);
}
mu[nn] = alpha + dot_product(cum_effect, beta_medias[nn]) + dot_product(X_ctrl[nn], gamma_ctrl);;
}
}
model {
  for (nn in 1:N){
      beta_medias_raw[nn] ~ normal(0, 1);
  }
  tau ~ cauchy(0, 1);
  alpha ~ normal(0, 5);
  gamma_ctrl ~ normal(0, 1);
  retain_rate ~ beta(1, 1);
  noise_var ~ normal(0, 1);
  Y ~ normal(mu, noise_var);
}
generated quantities{
  matrix<lower=0>[N, num_media] marginal_wom;

  for (nn in 1:N)
    for (media in 1:num_media)
      if (media == 1)
        marginal_wom[nn, media] = 1000 * beta_medias[nn, media] * sum(lag_weights1);
      else if (media == 2)
        marginal_wom[nn, media] = 1000 * beta_medias[nn, media] * sum(lag_weights2);

}

beta_medias_raw[1,1] in the bound version seems to have settled to zero; what happens in the unbound model if you do:

parameters {
// the random walk coefficients for media variables
matrix[N-1, num_media] beta_medias_raw;
// variance of the media coefficients
real<lower = 0> tau;
}
transformed parameters {
// the coefficients for media variables
matrix<lower=0>[N, num_media] beta_medias;

beta_medias[1] = rep_row_vector(0,num_media);
for (n in 2:N) {
  beta_medias[n] = beta_medias[n - 1] + beta_medias_raw[n-1]*tau; //n.b. n-1 index for raw bc it only has N-1 rows (implicit 0s in first row). 
}

But by definition a random-walk is not smooth. If you want smoothness, I suggest a GP (or, if you have too many timepoints, a GAM).

Also, this:

  for (nn in 1:N){
      beta_medias_raw[nn] ~ normal(0, 1);
  }

can be made faster as:

  to_vector(beta_medias_raw) ~ std_normal();

Thanks @mike-lawrence - appreciate the response.

I’ve tried out the code you suggested. The model still doesn’t sample well (low ESS, high Rhats). Looks like each chain is still getting stuck.


I’ve also tried out fixing each parameter of the model apart from the random walk coefficients to see if they were contributing to any issues with non-identifiability etc., but it looks the random walk prior is causing the problems here.

My understanding from another thread is that the parameters will cause a dependence on n, even when there is no trend present.

I’m not sure if that is correct, but I am going to try this model on a simulated dataset with no trend in the coefficients and see what happens.