Trying to reproduce a nonrandom marketing mix model


I read a marketing paper on nonrandom marketing mix (attached). The authors estimated the model using MCMC successfully. The authors actually provide the data within the bayesm package (accessible thru here). But the MCMC code is not provided.

My reproduction Stan code works for the conditional model (attached). But Stan does not work for the full model(attached). Can someone offer suggestion, or is it the kind of model that Stan cannot handle?

The conditional model is

y[i,t] ~ NBD(b0[i] + b1[i] * Det[i,t] + b2[i] * ln(y[i,t-1]+1), phi)

from eqn (1) and (3) of the paper. This is a typically hierarchical NBD with random slopes.

The full model is the conditional model plus

Det[i,t] ~ Poisson (a0 + a1 * b0[i]/(1-b2[i]) + a2 * b1[i]/(1-b2[i]))

from eqn (7) and (8) of the paper. This is innovative part of the paper. The full model is a joint model on the outcome and a predictor, and the authors argue the predictor is nonrandom. The authors assumes a interpretable relationship between the predictor value and the individual slopes from the conditional model to account for the non-randomness. They argue that if a1 and a2 are 0, there would be randomness.

The output for the full model using my code after 2000 iterations is:

Warning messages:
1: There were 1733 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
2: There were 749 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10.
3: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low.
4: Examine the pairs() plot to diagnose sampling problems

All the traceplots are terrible and there is no mixing at all. Any suggestion would be fully appreciated!

Response_modeling_with_nonrandom_marketi.pdf (333.9 KB)long_nbd_mvn.stan (1.7 KB) long_nbd_mvn_poi.stan (2.0 KB) run_code.r (2.2 KB)

1 Like

sorry for note getting to you earlier. Did you manage to move forward in the meantime?

A set of hints to debug models is at Divergent transitions - a primer. Particularly, I would try, if you can fit the Poisson part as a standalone model.

One thing that looks suspicious to me is the b0[i]/(1-b2[i]) term as this is not continuous in b2[i] - i.e. the value flips from “very large positive” to “very large negative” as b2 crosses 1. This is likely to make Stan (and any other sampler really) unable to traverse this region in the parameter space, so provided that there is at least one i for which there is non-negligible posterior mass of b2[i] close to 1 from either side, this IMHO should cause problems. Alternatively, I can see how that would lead to bimodal posterior for b2[i] (and other coefficients) as values close to 1 could lead to very low posterior density while some values a bit further on either side could be compatible with the data. Once again, bimodal posteriors will frustrate almost any sampler.

I do not understand the math of the model enough to judge to what extent can we say a priori that some of the values are non-sensical and exclude them. Alternatively, one could try to reparametrize the model so that linear predictors for both outcomes are smooth with respect to all the parameters, but I don’t immediately see such a parametrization. Maybe treating the cases b2[i] > 1 and b2[i] < 1 separately, making the model a mixture could also help.

It is unfortunately not uncommon that when people see problems when reimplementing a model in Stan that it turns out that the original sampler also didn’t work for the problem, but it failed silently while Stan fails loudly.

Best of luck with the model.

I am still unable to resolve the issues, but thank you very much for the suggestions.

1 Like

This may not fix everything, however, in eqn 8 they show that \ln(\nu) = \gamma_0 + \gamma_1 \frac{\beta_{0i}}{1 - \beta_{2i}} + \gamma_2 \frac{\beta_{1i}}{1 - \beta_{2i}} so you need

target += poisson_log_lpmf(d | exp(nu[id]));

Thanks for that suggestion. I was under the impression that poisson_log_lpmf is used for log-linear Poission regression and the predictor does not need to be exponentiated. Let me double-check. Thanks.

Oh haha yes! I forgot that this is in. I’m so used to using poisson.

I am not sure if you still need/want some additional help from us - is there something in the suggestions that is hard to follow? Or did you manage to move forward a bit but still having other issues?

Hi, no, I am unable to resolve the issue because I think my stan code is correct. So I will have to let this go.

I played around a bit with the code and just leaving out 1-b2[i] from the marginal model removes the divergences on a random 100 doctor subset.

I added the doctor level regression with Z. Though it’s still not clear to me exactly what they did in the paper. I think this similar but I might be misreading and what you’ve done is more in line. Anyway, here’s my code. I haven’t run it on the full data since it take quite a long time.

  int<lower=1> P; // total number of periods
  int<lower=1> N; // total data points
  int<lower=1> J; // total number of doctor
  int<lower=1> L; // number of demographic variables (with intercept)
  int<lower=1> K; //number of individual-level coefficients (with intercept)
  matrix[L, J] Z;  // doctor regression covariates
  int<lower=0> y[N]; // outcome vector (TRx)  
  int<lower=1> t[N];    // period vector
  int<lower=0> d[N]; // details vector
  int<lower=1> id[N];   // doctor id vector
  real<lower=0> l[N]; //log(lagged TRx+1)

transformed data {
  matrix[N, K] X = append_col(rep_vector(1, N),
                              append_col(to_vector(d), to_vector(l))

  real<lower=0> phi;  //dispersion parameter for NBD
  vector<lower=0>[K] tau;
  cholesky_factor_corr[K] L_Omega; // cholesky corr. for individual-level coefficients
  matrix[K, L] gamma; 
  matrix[K, J] z;
  vector[3] gam;
transformed parameters {
  matrix[K, J] beta = gamma * Z + diag_post_multiply(L_Omega, tau) * z;
  tau ~ exponential(1);
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  phi ~ gamma(2, 1); 
  to_vector(gamma) ~ normal(0, 5);
  gam ~ student_t(5, 0, 1);
// NBD likelihood
    vector[N] theta = rows_dot_product(X, beta[, id]');
    vector[J] nu;

    for (j in 1:J)
        nu[j] = gam[1] + ( gam[2] * beta[1, j] + gam[3] * beta[2, j] );
        //nu[j] = gam[1] + ( gam[2] * beta[1, j] + gam[3] * beta[2, j] ) / (1 - beta[3, j]);
    y ~ neg_binomial_2_log(theta, phi);
    d ~ poisson_log(nu[id]);


generated quantities {
  matrix[K, K] Omega = multiply_lower_tri_self_transpose(L_Omega);
  matrix[K, K] Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(tau, L_Omega));
1 Like

Thank you very much.

I ran your code without any modification but it still gives me large r-hat’s, especially for the gam parameter.

Nevertheless, I think your way of coding somehow improves the way I originally did. I will spend a bit more time to understand how I can learn from you. Thanks again.

I did try another way which seemed to sample the best the changes are just treating it like a linear model and multiplying beta by a new vector of coefficients for nu.

Changes are:

  row_vector[K] gam;
  vector[J] alpha;
  alpha ~ std_normal();
// NBD likelihood
    for (j in 1:J)
        nu[j] = alpha[j] + gam * beta[, j];
1 Like

Thank you for your continuing interest in this problem.

In the paper’s formulation, nu is the individual-level Poisson parameter for the detailing (with or without beta3):

nu[j] = gam[1] + ( gam[2] * beta[1, j] + gam[3] * beta[2, j] );
//nu[j] = gam[1] + ( gam[2] * beta[1, j] + gam[3] * beta[2, j] ) / (1 - beta[3, j]);

But the “individualness” comes from the beta’s only. Could you tell me what alpha in yours is supposed to represent? Thanks.

The doctor specific effect on sales calls (ie “details”). The idea being that some doctors may be detailed more than other doctors for whatever reason(s) not otherwise captured in beta and gam.

Thank you for the explanation. This seems to the Poisson-lognormal distribution setup for the details. I will give it a try!