Several questions about a multivariate probit purchase model: data simulation, reparameterization, and speedup

fitting-issues
performance

#1

Hello,

I’m hoping for your advice about a multivariate probit model I have to fit; given my limited experience, I’m sure something in it is wrong.

I’m trying to predict purchases in 16 categories, using as covariates some customer characteristics, ownership of the loyalty card of the store that supplied the data, and the previous month’s purchases in the same categories by the same customers. To do this, I’m trying to adapt the approach of this paper (https://doi.org/10.1509/jmkr.42.2.233.62288).

I’d be very pleased if you could give me some advice on:

  • How can I simulate this kind of data? I’m not sure this model fits the data, so I want to recover known parameters first (I know this is not strictly a Stan issue, but I need some guidance).
  • With the current model I get the “maximum treedepth exceeded” warning. I read that a non-centered parameterization could help; how could I apply it in this case?
  • I’m not sure whether I have to include a Jacobian adjustment.
  • Any way of speeding things up, because estimation is very slow.

I would be very grateful for any advice on how best to approach this. I know there are many questions in this post and I don’t expect you to do my homework, but anything would be helpful.

The model looks as follows (I also attach the data and Stan code; I’m using RStan):

Multivariate Probit v1.stan (2.8 KB)
Multivariate Probit v1.R (966 Bytes)
sample data.csv (562.5 KB)

functions {
  int sum(int[,] a) {
    int s = 0;
    for (i in 1:size(a))
      for (j in 1:size(a[i]))
        s = s + a[i, j];
    return s;
  }
}

data {
  int<lower=1> NCAT;                  // NUMBER OF CATEGORIES
  int<lower=0> NDAT;                  // NUMBER OF OBSERVATIONS
  int<lower=1> NCHAR;                 // NUMBER OF CUSTOMER-LEVEL CHARACTERISTICS
  int<lower=0,upper=1> y[NDAT,NCAT];  // PURCHASE MATRIX FOR PREDICTION
  vector[NCAT] x[NDAT];               // PURCHASE MATRIX, LAG 1
  vector[NCHAR] x0[NDAT];             // CUSTOMER DEMOGRAPHICS MATRIX
  row_vector[NDAT] t;                 // BINARY INDICATOR FOR CARD OWNERSHIP
}

// MULTIVARIATE PROBIT TRANSFORMATION FROM USER GUIDE
transformed data {
  int<lower=0> N_pos;
  int<lower=1,upper=NDAT> n_pos[sum(y)];
  int<lower=1,upper=NCAT> d_pos[size(n_pos)];
  int<lower=0> N_neg;
  int<lower=1,upper=NDAT> n_neg[(NDAT * NCAT) - size(n_pos)];
  int<lower=1,upper=NCAT> d_neg[size(n_neg)];

  N_pos = size(n_pos);
  N_neg = size(n_neg);
  {
    int i = 1;
    int j = 1;
    for (n in 1:NDAT) {
      for (d in 1:NCAT) {
        if (y[n,d] == 1) {
          n_pos[i] = n;
          d_pos[i] = d;
          i = i + 1;
        } else {
          n_neg[j] = n;
          d_neg[j] = d;
          j = j + 1;
        }
      }
    }
  }
}

parameters {
  vector[NCAT-1] beta_free;           // FREE PARAMETERS (FIRST ELEMENT OF BETA IS FIXED)
  vector[NCAT] teta;                  // PARAMETERS FOR CARD OWNERSHIP
  vector[NDAT] gama;                  // PARAMETERS FOR INDIVIDUAL HIERARCHY
  vector<lower=0>[NDAT] tau;
  vector[NCHAR] delta;
  cholesky_factor_corr[NCAT] L_Omega;
  vector<lower=0>[N_pos] z_pos;
  vector<upper=0>[N_neg] z_neg;
}

transformed parameters {
  vector[NCAT] beta;                  // TRANSFORMATION OF BETA_FREE
  vector[NDAT] dm;                    // MATURITY
  vector[NCAT] utilidad[NDAT];
  vector[NCAT] z[NDAT];

  beta[1] = 1;                        // FIX BETA 1

  // FILL BETA VECTOR
  for (i in 1:(NCAT-1))
    beta[i+1] = beta_free[i];

  // MATURITY
  for (n in 1:NDAT)
    dm[n] = dot_product(beta, x[n]);

  // CONSTRUCT LATENT UTILITY OF EACH OBSERVATION
  for (d in 1:NCAT)
    for (n in 1:NDAT)
      utilidad[n, d] = gama[n] * fabs(beta[d] - dm[n]) + teta[d] * t[n];

  for (n in 1:N_pos)
    z[n_pos[n], d_pos[n]] = z_pos[n];
  for (n in 1:N_neg)
    z[n_neg[n], d_neg[n]] = z_neg[n];
}
model {
  L_Omega ~ lkj_corr_cholesky(4);
  to_vector(beta_free) ~ normal(0, 5);
  to_vector(teta) ~ normal(0, 5);
  to_vector(gama) ~ normal(0, 5);
  to_vector(delta) ~ normal(0, 5);
  to_vector(tau) ~ cauchy(0, 2.5);

  // FILL CUSTOMER-LEVEL PARAMETERS
  for (n in 1:NDAT)
    gama[n] ~ normal(dot_product(delta, x0[n]), tau[n]);

  // NON-CENTERED PARAMETERIZATION?
  z ~ multi_normal_cholesky(utilidad, L_Omega);
}
generated quantities {
  corr_matrix[NCAT] Omega = multiply_lower_tri_self_transpose(L_Omega);
}


#2

These models have proved to be really tricky to fit robustly, even with simulated data that’s well specified.

You want to follow the generative model and generate the parameters, then generate the data based on the parameters.
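For anyone looking for a starting point, here is a minimal sketch of that generate-parameters-then-data workflow in Python/NumPy (the thread uses R, so treat this purely as a recipe; the dimensions, the parameter distributions, and the fixed equicorrelation matrix are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dimensions mirroring the model in the post.
NDAT, NCAT = 200, 4

# 1. Generate "true" parameter values.
beta = np.concatenate(([1.0], rng.normal(0, 1, NCAT - 1)))  # beta[1] fixed to 1
gama = rng.normal(0, 1, NDAT)           # customer-level coefficients
teta = rng.normal(0, 1, NCAT)           # card-ownership effects
t = rng.integers(0, 2, NDAT)            # binary card-ownership indicator
x = rng.normal(0, 1, (NDAT, NCAT))      # stand-in for lagged purchases

# A fixed, guaranteed-positive-definite equicorrelation matrix.
Omega = 0.3 * np.ones((NCAT, NCAT)) + 0.7 * np.eye(NCAT)
L = np.linalg.cholesky(Omega)

# 2. Build the latent utilities exactly as in the Stan model.
dm = x @ beta                                       # "maturity"
util = (gama[:, None] * np.abs(beta[None, :] - dm[:, None])
        + teta[None, :] * t[:, None])

# 3. Add correlated unit-variance noise and threshold at zero.
z = util + rng.normal(size=(NDAT, NCAT)) @ L.T
y = (z > 0).astype(int)
```

Fitting the model to `y` and checking whether `beta`, `gama`, etc. are recovered is then the simulation-based check asked about in the first post.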

I don’t know, I’m afraid.

If you non-linearly transform parameters and use them on the left of the ~ (or in the equivalent target += usage) then you need a Jacobian.
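As a concrete illustration of why the adjustment matters: if x has a standard normal density and you work with y = exp(x), the density of y picks up the factor |d(log y)/dy| = 1/y, i.e. a -log(y) term on the log scale, which is exactly the kind of term a `target +=` statement would need. A self-contained numeric check:

```python
import math

def normal_lpdf(x, mu=0.0, sigma=1.0):
    """Log density of Normal(mu, sigma)."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def lognormal_lpdf(y):
    """Log density of y = exp(x), x ~ Normal(0, 1).

    Change of variables: log p_Y(y) = log p_X(log y) + log |d(log y)/dy|,
    and the log-Jacobian term is -log(y)."""
    return normal_lpdf(math.log(y)) - math.log(y)

print(lognormal_lpdf(2.5))
```

Dropping the `- math.log(y)` term gives a "density" that no longer integrates to 1, which is what happens in Stan when a required Jacobian adjustment is omitted.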

That’s going to be solved when the treedepth problem is solved.


#3

Thanks very much for your response, Bob, and apologies for the late reply; I’ve been working hard on this problem.

After this post, I simplified the model a lot; for example, I dropped the correlation between the dependent variables and went with a logit model instead of the multivariate probit.

Then I realized I have a non-identifiability problem in the basic model: I can multiply the parameter vector beta by -1 and, because of the absolute value, the results are the same. In theory this is removed by constraining, for example, the first element of the vector to be positive, which is what I do in the code below, but the remaining parameters seem to converge to the same values as when I don’t constrain the first parameter, so I need some help applying this.
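To see the reflection symmetry concretely: with utility |beta_j - beta·x_i|, flipping the sign of every element of beta also flips the sign of beta·x_i, so the absolute value, and hence the whole likelihood, is unchanged. A quick numeric check on made-up data (not the attached simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
NCASOS, NRUBROS = 50, 3
x = rng.normal(size=(NCASOS, NRUBROS))
y = rng.integers(0, 2, size=(NCASOS, NRUBROS))

def log_lik(beta):
    """Bernoulli log-likelihood with p = inv_logit(|beta_j - beta.x_i|),
    mirroring the model in this post."""
    dm = x @ beta
    util = np.abs(beta[None, :] - dm[:, None])
    p = 1.0 / (1.0 + np.exp(-util))
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

beta = np.array([-1.0, 1.0, 2.0])
# Flipping the sign of every element leaves the likelihood unchanged,
# so the posterior has two mirror-image modes.
print(log_lik(beta), log_lik(-beta))
```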

The code is here (simulation attached): MLOGIT SIMULATION.R (1.8 KB)

data {
  int<lower=1> NRUBROS;
  int<lower=0> NCASOS;
  int<lower=0,upper=1> y[NCASOS,NRUBROS];
  vector[NRUBROS] x[NCASOS];
}

parameters {
  real<lower=0> beta_free_aux;
  vector[NRUBROS-1] beta_free;
}

transformed parameters {
  vector[NRUBROS] beta;

  beta[1] = beta_free_aux;

  for (j in 2:NRUBROS)
    beta[j] = beta_free[j-1];
}

model {
  vector[NCASOS] dm;
  vector[NRUBROS] utilidad[NCASOS];
  vector[NRUBROS] probabilidad[NCASOS];

  beta_free_aux ~ normal(1, 1);
  to_vector(beta_free) ~ normal(0, 5);

  for (i in 1:NCASOS)
    dm[i] = dot_product(beta, x[i]);

  for (j in 1:NRUBROS) {
    for (i in 1:NCASOS) {
      utilidad[i, j] = fabs(beta[j] - dm[i]);
      probabilidad[i, j] = exp(utilidad[i, j]) / (1 + exp(utilidad[i, j]));
    }
  }

  for (i in 1:NCASOS)
    for (j in 1:NRUBROS)
      y[i,j] ~ bernoulli(probabilidad[i,j]);
}



#4

But is it a problem in practice?


#5

The expression exp(utilidad[i, j]) / (1 + exp(utilidad[i, j])) is just inv_logit(utilidad[i, j]).
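For reference, inv_logit is the logistic function, and the naive exp(u) / (1 + exp(u)) form overflows for large u; a numerically stable sketch:

```python
import math

def inv_logit(u):
    """Numerically stable logistic function, equal to exp(u) / (1 + exp(u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    # For very negative u, exp(u) is tiny and this form avoids overflow.
    e = math.exp(u)
    return e / (1.0 + e)

print(inv_logit(0.0))  # 0.5
```

In Stan you would normally not compute this at all and instead use `bernoulli_logit`, which works directly on the log-odds scale.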

For the dm loop, you want to just do matrix multiplication. Probably x * beta.

The fabs is bad. It introduces a non-identifiability. If you need positivity, you want a monotonic transform for that, like exp. Why are you trying to keep it positive anyway? Inverse logit maps (-inf, inf) to (0, 1).

The likelihood should be vectorized at least one level, if not by collapsing the matrices and arrays to vectors, which is even more efficient.
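The difference being pointed at looks like this in NumPy terms (a sketch with made-up data; in Stan the analogue would be one matrix multiply for dm and a single vectorized sampling statement instead of nested loops):

```python
import numpy as np

rng = np.random.default_rng(1)
NCASOS, NRUBROS = 100, 4
x = rng.normal(size=(NCASOS, NRUBROS))
y = rng.integers(0, 2, size=(NCASOS, NRUBROS))
beta = rng.normal(size=NRUBROS)

# Loop version, mirroring the posted model.
ll_loop = 0.0
for i in range(NCASOS):
    dm_i = float(x[i] @ beta)
    for j in range(NRUBROS):
        u = abs(beta[j] - dm_i)
        p = 1.0 / (1.0 + np.exp(-u))
        ll_loop += y[i, j] * np.log(p) + (1 - y[i, j]) * np.log1p(-p)

# Vectorized version: one matrix product, no explicit loops.
dm = x @ beta                               # shape (NCASOS,)
u = np.abs(beta[None, :] - dm[:, None])     # shape (NCASOS, NRUBROS)
p = 1.0 / (1.0 + np.exp(-u))
ll_vec = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
```

Both versions compute the same log-likelihood; the vectorized one is what lets autodiff frameworks like Stan share work across observations.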


#6

Wow, I’m really grateful for your help. I will make all the changes you propose.

This is the main problem: I took that utility model from the paper I mentioned in the first post, and one of my goals is to compare it with other models (the exp transform is an interesting idea too, thanks). The non-identifiability problem is solved by constraining one element of beta to be positive (or negative, it doesn’t matter), but I don’t know how to do that in Stan.

For example, take a three-category simulation with a true beta vector of (-1, 1, 2). In the traceplot I can see the chains converging to the two plausible solutions, (-1, 1, 2) and (1, -1, -2), but when I add the auxiliary parameter with its constraint:

real<lower=0> beta_free_aux;

I only recover the solution with a positive first element, (1, -1, -2), while the other plausible solution, (-1, 1, 2), gets distorted to (0, 1, 2) (because the constraint keeps the first element from going negative). What I really want is for the first parameter to move only over positive values, so that only the solutions with a positive first element are recovered. Is that possible?


#7

You just need to declare the variable representing that value with lower = 0. Then you may have to put things back together into a bigger parameter vector, which you can do in the transformed parameters block.

parameters {
  real<lower = 0> x1;
  vector[N - 1] x2_N;
  ...
}
transformed parameters {
  vector[N] x = append_row(x1, x2_N);
  ...
}

#8

Isn’t it the same to do this?

Because with that declaration I get the problem I described in my example: I think the rest of the parameters are still able to move through the solution space as if the variable weren’t constrained.


#9

Yes, same behavior, just simpler to append and declare-define at the same time.


#10

I’m really grateful for your response, Bob, but I can’t solve it yet. I think I now have a more specific problem that this topic’s title doesn’t reflect, so I’ve opened a new topic: Help for setting parameters to solve non-identifiability problem.

Thank you!