Nonidentifiable parameters in latent factor model for dyadic/network data

Dear Stan Community,

I am currently trying to fit a regression model introduced by Peter Hoff (discussed e.g. here or here) which includes variance components and latent multiplicative effects to deal with the second- and third-order dependence typical for social network data.

The model I am going for here is:

logit(y_{i,j} = 1) = \alpha + \textbf{x}_{i,j}^T \pmb{\beta } + a_i + b_j + \epsilon_{i,j} +\textbf{u}_i^T \Lambda \textbf{v}_j

where a_i and b_j are sender- and receiver-specific random effects and \epsilon_{ij} are dyadwise correlated standard normal random effects such that:

\{(a_1, b_1), ..., (a_n, b_n)\} \sim N(0, \Sigma_{ab})

\{(\epsilon_{ij}, \epsilon_{ji}: i \neq j)\} \sim N(0, \Sigma_{\epsilon})

and
\Sigma_{ab} = \left( \begin{array}{rrr} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \\ \end{array}\right), \Sigma_\epsilon = \left( \begin{array}{rrr} 1 & \rho \\ \rho & 1 \\ \end{array}\right).

So far so good, this part of the model (which is also known as the Social Relations Model) samples fine when fit alone (although it could surely be improved).

The problems are with the latent part where \textbf{u}_i and \textbf{v}_j are R-dimensional vectors representing unobserved sender and receiver characteristics and \pmb{\Lambda} is a R \times R diagonal matrix with weights (the whole term is supposed to capture structural features of a network such as triadic closure or stochastic equivalence).

This does not really seem identifiable to me as varying permutations of the signs of the parameters would lead to the same overall contribution, which seems also to be picked up by the chains:

The model is implemented as a probit variant and with a custom Gibbs sampler in the ‘amen’ R package but I am not really sure what they did to identify the latent terms.

Did I misunderstand something here or does anyone have a clue how to circumvent this issue? Any help would be greatly appreciated!

All the best,
Jakob

PS: Here is my stan model:

 
data {
  
  int<lower=0> N_nodes;                             // number of nodes
  int<lower=0> N_dyads;                             // number of pairs of nodes N*(N-1)
  int<lower=0> K;                                   // number of predictors
  int<lower=0> R;                                   // dimension of latent factors
  int<lower=1,upper=N_dyads> sender[N_dyads];       // sender ID of each tie
  int<lower=1,upper=N_dyads> receiver[N_dyads];     // receiver ID of each tie
  int<lower=0,upper=1> y[N_dyads];                  // binary tie outcome
  vector[K] x[N_dyads];                             // predictor matrix
  int<lower=1,upper=N_dyads/2> idx_eps[N_dyads, 2]; // index for assigning dyadic errors

}

parameters {
  
  real mu;
  vector[K] beta;

  vector[2] ab[N_nodes];
  corr_matrix[2] omega_ab;
  vector<lower=0>[2] tau_ab; 

  real<lower=0,upper=1> rho_epsilon;
  vector[2] z_epsilon[N_dyads/2];

  vector[R] u[N_nodes];
  vector[R] v[N_nodes];
  vector[R] lambda;

  real<lower=0> tau_u;
  real<lower=0> tau_v;

}

transformed parameters {
  
  vector[2] epsilon[N_dyads/2]; 
  cov_matrix[2] sigma_epsilon;
  matrix[2,2] sigma_epsilon_chol;
  matrix[R,R] Lambda;

  sigma_epsilon[1,1] = 1;
  sigma_epsilon[2,2] = 1;
  sigma_epsilon[1,2] = rho_epsilon;
  sigma_epsilon[2,1] = sigma_epsilon[1,2];
  sigma_epsilon_chol = cholesky_decompose(sigma_epsilon);

  for (i in 1:N_dyads/2) 
    epsilon[i] = sigma_epsilon_chol * z_epsilon[i];

  Lambda = diag_matrix(lambda);
  
}

model {

  // priors for intercept & regression coefficients
  
  mu ~ normal(0, 5);
  beta ~ normal(0, 2);
  
  // priors for sender & receiver varying effects
  
  omega_ab ~ lkj_corr(3);
  tau_ab ~ exponential(.5);
  ab ~ multi_normal(rep_vector(0, 2), quad_form_diag(omega_ab, tau_ab));

  // priors for dyad correlation

  rho_epsilon ~ beta(2, 2);
  for (i in 1:N_dyads/2) z_epsilon[i] ~ std_normal();

  // priors for latent factors


  tau_u ~ exponential(.5);
  tau_v ~ exponential(.5);

  for (n in 1:N_nodes) u[n] ~ normal(0, tau_u);
  for (n in 1:N_nodes) v[n] ~ normal(0, tau_v);
  lambda ~ normal(0,2);

  // likelihood
  {
    vector[N_dyads] eta;

    for (d in 1:N_dyads)
      eta[d] = mu + dot_product(x[d], beta) + ab[sender[d], 1] + 
      ab[receiver[d], 2] + epsilon[idx_eps[d,1], idx_eps[d,2]] + u[sender[d]]'*Lambda*v[receiver[d]];

    y ~ bernoulli_logit(eta);   
  }

}

generated quantities {

  vector[N_dyads] yrep;

  for (d in 1:N_dyads)
    yrep[d] = bernoulli_logit_rng(mu + dot_product(x[d], beta) + ab[sender[d], 1] + 
      ab[receiver[d], 2] + epsilon[idx_eps[d,1], idx_eps[d,2]] +  + u[sender[d]]'*Lambda*v[receiver[d]]);

}

Hi,
sorry for not picking this up sooner.

Great that you tried the simple model first, this is very good practice!

I didn’t try to verify this claim, but if this is the case, it is indeed a huge issue.

Just to be clear, if I expand the matrix notation (which I am not great at), I get:

\textbf{u}_i^T \Lambda \textbf{v}_j = \sum_k u_{i,k} \lambda_k v_{j,k}

where \lambda_k are the diagonal elements of \Lambda - is that correct? If that is so, I don’t think you have other option than to fix some of the signs. I am currently out of time so can’t really work out how to preserve the full flexibility, but it feels it should not be difficult (narrator: it may still be difficult).

Note that it sometimes happens that published packages just ignore this kind of problems as Gibbs sampler doesn’t have strong diagnostics to show it.

Hope that helps! If not, I could try to think about it more thoroughly sometime later.

Hi Martin,

sorry for only coming back to this so late, i got cought up in the starting semester.

In this paper, the author discusses the issue for a related model and proposes to rotate the posterior sample values to a common orientation using a ‘procrustean’ transformation (no idea what that is). I also don’t know how that would work when running multiple chains, I think I read somewhere that in the ‘boral’ r package similar nonidentifiability issues due to latent variables are handled by only running one chain, but I might be mistaken.

Not sure if I am trying to do any of this anytime soon, but still thanks again for your help!

All the best
Jakob

Procrustean would refer to https://en.wikipedia.org/wiki/Procrustes_analysis (which is a cool topic). I can see how that would work as post processing after Gibbs sampling, but not sure how one would integrate that with Stan, as the non-identifiability could break individual chains. There however IMHO has to exist a way to leverage the general idea to fix the signs without losing any of the flexibility.

@Jakob I’d be curious to hear how this turned out. Did you keep moving forward with the code?