Origin-destination matrices

Hello,

this is the first time that I write a post on the Stan Forum.
My issue concerns the modeling of origin-destination (OD) matrices for human international migration flows.
In particular, my data are currently in the form of 3D-arrays composed of many OD matrices, one per each year under study.
The peculiarity of such OD matrices is that they are (usually) symmetrical matrices with countries on both rows and columns, with values on the off-diagonal elements, corresponding to the flows between each pair of countries, and zeros on the main diagonal (as within-country mobility is not considered).
Often there are also missing values in these OD matrices, corresponding to missing information on the flow between two countries.
These OD matrices contain counts of migration events and are therefore modeled using a Poisson distribution.

Now that I have explained the type of data, I come to my question.
I need to declare among the parameters a 3D-array (or array of matrices) including the expected value of my migration counts under the Poisson distribution. These 3D-arrays are a mix of known and unknown parameters, as the values on the main diagonal of each year-specific OD matrix are equal to zero, while the off-diagonal elements are unknown and need to be estimated.
I’ve been struggling to declare such partially known parameters, even after reading Section 3.2. of Stan User’s Guide.

In the example below, the array “r_imm” contains the data, which are distributed as a Poisson; “mu_r_imm” contains the expected values from the Poisson and is declared in the transformed parameter block; “mu_r_imm_zero”, declared in the transformed data block, contains the zeros on the main diagonal of each OD matrix, and “mu_r_imm_val” contains the off-diagonal parameters that I want to estimate.

Any suggestions on how to correctly assign each value to its appropriate position?

Thank you very much,

Emanuele

data {
  int<lower=0> N_year;                            // No. years
  int<lower=0> N_country;                         // No. countries
  int<lower=0> N_diff_country;                    // No. of inflows in 3D-array 
  int<lower=0> N_same_country;                    // No. of zeros in 3D-array
  int<lower=0> r_imm[N_country,N_country,N_year]; // Inflows between pairs of countries
  int zero_real;                                  // = 0.0    
}

transformed data {
  real<lower=0> mu_r_imm_zero[N_same_country] = zero_real; 
}

parameters {
  real<lower=0> mu_r_imm_val[N_diff_country];               
}

transformed parameters {
  real<lower=0> mu_r_imm[N_country,N_country,N_year]; 
}

model {
  for(t in 1:N_year){
    for(i in 1:N_country){
      for(j in 1:N_country){
          r_imm[i,j,t] ~ poisson(mu_r_imm[i,j,t]);
      }
    }
  }
}
1 Like

I am not totally sure of the specifics but the general idea would be to set up the 3D arrays in the transformed parameters block and declare the free elements of those 3D arrays as a vector in the parameters block. Something like

data {
  int<lower=0> N_year;                            // No. years
  int<lower=0> N_country;                         // No. countries
  int<lower=0> N_fixed;
  matrix[N_country,N_country] r_imm[N_year]; // negative values indicate missingness  
  ...
}
transformed data {
  int N_free = 0;
  for (t in 1:N_year) for(i in 1:N_country) for (j in 1:N_country)
    if (r_imm[t][i,j] < 0) N_free += 1;
}
parameters {
  vector<lower = 0>[N_free] free;
  ...
}
transformed parameters {
  matrix[N_country, N_country] mu_r_imm[N_year];
  {
    int pos = 1;
    for (t in 1:N_year) {
      for (i in 1:N_country) for (j in 1:N_country) {
        real temp = r_imm[t][i,j];
        if (temp < 0) { // missing
          mu_r_imm[t][i,j] = free[pos];
          pos += 1;
       } else mu_r_imm[t][i,j] = temp;
    }
  }
}
2 Likes

Hi Ben,

thank you very much for your prompt reply and suggestion.

If I understood correctly, you suggest to flag the missing values in the data (r_imm) and to set them as parameters to estimate in the parameters block, while leaving the observed value where they are. Hence, when I move to the model block, should I model mu_r_imm, not anymore r_imm, and have a new array (eta_r_imm) that contains the expected values according to the Poisson distribution?

model {
  for(t in 1:N_year){
    for(i in 1:N_country){
      for(j in 1:N_country){
          mu_r_imm[i,j,t] ~ poisson(eta_r_imm[i,j,t]);
      }
    }
  }
}

Moreover, if in the array eta_r_imm I need to constrain the cell on the main diagonals (within-country flows) to be equal to zero, can I just use the same trick that I used for the missing values (but this time referring to parameters being equal to zero and declared in the transformed data block)?

Emanuele

Yes, if the diagonal of a matrix is known to be zero, then just set it that way when i == j.

1 Like