Incorporating spatial autocorrelation in Zero-Inflated Beta regression with a temporal structure

Hello all,

I am trying to model a continuous proportion (fire coverage) with a decent number of 0 observations (i.e., y ∈ [0, 1)), and, in addition to covariate effects on the values of the non-zero observations, I am interested in the influence of covariates on the probability of obtaining a 0 value, hence I am using zero-inflated beta regression. My data has both a temporal and a spatial structure: I have a landscape that I have divided into grid cells, and I have extracted the proportional area burnt for each cell in each year over a roughly 30-year time series.

I don’t feel I need to account for temporal autocorrelation, as many of my predictor variables should, in theory, be the factors driving any apparent temporal autocorrelation. I am also including Year as a random effect in the mu, phi and zi models. The problem arises with spatial autocorrelation. I had been including an ESICAR term in the mu model (initially using brms):


forest_fire_perc_formula <-  
  bf(Forest_Percent_Burnt ~ (1|Year) + Predictors... + car(adj_mat, gr = Landscape_ID, type = 'esicar'),
     phi ~ (1|Year),
     zi ~ (1|Year) + Predictors...,
     decomp = 'QR')
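
(For completeness, the formula is fitted with the zero-inflated beta family, roughly as below. This is only a sketch: dat and the sampler settings are placeholders, and depending on the brms version adj_mat may need to be supplied via data2.)

library(brms)

fit <- brm(forest_fire_perc_formula,
           data   = dat,
           data2  = list(adj_mat = adj_mat),  # adjacency matrix for the car() term
           family = zero_inflated_beta(),
           chains = 4, cores = 4)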

The predictors are the same in the mu and zi components, in case that is of any relevance. adj_mat is an N x N binary matrix indicating adjacencies (1s), where N is the number of independent sites regardless of year (i.e., the number of grid cells). I am only including spatial autocorrelation in the main model, following this paper: Zero-Inflated Beta Distribution Regression Modeling | SpringerLink. That paper uses a Gaussian process, however; is it reasonable to use a CAR structure for this purpose?

The problem is that this formulation results in an identical value of the spatial autocorrelation random effect for each grid cell across years. For instance, for the first grid cell (Landscape_ID = 1), the 30 values of rcar (one per year) are identical. This does not align with what I intended. As fires are highly seasonal in my landscape, I am assuming there is only really spatial autocorrelation within years, and I would thus expect the values of rcar to vary both spatially and temporally.

Thus, I had a go at amending the Stan code to index the ESICAR term by year, placing a hyperprior on the SD parameter:


functions {

   real zero_inflated_beta_lpdf(real y, real mu, real phi, real zi) {
     row_vector[2] shape = [mu * phi, (1 - mu) * phi];
     if (y == 0) {
       return bernoulli_lpmf(1 | zi);
     } else {
       return bernoulli_lpmf(0 | zi) +
              beta_lpdf(y | shape[1], shape[2]);
     }
   }

   real zero_inflated_beta_logit_lpdf(real y, real mu, real phi, real zi) {
     row_vector[2] shape = [mu * phi, (1 - mu) * phi];
     if (y == 0) {
       return bernoulli_logit_lpmf(1 | zi);
     } else {
       return bernoulli_logit_lpmf(0 | zi) +
              beta_lpdf(y | shape[1], shape[2]);
     }
   }

  real zero_inflated_beta_lccdf(real y, real mu, real phi, real zi) {
    row_vector[2] shape = [mu * phi, (1 - mu) * phi];
    return bernoulli_lpmf(0 | zi) + beta_lccdf(y | shape[1], shape[2]);
  }

  real zero_inflated_beta_lcdf(real y, real mu, real phi, real zi) {
    return log1m_exp(zero_inflated_beta_lccdf(y | mu, phi, zi));
  }
  
  real sparse_icar_lpdf(vector phi, real sdcar, int Nloc,
                        int Nedges, data vector Nneigh, data vector eigenW,
                        array[] int edges1, array[] int edges2) {
    real tau;  // precision parameter
    row_vector[Nloc] phit_D;  // phi' * D
    row_vector[Nloc] phit_W;  // phi' * W
    tau = inv_square(sdcar);
    phit_D = (phi .* Nneigh)';
    phit_W = rep_row_vector(0, Nloc);
    for (i in 1:Nedges) {
      phit_W[edges1[i]] = phit_W[edges1[i]] + phi[edges2[i]];
      phit_W[edges2[i]] = phit_W[edges2[i]] + phi[edges1[i]];
    }
    return 0.5 * ((Nloc - 1) * log(tau) -
           tau * (phit_D * phi - (phit_W * phi)));
  }

}


data {

  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> Nyrs; // total number of years
  array[N] int<lower=1> obsYr; // year index of each observation
  
  int<lower=1> K;  // number of population-level covariates for main model
  matrix[N, K] X;  // population-level design matrix for main model

  int<lower=1> K_zi;  // number of population-level covariates for zero-inflation component
  matrix[N, K_zi] X_zi;  // population-level design matrix for zero-inflation component

  // data for the CAR structure
  int<lower=1> Nloc; // number of sampled locations (time invariant)
  array[N] int<lower=1> Jloc; // location index of each observation
  int<lower=0> Nedges; // total number of edges among all observations (time invariant)
  array[Nedges] int<lower=1> edges1; // index of first cell in pairs that share edge (time invariant)
  array[Nedges] int<lower=1> edges2; // index of second cell in pairs that share edge (time invariant)
  vector[Nloc] Nneigh; // number of neighbours for each site (time invariant)
  vector[Nloc] eigenMcar; // eigen values for each site (time invariant)
  int prior_only;  // should the likelihood be ignored? used for prior predictive checks

}


transformed data {

  int Kc = K - 1; // number of linear terms minus intercept
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering

  // matrices for QR decomposition
  matrix[N, Kc] XQ;
  matrix[Kc, Kc] XR;
  matrix[Kc, Kc] XR_inv;

  int Kc_zi = K_zi - 1; // number of linear terms minus intercept
  matrix[N, Kc_zi] Xc_zi;  // centered version of X_zi without an intercept
  vector[Kc_zi] means_X_zi;  // column means of X_zi before centering

  // scale and centre predictor variables
  for (i in 2:K) { // standard component
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }

  for (i in 2:K_zi) { // zero-inflated component
    means_X_zi[i - 1] = mean(X_zi[, i]);
    Xc_zi[, i - 1] = X_zi[, i] - means_X_zi[i - 1];
  }

  // compute and scale QR decomposition
  XQ = qr_thin_Q(Xc) * sqrt(N - 1);
  XR = qr_thin_R(Xc) / sqrt(N - 1);
  XR_inv = inverse(XR);
  

}


parameters {

  vector[Kc] bQ;  // regression coefficients at QR scale
  real Intercept;  // temporary intercept for centered predictors
  real sdcar_mean; // hyperprior mean for the SDs of the year-level CAR structures
  real<lower=0> sdcar_sd; // hyperprior SD for the SDs of the year-level CAR structures
  array[Nyrs] real<lower=0> sdcar;  // SDs of the year-level CAR structures
  array[Nyrs] vector[Nloc - 1] zcar;
  real Intercept_phi;  // temporary intercept for centered predictors
  vector[Kc_zi] b_zi;  // population-level effects
  real Intercept_zi;  // temporary intercept for centered predictors
  real<lower=0> sd_1;  // group-level standard deviation for main model
  vector[Nyrs] z_1;  // standardized group-level effects for main model (year random effect)
  real<lower=0> sd_2;  // group-level standard deviation for phi model
  vector[Nyrs] z_2;  // standardized group-level effects for phi model (year random effect)
  real<lower=0> sd_3;  // group-level standard deviation for zi model
  vector[Nyrs] z_3;  // standardized group-level effects for zi model (year random effect)
  
}


transformed parameters {

  array[Nyrs] vector[Nloc] rcar; // CAR random effect values for each location in each year
  vector[Nyrs] r_1_1;  // actual group-level effects
  vector[Nyrs] r_2_phi_1;  // actual group-level effects
  vector[Nyrs] r_3_zi_1;  // actual group-level effects

  // sum-to-zero constraint
  for (y in 1:Nyrs) {
    rcar[y][1:(Nloc - 1)] = zcar[y];
    rcar[y][Nloc] = - sum(zcar[y]);
  }

  // unstandardize group-level effects
  r_1_1 = (sd_1 * (z_1));
  r_2_phi_1 = (sd_2 * (z_2));
  r_3_zi_1 = (sd_3 * (z_3));

}


model {

  real lprior = 0;  // prior contributions to the log posterior

  // likelihood not including constants
    if (!prior_only) {

        // initialize linear predictor term
        vector[N] mu = rep_vector(0.0, N);
        vector[N] phi = rep_vector(0.0, N);
        vector[N] zi = rep_vector(0.0, N);

        mu += Intercept + XQ * bQ;
        phi += Intercept_phi;
        zi += Intercept_zi + Xc_zi * b_zi;

        for (n in 1:N) {
            // add more terms to the linear predictor
            mu[n] += rcar[obsYr[n]][Jloc[n]] + r_1_1[obsYr[n]];
            phi[n] += r_2_phi_1[obsYr[n]];
            zi[n] += r_3_zi_1[obsYr[n]];
        }

        mu = inv_logit(mu);
        phi = exp(phi);
        for (n in 1:N) {
            target += zero_inflated_beta_logit_lpdf(Y[n] | mu[n], phi[n], zi[n]);
        }
    }

    // priors not including constants
    lprior += normal_lupdf(bQ | 0,1); // normal(0,1) prior on all slopes
    lprior += logistic_lupdf(Intercept | -2,0.5); // moderately informative prior on the standard intercept; fire cover is low in almost all instances
    lprior += normal_lupdf(sdcar_mean | 0,1); // hyperparameter components for the temporally varying CAR structure
    lprior += cauchy_lupdf(sdcar_sd | 0,2);
    for(y in 1:Nyrs) {
       lprior += cauchy_lupdf(sdcar[y] | sdcar_mean,sdcar_sd);
    }
    lprior += normal_lupdf(Intercept_phi | 0,1);
    lprior += normal_lupdf(b_zi | 0,1);
    lprior += logistic_lupdf(Intercept_zi | 0, 1);
    lprior += cauchy_lupdf(sd_1 | 0,2);
    lprior += cauchy_lupdf(sd_2 | 0,2);
    lprior += cauchy_lupdf(sd_3 | 0,2);
    target += lprior;
    for(y in 1:Nyrs) {
        target += sparse_icar_lpdf(
        rcar[y][1:Nloc] | sdcar[y], Nloc, Nedges, Nneigh, eigenMcar, edges1, edges2
        );
    }
    target += std_normal_lupdf(z_1);
    target += std_normal_lupdf(z_2);
    target += std_normal_lupdf(z_3);

}


generated quantities {
  // obtain the actual coefficients
  vector[Kc] b = XR_inv * bQ;
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // actual population-level intercept
  real b_phi_Intercept = Intercept_phi;
  // actual population-level intercept
  real b_zi_Intercept = Intercept_zi - dot_product(means_X_zi, b_zi);
}


This took a very long time to run (1,500 warm-up and 1,500 sampling iterations) and did not converge at all, which makes me think that perhaps I am misunderstanding the premise of CAR models and that they should not vary according to another factor.

I have also tried:

Providing adj_mat as a K x K matrix, where K is the total number of samples (i.e., n_grid_cells * n_years) and samples can only ever be adjacent to samples from the same year, but this did not converge either.

Just using a standard prior on the year-level sdcar values (i.e., lprior += cauchy_lupdf(sdcar[y] | 0, 2);), but this also did not converge.

Ultimately my questions are:

Is an ESICAR term appropriate in this context?
Should/can the ESICAR vary with year?
If the answer to these is yes, what is wrong in my code that is preventing convergence?

Thanks very much for any help,

Ciar

It’s up to you and your domain expertise to decide whether the spatial terms should vary by year, but it certainly seems reasonable.

If I’m understanding correctly, this is exactly what I would think about trying to do. That is, providing a block-diagonal adjacency matrix where blocks are years, resulting in “islands” in the adjacency graph with one island per year. Can you say a little bit more about what happened when you tried to fit it this way? Just how bad did the convergence diagnostics look?
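
Concretely, something along these lines is what I have in mind (a rough sketch; adj_mat, the year count, and the new grouping variable name are placeholders based on your description):

library(Matrix)

n_years <- 34
# one block per year, so cells are only ever adjacent to cells from the same year
adj_big <- Matrix::bdiag(replicate(n_years, adj_mat, simplify = FALSE))
adj_big <- as.matrix(adj_big)  # dense conversion; this is where memory becomes an issue

# the grouping variable passed to car(gr = ...) then needs one level per cell-year
dat$Cell_Year <- interaction(dat$Landscape_ID, dat$Year)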

If you’re willing to use a gp rather than a CAR structure, note that brms::gp accepts a by argument that I think should let you fit separate gps per year.
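
For example, something like the following (a rough sketch; the coordinate names and the approximate-GP settings k and c are placeholders, since I don't know your exact setup):

forest_fire_gp_formula <-
  bf(Forest_Percent_Burnt ~ (1|Year) + Predictors... +
       gp(easting, northing, by = Year, k = 20, c = 5/4),  # separate (approximate) GP per year
     phi ~ (1|Year),
     zi ~ (1|Year) + Predictors...,
     decomp = 'QR')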

Hi Jacob,

Thanks for the response and sanity check. I misremembered - the model with the K x K adjacency matrix didn't fail to converge; it never ran. I have >1000 sites in each year, so the K x K matrix is 13.6 GB. Although my PC has 16 GB of RAM, this is still not enough, and the model throws the error ‘Error: cannot allocate vector of size 12.7 Gb’ during compilation. I do have access to a university cluster, and thus a lot more RAM, so I could give that a go, but I feel like it might still take forever to run.

I actually didn't completely misremember, though. I did run that version of the model on a subset of 20 sites (20 sites * 34 years = 680 observations), and this is the one that didn't converge. I ran it for 1,000 warm-up and 1,000 sampling iterations, and all of the parameters of interest had Rhat values of 2.5-3.5.

I'll have a go with gp. My initial aversion to using a gp was because I assumed (perhaps baselessly) that it would take a lot longer to sample. This model is already very large and takes a long time, and I'm also intending to run a similar, even larger model next. However, a slow model that converges is better than a fast(er) model that doesn't :)

What happens if you run the CAR specification on all sites but just in 1 year (dropping all predictors that only vary across years)? Does that also fail to converge?

And what happens if you run the multi-year model, but with no CAR term?

I have also run the multi-year model without CAR (or any spatial autocorrelation term, for that matter). It converged well and the parameter estimates were logical. However, Moran's I suggested spatial autocorrelation remained in the residuals. My spatial autocorrelation checks prior to modelling suggested the autocorrelation was substantial and highly significant.
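
(For reference, the residual check was along these lines. This is a minimal sketch rather than my exact code: the ape package, inverse-distance weights, and the object names are assumptions.)

library(ape)

# posterior-mean residuals from the fit without a spatial term
res <- residuals(fit_no_spatial)[, "Estimate"]

# check one year at a time, since I only expect within-year autocorrelation
yr <- dat$Year == 1990                                      # hypothetical year
dmat <- as.matrix(dist(dat[yr, c("easting", "northing")]))  # distances between cell centroids
w <- 1 / dmat
diag(w) <- 0

ape::Moran.I(res[yr], weight = w)  # observed vs expected I, SD, and p-value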

Similarly, running the model on data from just a single year resulted in proper convergence and logical coefficient estimates. I've also run the full model without a spatial term, and while that converged, it yielded substantially different coefficient estimates than the (incorrect) ESICAR model.

Taken together, that implies to me that the full model with the CAR terms (represented as a block-diagonal adjacency matrix) would probably be able to converge, though it might (or might not) take a prohibitively long time to fit. Under the hood, I think the giant unwieldy matrix gets re-encoded in a sparse two-column format, so there is some hope that, if you can get past the memory bottleneck of passing the matrix, things might be tractable.

I will do some trials using a Gaussian process instead and see if that improves anything. I'll also delve a bit deeper into where the memory issue is occurring - as you say, it is presumably being converted to a sparse matrix, so there may be some way to do this manipulation prior to passing the data to Stan. Thanks for your help so far, will update on how this goes.
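
(In case it helps anyone later, the kind of manipulation I have in mind is roughly the following: building the two-column edge list for the block-diagonal graph directly, without ever allocating the dense matrix. adj_mat and the year count are from my setup; the rest is illustrative.)

n_loc <- nrow(adj_mat)
n_years <- 34

# undirected edges within one year, each pair listed once
e <- which(adj_mat == 1, arr.ind = TRUE)
e <- e[e[, 1] < e[, 2], , drop = FALSE]

# offset the cell indices by (year - 1) * n_loc to create one "island" per year
edges1 <- unlist(lapply(0:(n_years - 1), function(y) e[, 1] + y * n_loc))
edges2 <- unlist(lapply(0:(n_years - 1), function(y) e[, 2] + y * n_loc))

stan_data <- list(
  Nloc   = n_loc * n_years,
  Nedges = length(edges1),
  edges1 = edges1,
  edges2 = edges2,
  Nneigh = rep(rowSums(adj_mat), times = n_years)  # neighbour count per cell-year
)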

After trialling the Gaussian process term on a subsample of my data, the model converged very well and accounted for the spatial autocorrelation in the residuals. So I've opted to go with that instead. Thanks for all your help.