Simple CAR model and log-transform

ALang · July 5, 2018, 1:35pm

Hello Stan world,

I am working on tree disease. My current research consist in extending an existing hierarchical model to predict disease propagation within tree plantations. The previous model was implemented in INLA and included among other things a Conditionally AutoRegressive (CAR) component. To get my hands on Stan and tame it with a simple problem, I decided first to simulate a simple CAR process and model it in Stan to see if I can find back the values of the parameters I set myself, and how sensitive the model was to different priors. However things did not work as expected and I am having trouble with the reparametrization of the model.

The model is as follows:
y \sim N(X \beta + \phi, \sigma^2_\epsilon )
\beta \sim N(0, 4)
X in that case was just vector of size n, the number of obersvation, of random draws from N \sim (2,2)

\phi is the CAR component depending on two main parameters \alpha \in [0 \;;1] the spatial dependency and \tau the precision (\tau = 1 / \sigma). Cf Max Joseph’s tutorial.
\phi \sim N(0, [\tau ( D-\alpha W ) ]^-1)
\alpha \sim N(0.5, 0.5)
\tau \sim \gamma(2,2)

And the data was simulated using the following code:

## Set up a square lattice region
# lattice side length
m <- 10
# number of locations
n <- m^2
x.easting <- 1:m
x.northing <- 1:m
Grid <- expand.grid(x.easting, x.northing)

## Set up distance and neighbourhood (W, based on sharing a common border) matrices
distance <-array(0, c(n,n))
W <-array(0, c(n,n))
distance <- as.matrix(dist(Grid))
W[distance == 1] <- 1
## Calculate D matrix from W 
D <- diag(rowSums(W), n, n)

## Simulate phi
# Define CAR parameters
alpha <- 0.5
tau <- 2
# Define precision and covariance matrices
SIG_inv <- tau * (D - alpha * W) # Precision matrix
SIG <- solve(SIG_inv) # Covariance matrix
phi <- mvrnorm(n = 1, mu = rep(0, n), Sigma = SIG) # Spatial effect using CAR model
beta <- 2.5 # Coefficient
x <- rnorm(n, 2, 2) # Predictor
se2 <- 2 # residual var
e <- rnorm(n, 0, sqrt(se2)) # residuals

ynorm <- beta*x + phi + e

And then the Stan code

data {
	int<lower = 1> n; // Nb of locations
	vector[n] x; // predictor
        vector[n] y; // response
	matrix<lower = 0, upper = 1>[n,n] W;// Adjacency matrix
}
transformed data{
	vector[n] zeros;
	matrix<lower = 0>[n, n] D;
	{
	vector[n] W_rowsums;
	for (i in 1:n) {
	  W_rowsums[i] = sum(W[i, ]);
	}
	D = diag_matrix(W_rowsums);
	}
	zeros = rep_vector(0, n);
}
parameters{
  real beta;
	real<lower = 0> sigmaepsilon;
	vector[n] phi;
	real<lower = 0> tau;
	real<lower = 0, upper = 1> alpha;
}
model{
  beta ~ normal(0,4);
	phi ~ multi_normal_prec(zeros, tau*(D - alpha*W));
	alpha ~ normal(0.5, 0.5);
	tau ~ gamma(2, 1);
	y ~ normal(x * beta + phi, sigmaepsilon);
}

## Fitting the stan model
dat <- list(n = n,  x = x, y = ynorm, W = W, D = D)
niter <- 2E3
nchains <- 4
system.time(fit <- stan('simu_CAR_norm.stan', data = dat, 
                              iter = niter, chains = nchains, verbose = FALSE))

I did not expect any problem as I was first restraining myself to an ideal simple case, but I kept running into a low BFMI problem. This appears to be mainly due to \tau as it displays a strong negative correlation with energy__. (cf Brief Guide to Stan’s Warnings.). \tau is also poorly estimated. Both posetrior mean and median are around 1 where as the real value of \tau is 2.
As increasing the niter to 10.000 improved nothing, I am now trying to reparametrize my model. I found some inspiration in the INLA documentation. The latent model "generic1" is somewhat similar to a CAR model and uses a log transform on \tau \; ; \; \theta_1 = log(\tau) and logit transform on \alpha \; ; \; \theta_2 = logit(\alpha).

And that’s where I am confused. I tried first to simply log-transform \tau,
following the Stan Manual version 2.17.0 p291 :

transformed parameters{
 real<lower = 0> log_tau;
 log_tau = log(tau);
}
model{
  beta ~ normal(0,4);
  phi ~ multi_normal_prec(zeros, tau*(D - alpha*W));
  alpha ~ normal(0.5, 0.5);
  log_tau ~ gamma(2, 1);
  target += -log_tau;
  y ~ normal(x * beta + phi, sigmaepsilon);
}

I am getting an error message from the PARSER :

Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter  or local variable. If it does, you need to include a target += statement...

Which I, precisely, exepected to avoid, but worse than that, I am still facing a low BFMI problem, and in addition wind up with a number of divergent transitions (311 for niter = 2000). Nb : This number doesn’t decrease drastically if I increase adapt_delta to 0.90.

I then tried to replace this two lines

  log_tau ~ gamma(2,2);
  target += -log_tau;

by

  target += gamma_lpdf(log_tau | 2, 2);

Which doesn’t not improve anything, (still encountering both divergence and energy problems)

Bottom line : How should I properly log-transform \tau to solve my problem ? or is it even a sound approach to do that ?

Thanks in advance

bbbales2 · July 5, 2018, 2:06pm

Just a couple comments, have you looked through this: http://mc-stan.org/users/documentation/case-studies/icar_stan.html ? That related at all?

With regards to the log tau thing, maybe just sample log_tau directly?

parameters {
  real<lower=0.0> log_tau;
}
transformed parameters {
  real tau = exp(log_tau);
}

bbbales2 · July 5, 2018, 2:16pm

Oh, and since you have tons of divergences, could it be that you’re in funnel city with those latent variables phi?

Since you have a normal observation model, can you somehow combo these two guys together in a way that makes sense so you can avoid sampling phi?

  phi ~ multi_normal_prec(zeros, tau*(D - alpha*W));
  y ~ normal(x * beta + phi, sigmaepsilon);

bbbales2 · July 5, 2018, 2:20pm

There’s gonna be a better way of doing this, but maybe just:

parameters {
  vector[n] phi_z;
}
transformed parameters {
  vector[n] phi = phi_z / sqrt(tau);
}
model {
  phi_z ~ multi_normal_prec(zeros, D - alpha * W);
}

Might get you somewhere. You got access to a copy of “Gaussian Markov Random Fields”, Rue/Held? It’s got all the linear algebra tricks in it.

ALang · July 9, 2018, 2:44pm

Thanks for your answers Ben,

I already went through both the stan tutorials on ICAR and sparse CAR. The latter is the closer to my problem and the actual source of inspiration from my stan code. However, due to the data used in the example no reparamtrization was needed.

Regarding your pratical tricks :

I had already tried to sample log_tau directly and transforming it back without success.
Combining phi and y, is ultimately not an option as y the response needs to be binomial in the final model. I had first another issue with the binomial response and “convert” it to normal response, for the sake of my question. Now the issue is fixed but I will still need to log transform tau.
I also tried your phi_z trick but then divergences hit the ceilling (something like 3999…)

I guess my question has a broader reach in the sense that I wanted to know whether ;

log_tau ~ gamma(2, 1);
 target += -log_tau;

that very piece of code will actually allow me to sample log_tau from a log-gamma distribution. I was a bit puzzled as to the utilization of the target += statement.
Is there an actual difference between that statement

target += gamma_lpdf(log_tau | 2, 1);

and that one

log_tau ~ gamma(2, 1);
 target += -log_tau;

?

mbjoseph · July 9, 2018, 3:42pm

Hi @ALang!

For a CAR model with a Gaussian likelihood, it’s more common to use a marginal specification that places a multivariate normal distribution over y, rather than explicitly trying to estimate phi. With the CAR model as written, when \alpha approaches zero, the \phi_i terms become independent, which will make it difficult to simultaneously identify the two scale parameters (sigmaepsilon and tau).

Instead, for a Gaussian likelihood you’d probably want something more like:

y ~ multi_normal_prec(x * beta, tau * (D - alpha * W));

Here, you don’t need to have two scale parameters: tau is the only scale parameter, and depending on the value of alpha, the variance in y can be uncorrelated spatially (\alpha \rightarrow 0), or very correlated/spatially smooth (\alpha \rightarrow 1).

If your actual data are binomial observations, then you won’t be able to marginalize over \phi like this, but for Gaussian y, this marginal specification should work better.

As for your other question about ~ vs. +=, both increment the log probability of the model. Neither will literally draw samples. The main difference is ~ will drop any additive constants when incrementing the log probability, while += gamma_lpdf(...) will not.

bbbales2 · July 9, 2018, 4:03pm

There’s a rather convoluted thread on doing a Laplace approximation to latent variables like those that you might try (I think the final working code ended up in: Algebraic sovler problems (Laplace approximation in Stan) - #30 by anon75146577) if you have to go with non-normal output.

Regarding the log gamma thing, there’s a difference. Just target += gamma_lpdf(log_tau | 2, 1); does not include the -log_tau adjustment. You need the -log_tau adjustment because you’re putting a prior on a transformed variable. Section 22.3 “Change of Variables”, page 290 from the 2.17.0 manual has some good details on what this is and where it comes from.

Lol you win some you lose some

ALang · July 18, 2018, 7:50am

Thanks for your help gentlemen,

Going back to the problem, @mbjoseph’s solution does work :

 y ~ multi_normal_prec(x * beta, tau * (D - alpha * W));

It did not at first cause I did not plugged \sigma_\epsilon^2 out of the parameter block and the corresponding residuals terms out of the data.

Bob_Carpenter · July 19, 2018, 12:44am

ALang:

I guess my question has a broader reach in the sense that I wanted to know whether ;
log_tau ~ gamma(2, 1);
target += -log_tau;
that very piece of code will actually allow me to sample log_tau from a log-gamma distribution.

What’s a log-gamma distribution? The point of lognormal is to transform positive only to unconstrained. What does log-gamma do?

To answer your question, the target += statement simply increments the target density. The relevant comparison is

log_tau ~ gamma(2, 1);

and

target += gamma_lpdf(log_tau | 2, 1);

These have the same effect as far as sampling goes, but the former is able to drop some constants that get included in the calculation of gamma_lpdf, but do not affect sampling (which is only done on a density defined up to a constant).

so when you do this:

log_tau ~ gamma(2, 1);
target += -log_tau;

What you are saying is that:

log p(log_tau) = log gamma(log_tau | 2, 1) - log_tau

Now you still need to make sure log_tau is positive here, as gamma is only defined for positive variates.

Topic		Replies	Views
Sampling difficulties with hierarchical/CAR growth model with truncated likelihood Modeling	6	1014	January 18, 2018
Car models and non-centred parameterisation Modeling specification	1	83	July 15, 2024
Sparse CAR predictions for hold-out data Modeling	5	709	October 21, 2020
Paper on CAR models and spatial analysis pakage Publicity	2	645	October 6, 2021
Spatio-Temporal Model Modeling	11	3984	February 8, 2018

Simple CAR model and log-transform

Related topics