Stan application to crop variability / time series modeling?


#1

I’m working with some collaborators to model crop yield variability response to soil organic matter contents–yield variability should decrease with more organic matter because organic matter retains water well and can buffer drought effects.

We have yield on soils, climate, and yield for all US counties for at least 20 years.

My colleagues and I would usually tackle this by calculating a statistic–like coefficient of variation. That would then reduce the data so you’d have only one value per county.

Do any of you more experienced Stan folks have thoughts on time-series (or even ODE-based) models that might help us model the data without having to aggregate up to a one-value-per-county statistic?


#2

The way I’d approach it is to take a standard time-series model, where folks typically model an effect of the predictors on the mean, and add an effect of the predictor on the noise term as well.


#3

As you may have anticipated, what we’re likely to say is to build a proper multilevel spatio-temporal model with a latent yield (or whatever else you use to model crop yields) in each location in each year. Then the actual observations are noisy measurements of that (do they have error?).

You presumably want spatial pooling of nearby counties and temporal pooling over time. A full Gaussian process (GP) is likely to be prohibitive if you have 20 years * 2000 counties as we don’t have a good way of representing structured sparse covariance.

You can check out Mitzi’s case study on ICAR models for areal data. The ICAR model is nice in that it scales very well compared to full GPs. But I’m not sure how to add a temporal component. One option would be a separate temporal trend represented the way she deals with heterogeneous random effects in the case study.


#4

Thanks, Mike. Would that look something like the following?

 data {
  int<lower=0> N;
  int<lower=0> K;
  
  matrix[N,K] x;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  vector[K] beta2;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);

  y ~ normal(x*beta + alpha, x*beta2);
}

For a time series, this would have to be coded in such a way where you were separately modeling the mean and sd for each separate set of observations 1 to n, where there are n time points and each of the j samples has n observations. Do you have any advice on how to do that?


#5

Close. You’ve left sigma in, which is presumably from a previous fixed-noise model, and forgot to add an intercept to the noise term (just as you’d want an intercept for the mean term). And to keep the noise term positive, I’d model them on the log scale:

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N,K] x;
  vector[N] y;
}
parameters {
  real alpha;    
  real alpha2;
  vector[K] beta;
  vector[K] beta2;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  alpha2 ~ normal(0,10);
  beta2 ~ normal(0,10);
  y ~ normal(x*beta + alpha, exp(x*beta2 + alpha2) );
}

And note that this is a strictly linear model for both the mean and noise; if you want to be able to flexibly accommodate non-linearity, check out GPs or spline models.


#6

Thanks, Mike.

In my application, I’ll have a set of counties, each with its own time series (of, say, 20 years). If I understand correctly, the way that this is coded up right now, it’s modeling the aggregate mean and sd as a function of the fixed effects. Would you then include county and year dummies in your set of fixed effects?

Steve


#7

Thanks, Bob. I’m jumping in and trying to figure out what’s going on under the hood in these ICAR models. Do you have other examples of models in the realm of multi-level spatio-temporal? Or even just temporal?


#8

Leave year in the predictor matrix x, but it would be best to start with a hierarchical model where each county has it’s own intercepts & slopes but that these deviate from the population mean intercepts/slopes according to, say, a normal distribution an sd that is a parameter of the model (one sd for the intercept for the mean, one sd for the slope of the mean, one sd for the intercept for the noise, etc). Here’s a case study on the benefits of such “partial pooling” approaches (the fact that you’re not using a binomial outcome is irrelevant). You could even capture any dependencies amongst how counties’ deviations manifest across the parameters by modelling the deviations not as independent normal but as a multivariate normal, permitting inference on the correlation matrix.


#9

Not a lot beyond what’s in the manual and in some user papers.

I think we may be doing some spatio-temporal modeling real soon. From my cursory inspection of the literature, it looks like there’s a whole lot of different ways that people mix time series and spatial models.