# Irregularly spaced intervals in dynamic panel models

Hia,

I’ve been playing with @rtrangucci’s model from: Dynamic panel data models with Stan? but have not been able to apply it to a dataset with irregular intervals.

My initial attempt at specifying latent parameters between observations fails spectacularly as the number of missing observations increases. Does anyone have a better suggestion?

Much appreciated,
Andrew

Simulated dataset:

``````library(tidyverse)
library(rstan)

N_fields <- 3
sigma_p <- 3
p <- rnorm(N_fields, 0, sigma_p)

N_plots <- 235
pop <- sample(N_fields, N_plots, replace = T)
sigma_u = 0.1
u <- rnorm(N_plots, p[pop], sigma_u)

N_obs = 35
N = N_plots * N_obs
N_miss = 20               # change as needed
miss = sample(1:N_obs, N_miss)

N_trt = 7
trt = sample(N_trt, N_plots, replace = T)
beta = rnorm(N_trt, 0, 1)
beta[1] = 0
beta[3] = -4

delta = 0.9
sigma_e = .5

y = matrix(ncol = N_plots, nrow = N_obs)
y[1, ] = beta[trt] + u + rnorm(N_plots) * sigma_e / sqrt(1 - delta^2)

for(i in 2:N_obs) {
mu = beta[trt] + u
y[i, ] = rnorm(N_plots, delta * y[i-1, ] + mu, sigma_e)
}

x <- as.data.frame(y) %>%
rowid_to_column("t") %>%
gather(p, y, -t) %>%
mutate(p = as.numeric(as.factor(p)),
trt = rep(trt, each = N_obs),
pop = rep(pop, each = N_obs),
mis = if_else(t %in% miss, 1, 0),
y = if_else(mis == 1, 0, y)) %>%
arrange(t, p)

ggplot(x, aes(x = t, y = y, group = p)) +
geom_line() +
facet_grid(pop ~ trt)

data_list <- list(
N = N,
N_mis = sum(x\$mis),
N_trt = N_trt,
N_grp = 1,
N_plt = N_plots,
N_pop = N_fields,
y_obs = x\$y,
missing = x\$mis,
trt = x\$trt,
plt = x\$p,
pop = x\$pop
)

str(data_list)
``````

Stan model:

``````data {
int<lower=1> N;       // total
int<lower=0> N_mis;   // missing
int<lower=1> N_grp;   // types
int<lower=1> N_trt;   // contrasts
int<lower=1> N_plt;   // experiments
int<lower=1> N_pop;   // populations

vector[N] y_obs;      // abundance
vector[N] missing;    // indicator

int<lower=1, upper=N_trt> trt[N];
int<lower=1, upper=N_plt> plt[N];
int<lower=1, upper=N_pop> pop[N];
}
transformed data{
// lagged response + indices
int N_ini = N_grp * N_plt;
int N_ip1 = N_ini + 1;
int Nm_ini = N - N_ini;
vector[Nm_ini] y_lag = y_obs[1:Nm_ini];
}
parameters{
// missing data
vector[N_mis] y_mis;

// predictors
vector[N_trt - 1] beta;
real mu_b;
real<lower=0> sigma_b;

// correlation
real<lower=0, upper=1> delta_raw;

// random effects
vector[N_pop] p;
vector[N_plt] u_raw;
real<lower=0> sigma_u;

// obs variance
real<lower=0> sigma_e;
}
transformed parameters{
real delta;
vector[N_plt] u;
vector[N] mu;

// non-centered parameters
delta = delta_raw * 2 - 1;
u = u_raw * sigma_u;

{
vector[N_trt] beta_ref = append_row(0.0, beta);
mu = p[pop] + beta_ref[trt] + u[plt];
}
}
model{
real om_dsq = 1 - delta^2;
vector[N] y;

int m = 1;
for(i in 1:N) {
if(missing[i] == 1){
y[i] = y_mis[m];
m = m + 1;
} else {
y[i] = y_obs[i];
}
}

// initial observations
y[1:N_ini] ~ normal(mu[1:N_ini], sigma_e / sqrt(om_dsq));

// remainder include lag term
y[N_ip1:N] ~ normal(delta * y_lag + mu[N_ip1:N], sigma_e);

// priors
beta ~ normal(mu_b, sigma_b);
mu_b ~ normal(0, 1);
sigma_b ~ normal(0, 1);

delta_raw ~ beta(4, 4);

p ~ normal(0, 5);
u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 1);
sigma_e ~ normal(0, 1);
}
``````

I wrote the R package ctsem for this kind of thing. It sets up this kind of model using Stan and has a variety of fitting approaches. For the super fast intro see the examples section of the ctStanFit function, or here’s some related works:

Hierarchical Bayesian continuous time system formulation
https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling

New software paper – discussing Bayesian approach only but can be adjusted for frequentist / non-linear, though I still have to write about that :)
https://www.researchgate.net/publication/310747987_Introduction_to_Hierarchical_Continuous_Time_Dynamic_Modelling_With_ctsem

Original software paper – SEM oriented, frequentist mixed effects

1 Like

Thanks Charles, it’s a very impressive package! It ran a little slow on my simple model but I’ve found another solution from the econometrics literature.

(Millmet & McDonough, 2013; https://doi.org/10.1002/jae.2548) describe a method of quasi-differencing which weighs the contribution of each parameter by the number of time-steps between observations.

Because my model is primarily random-effects I chucked out their imputation of covariates and backwards-mean instrument (without fully understanding what it was) and was surprised to see it converged nicely and recovered the parameters well, except for some bias in two of the standard deviations.

Simplified version of eqn. 18 below. One down-side is that I also discarded the first observation. Appreciate any suggestions on how I might get around this.

``````# Data simulation from above
# ...

# Calculate gaps between observations
z <- filter(x, mis == 0) %>%
group_by(p) %>%
mutate(m = dense_rank(t),
g_m = t - lag(t, default = 1),
y_lag = lag(y, default = 0)) %>%
filter(m >= 2)

# Run model with irregular observations
data_list <- list(
N = nrow(z),
N_trt = max(z\$trt),
N_grp = 1,
N_plt = max(z\$p),
N_pop = max(z\$pop),
y_obs = z\$y,
y_lag = z\$y_lag,
g_m = z\$g_m,
trt = z\$trt,
plt = z\$p,
pop = z\$pop
)

str(data_list)
``````
``````data {
int<lower=1> N;       // total
int<lower=1> N_grp;   // types
int<lower=1> N_trt;   // contrasts
int<lower=1> N_plt;   // experiments
int<lower=1> N_pop;   // populations

vector[N] y_obs;      // abundance
vector[N] y_lag;      // lagged obs.
vector[N] g_m;        // time between obs.

int<lower=1, upper=N_trt> trt[N];
int<lower=1, upper=N_plt> plt[N];
int<lower=1, upper=N_pop> pop[N];
}
parameters{
// predictors
vector[N_trt - 1] beta;
real mu_b;
real<lower=0> sigma_b;

// correlation
real<lower=0, upper=1> delta_raw;

// random effects
vector[N_pop] p;
vector[N_plt] u_raw;
real<lower=0> sigma_u;

// obs. error
real<lower=0> sigma_e;
}
transformed parameters{
real delta;
vector[N_plt] u;
vector[N] mu;

// non-centered parameters
delta = delta_raw * 2 - 1;
u = u_raw * sigma_u;

{
vector[N_trt] beta_ref = append_row(0.0, beta);
mu = p[pop] + beta_ref[trt] + u[plt];
}
}
model{
{
vector[N] delta_gm = exp(log(delta) * g_m);
vector[N] gm_adj = (1 - delta_gm) ./ (1 - delta);

}

// priors
beta ~ normal(mu_b, sigma_b);
mu_b ~ normal(0, 1);
sigma_b ~ normal(0, 1);

delta_raw ~ beta(4, 4);

p ~ normal(0, 5);
u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 1);
sigma_e ~ normal(0, 1);
}

``````

Just had a quick look but yeah, once the data are discretized appropriately standard missing data approaches also work well. Sometimes this discretization leads to huge amounts of missingness, and sometimes it is preferable to estimate the continuous time generative parameters directly (rather than the effects at some discrete time lag). Re speed of ctsem – yeah I should probably change the defaults to the more typical mixed effects rather than fully random effects, not sure if this was an issue depends how deep you looked into it… good that you’re on track anyway :)