Hi all! I am trying to estimate different amounts of missing parameters for different subjects. Since there is not an easy way to work with ragged arrays in Stan, I tried to be creative and came up with the following solution:
data {
int T; // Number of time points
int N; // Number of subjects
int<lower=1> T_obs[N]; // Number of time points WITH data
int<lower=0> T_mis[N]; // Number of time points WITHOUT data
int<lower=0> idx_obs[N,T]; // Indices of time points WITH data
int<lower=0> idx_mis[N,T]; // Indices of time points WITHOUT data
}
parameters {
// missed + observed
vector[T] theta_obs[N];
vector[T] theta_mis[N];
}
transformed parameters {
vector[T] theta_pr[N];
vector[T] theta[N];
// Assign missing and observed data
for (n in 1:N) {
theta_pr[n, idx_obs[n,1:T_obs[n]]] = theta_obs[n,1:T_obs[n]];
if (T_mis[n] != 0) {
theta_pr[n, idx_mis[n,1:T_mis[n]]] = theta_mis[n,1:T_mis[n]];
}
}
for (i in 1:N) {
theta[i] = exp(theta_pr[i]);
}
model {
for (n in 1:N) {
for (t in 1:t) {
theta_pr[n,t] ~ normal(mu, sigma);
y[n,t] ~ bernoulli(theta);
}
I don’t put any priors on theta_obs and theta_mis, and I sample only theta_pr and theta. Is it kosher to use theta_mis and theta_obs only for assignment??
When the model compiles, I get the following warning:
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
theta_pr[[n, t]] ~ normal(...)
Can I really ignore it if there is no non-linear transform?
Finally, while I get reasonable outputs for theta_pr and theta, theta_mis and theta_obs give me very large Rhat… Any ides why/how to solve this?
I estimate theta with a bernoulli function from behavioral task data and I build a linear Gaussian model over the estimated theta parameters - because of some missing data, the missing theta values must be estimated with the linear Gaussian model
I re-parametrize theta to be the exp of theta_pr - theta = exp(theta_pr) in the transformed parameters block.
Is it completely wrong to put priors on transformed parameters? When I try to declare theta_pr in the parameters block and then assign missing values in the transformed parameters block, I get an error
Cannot assign to variable outside of declaration block; left-hand-side variable origin=parameter
Illegal statement beginning with non-void expression parsed as
theta_pr[n, idx_obs[n, 1:T_obs[n]]]
Not a legal assignment, sampling, or function statement. Note that
* Assignment statements only allow variables (with optional indexes) on the left;
* Sampling statements allow arbitrary value-denoting expressions on the left.
* Functions used as statements must be declared to have void returns
I get the same error if I assign missing data in the parameters block :(
This implies that you believe that all values from -Inf to Inf are equally credible for at least the missing values (theta_obs might be constained by other parts of the model), which doesn’t seem realistic.
Can you post the full model? I’m having trouble following what you’re trying to achieve here.
It’s not wrong. But if you sample from a transformed parameter and the transformation is a non-linear transformation, then you need to add a jacobian adjustment.
You cannot assign something in parameters block. This block is only for declaring parameters.
]]] = are closed, but only two open ones [. Should be two?
But that’s the point - i don’t even estimate full theta_obs because of missing data and I use theta_obs (and theta_mis) only to create equal size arrays that I can assign from to theta_pr (obs+mis).
So is it right to put priors on parameters that I am not even estimating?
Right, and I appreciate that this is often best practice for helping volunteers get to the core of the question you’re asking. In this case, I suspect you’re not understanding how to do missing data imputation correctly and want to check more thoroughly so I can possibly prove myself wrong or give you better advice if right.
data {
int T; // Number of time points
int N; // Number of subjects
int Xdim
int<lower=1> T_obs[N]; // Number of time points WITH data
int<lower=0> T_mis[N]; // Number of time points WITHOUT data
int<lower=0> T_obs[N,T]; // Indices of time points WITH data
int<lower=0> T_mis[N,T]; // Indices of time points WITHOUT data
int<lower=1> P; // Number of parameters
int<lower=0> T_max; // Max number of trials across subjects across time points
int<lower=0> Tr[N, T]; // Number of trials for each subj for each time points
int y[N, T, T_max];
}
transformed data {
vector[Xdim] Q;
Q = rep_vector(1.0, Xdim);
}
parameters {
vector[Xdim] mu_prior_x;
vector<lower=0>[Xdim] sigma_x;
real<lower=0> sigma_r;
matrix[Xdim, Xdim] A;
matrix[P, Xdim] C;
vector[Xdim] X[N,T];
// missed + observed
vector[T] theta_obs[N];
vector[T] theta_mis[N];
}
transformed parameters {
vector[T] theta_pr[N];
vector[T] theta[N];
for (n in 1:N) {
theta_pr[n, idx_obs[n,1:T_obs[n]]] = theta_obs[n,1:T_obs[n]];
if (T_mis[n] != 0) {
theta_pr[n, idx_mis[n,1:T_mis[n]]] = theta_mis[n,1:T_mis[n]];
}
}
for (i in 1:N) {
theta[i] = exp(theta_pr[i]);
}
}
model {
sigma_v ~ cauchy(0,5);
sigma_x ~ cauchy(0,5);
mu_prior_x ~ normal(0,sigma_x);
sigma_r ~ cauchy(0,5);
for (p1 in 1:Xdim) {
A[p1,] ~ cauchy(0,5);
}
for (p1 in 1:P) {
C[p1,] ~ cauchy(0,5);
}
for (n in 1:N) {
for (t in 1:T) {
if (t == 1) {
X[n,t] ~ normal(mu_prior_x,sigma_v);
}
else {
X[n, t] ~ normal((A * X[n,t-1]), Q);
}
theta_pr[n,t] ~ normal(C*X[n,t],sigma_r);
if (idx_obs[n,t] != 0) {
for (t1 in 1:Tr_nc[n,t]) {
y[n,t,t1] ~ bernoulli(theta[n,t]);
}
}
}
}
}
"""
Thanks! I hate to be a pain, but one more thing will help me understand your aims and decide whether I’m on the right track in understanding how to advise: can you post a model that represents what you’d do if you didn’t have any missing data to deal with?
data {
int T; // Number of time points
int N; // Number of subjects
int Xdim;
int<lower=1> P; // Number of parameters
int<lower=0> T_max; // Max number of trials across subjects across time points
int<lower=0> Tr[N, T]; // Number of trials for each subj for each time points
int y[N, T, T_max];
}
transformed data {
vector[Xdim] Q;
Q = rep_vector(1.0, Xdim);
}
parameters {
vector[Xdim] mu_prior_x;
vector<lower=0>[Xdim] sigma_x;
real<lower=0> sigma_r;
vector[T] theta_pr[N];
matrix[Xdim, Xdim] A;
matrix[P, Xdim] C;
vector[Xdim] X[N,T];
}
transformed parameters {
vector[T] theta[N];
for (i in 1:N) {
theta[i] = exp(theta_pr[i]);
}
}
model {
sigma_v ~ cauchy(0,5);
sigma_x ~ cauchy(0,5);
mu_prior_x ~ normal(0,sigma_x);
sigma_r ~ cauchy(0,5);
for (p1 in 1:Xdim) {
A[p1,] ~ cauchy(0,5);
}
for (p1 in 1:P) {
C[p1,] ~ cauchy(0,5);
}
for (n in 1:N) {
for (t in 1:T) {
if (t == 1) {
X[n,t] ~ normal(mu_prior_x,sigma_v);
}
else {
X[n, t] ~ normal((A * X[n,t-1]), Q);
}
theta_pr[n,t] ~ normal(C*X[n,t],sigma_r);
for (t1 in 1:Tr_nc[n,t]) {
y[n,t,t1] ~ bernoulli(theta[n,t]);
}
}
}
}
"""
Awesome, that helps. So, I could be still misunderstanding your model, but I actually don’t think you have any missingness to necessitate inference on at all. Data-wise, you seem to have multiple subjects each of which measured possibly multiple times at each of multiple time points, but with some subjects missing data for some time points. Data are 1s/0s so modelled as bernoulli outcomes with some auto-regressive stuff for the influence of time nested within each participant. Since the data are ragged (i.e. there’s NAs in some entries in y thanks to the some participants not having fully T_max trials per timepoint), you have what looks to be the beginnings of handling that data structure, but you haven’t actually completed that part yet (or perhaps you have, you just have Tr_nc where you should have Tr in the model block I think). Have you literally run the no-missingness-imputation model and encountered an issue? Because as it is written, even if a participant is missing data for a given timepoint entirely, the model still expresses a theta_pr for that participant-timepoint combination, and non-missing data plus your priors and the structure of your model will constrain that value.
You understood almost everything correctly! The no-missingness model runs fine; so does the missingness model. In the model with missing data I was not sure if to put priors on theta_obs and theta_mis, because I only use these parameters only for assignment to theta_pr (mis+obs).
I’m saying you don’t need a model with theta_obs or theta_mis in it at all. Indeed, adding those parameters unnecessarily in the way you have can only introduce error.
Now I am confused. How would I assign missing parameters? Imagine one subject has times points [1 2 3 5] with 4 missing. I must impute the 4th time point.