Missing parameters and priors

nerpa · June 27, 2020, 9:50am

Hi all! I am trying to estimate different amounts of missing parameters for different subjects. Since there is not an easy way to work with ragged arrays in Stan, I tried to be creative and came up with the following solution:

data {
    int T;                                        // Number of time points
    int N;                                       // Number of subjects
    int<lower=1> T_obs[N];          // Number of time points WITH data
    int<lower=0> T_mis[N];          // Number of time points WITHOUT data
    int<lower=0> idx_obs[N,T];    // Indices of time points WITH data
    int<lower=0> idx_mis[N,T];    // Indices of time points WITHOUT data  
} 
parameters {  
    // missed + observed 
    vector[T] theta_obs[N];  
    vector[T] theta_mis[N];      
}
transformed parameters {
    vector[T] theta_pr[N];   
    vector[T] theta[N];

    // Assign missing and observed data
    for (n in 1:N) { 
        theta_pr[n, idx_obs[n,1:T_obs[n]]] = theta_obs[n,1:T_obs[n]];
        if (T_mis[n] != 0) {
            theta_pr[n, idx_mis[n,1:T_mis[n]]] = theta_mis[n,1:T_mis[n]];
        } 
    }

    for (i in 1:N) {
        theta[i] = exp(theta_pr[i]);                             
  }

model {
    for (n in 1:N) {
        for (t in 1:t) {
            theta_pr[n,t] ~ normal(mu, sigma);  
            y[n,t] ~ bernoulli(theta);
}

I don’t put any priors on theta_obs and theta_mis, and I sample only theta_pr and theta. Is it kosher to use theta_mis and theta_obs only for assignment??

andre.pfeuffer · June 27, 2020, 11:36am

Sure you can do that.
Besides that, parameters are sampled.

nerpa · June 27, 2020, 11:57am

Ok, thanks for validation!

When the model compiles, I get the following warning:

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
theta_pr[[n, t]] ~ normal(...)

Can I really ignore it if there is no non-linear transform?

Finally, while I get reasonable outputs for theta_pr and theta, theta_mis and theta_obs give me very large Rhat… Any ides why/how to solve this?

andre.pfeuffer · June 27, 2020, 12:06pm

What it’s the use for theta? It’s just there, but never used.
If you want to put priors, then on parameters not transformed parameters.

Think about transformed parameters as helping temporary calculations done with parameters and data
and it’s values are reported.

nerpa · June 27, 2020, 12:21pm

I estimate theta with a bernoulli function from behavioral task data and I build a linear Gaussian model over the estimated theta parameters - because of some missing data, the missing theta values must be estimated with the linear Gaussian model
I re-parametrize theta to be the exp of theta_pr - theta = exp(theta_pr) in the transformed parameters block.
Is it completely wrong to put priors on transformed parameters? When I try to declare theta_pr in the parameters block and then assign missing values in the transformed parameters block, I get an error

Cannot assign to variable outside of declaration block; left-hand-side variable origin=parameter
Illegal statement beginning with non-void expression parsed as
  theta_pr[n, idx_obs[n, 1:T_obs[n]]]
Not a legal assignment, sampling, or function statement.  Note that
  * Assignment statements only allow variables (with optional indexes) on the left;
  * Sampling statements allow arbitrary value-denoting expressions on the left.
  * Functions used as statements must be declared to have void returns

I get the same error if I assign missing data in the parameters block :(

mike-lawrence · June 27, 2020, 12:36pm

This implies that you believe that all values from -Inf to Inf are equally credible for at least the missing values (theta_obs might be constained by other parts of the model), which doesn’t seem realistic.

Can you post the full model? I’m having trouble following what you’re trying to achieve here.

nerpa · June 27, 2020, 12:36pm

I updated my first post to include the model block.

mike-lawrence · June 27, 2020, 12:37pm

Can you update the data block too? For example, y is missing.

andre.pfeuffer · June 27, 2020, 12:37pm

It’s not wrong. But if you sample from a transformed parameter and the transformation is a non-linear transformation, then you need to add a jacobian adjustment.

You cannot assign something in parameters block. This block is only for declaring parameters.

]]] = are closed, but only two open ones [. Should be two?

nerpa · June 27, 2020, 12:40pm

There are many things missing :) the full code is REALLY long, so I tried to keep only the essentials.

nerpa · June 27, 2020, 12:43pm

But that’s the point - i don’t even estimate full theta_obs because of missing data and I use theta_obs (and theta_mis) only to create equal size arrays that I can assign from to theta_pr (obs+mis).
So is it right to put priors on parameters that I am not even estimating?

mike-lawrence · June 27, 2020, 12:43pm

Right, and I appreciate that this is often best practice for helping volunteers get to the core of the question you’re asking. In this case, I suspect you’re not understanding how to do missing data imputation correctly and want to check more thoroughly so I can possibly prove myself wrong or give you better advice if right.

nerpa · June 27, 2020, 12:45pm

Am I missing something…? It looks correct to my eyes. Thanks for the other answers!

nerpa · June 27, 2020, 1:06pm

Here: (and thanks!)

data {
    int T;                              // Number of time points
    int N;                              // Number of subjects
    int Xdim
    int<lower=1> T_obs[N];              // Number of time points WITH data
    int<lower=0> T_mis[N];              // Number of time points WITHOUT data
    int<lower=0> T_obs[N,T];            // Indices of time points WITH data
    int<lower=0> T_mis[N,T];            // Indices of time points WITHOUT data    
    int<lower=1> P;                  // Number of parameters
    int<lower=0> T_max;                 // Max number of trials across subjects across time points
    int<lower=0> Tr[N, T];              // Number of trials for each subj for each time points
    int y[N, T, T_max];                             
}

transformed data {
    vector[Xdim] Q; 
    Q = rep_vector(1.0, Xdim);
}

parameters {
    vector[Xdim] mu_prior_x;                      
    vector<lower=0>[Xdim] sigma_x;           
    real<lower=0> sigma_r;
    
    matrix[Xdim, Xdim] A;                          
    matrix[P, Xdim] C;   

    vector[Xdim] X[N,T];

    // missed + observed 
    vector[T] theta_obs[N];
    vector[T] theta_mis[N];
}

transformed parameters {
    vector[T] theta_pr[N];   
    vector[T] theta[N];

    for (n in 1:N) {
        theta_pr[n, idx_obs[n,1:T_obs[n]]] = theta_obs[n,1:T_obs[n]];
        if (T_mis[n] != 0) {
            theta_pr[n, idx_mis[n,1:T_mis[n]]] = theta_mis[n,1:T_mis[n]];
        } 
    } 

    for (i in 1:N) {
        theta[i] = exp(theta_pr[i]);                             
  }

}

model {
    sigma_v ~ cauchy(0,5); 
    sigma_x ~ cauchy(0,5); 
    mu_prior_x ~ normal(0,sigma_x);
    sigma_r ~ cauchy(0,5);   
    
    for (p1 in 1:Xdim) {
        A[p1,] ~ cauchy(0,5);
    }
    for (p1 in 1:P) {
        C[p1,] ~ cauchy(0,5);
    }        

    for (n in 1:N) {
        for (t in 1:T) {                    
                        
            if (t == 1) {                                                                                   
                X[n,t] ~ normal(mu_prior_x,sigma_v);                           
            }              
            else {                         
                X[n, t] ~  normal((A * X[n,t-1]), Q);                                          
            }  
            
            theta_pr[n,t] ~ normal(C*X[n,t],sigma_r);         

            if (idx_obs[n,t] != 0) {

                for (t1 in 1:Tr_nc[n,t]) {
                    y[n,t,t1] ~ bernoulli(theta[n,t]);    
                }   
            }
                                
        }   
    }  
}          

"""

mike-lawrence · June 27, 2020, 1:56pm

Thanks! I hate to be a pain, but one more thing will help me understand your aims and decide whether I’m on the right track in understanding how to advise: can you post a model that represents what you’d do if you didn’t have any missing data to deal with?

nerpa · June 27, 2020, 2:00pm

This would be it:

data {
    int T;                              // Number of time points
    int N;                              // Number of subjects
    int Xdim;
    int<lower=1> P;                  // Number of parameters
    int<lower=0> T_max;                 // Max number of trials across subjects across time points
    int<lower=0> Tr[N, T];              // Number of trials for each subj for each time points
    int y[N, T, T_max];                             
}

transformed data {
    vector[Xdim] Q; 
    Q = rep_vector(1.0, Xdim);
}

parameters {
    vector[Xdim] mu_prior_x;                      
    vector<lower=0>[Xdim] sigma_x;           
    real<lower=0> sigma_r;
    vector[T] theta_pr[N];   
    
    matrix[Xdim, Xdim] A;                          
    matrix[P, Xdim] C;   

    vector[Xdim] X[N,T];
}

transformed parameters {
    vector[T] theta[N];

    for (i in 1:N) {
        theta[i] = exp(theta_pr[i]);                             
  }
}

model {
    sigma_v ~ cauchy(0,5); 
    sigma_x ~ cauchy(0,5); 
    mu_prior_x ~ normal(0,sigma_x);
    sigma_r ~ cauchy(0,5);   
    
    for (p1 in 1:Xdim) {
        A[p1,] ~ cauchy(0,5);
    }
    for (p1 in 1:P) {
        C[p1,] ~ cauchy(0,5);
    }        

    for (n in 1:N) {
        for (t in 1:T) {                    
                        
            if (t == 1) {                                                                                   
                X[n,t] ~ normal(mu_prior_x,sigma_v);                           
            }              
            else {                         
                X[n, t] ~  normal((A * X[n,t-1]), Q);                                          
            }  
            
            theta_pr[n,t] ~ normal(C*X[n,t],sigma_r);         

                for (t1 in 1:Tr_nc[n,t]) {
                    y[n,t,t1] ~ bernoulli(theta[n,t]);    
                }                                
        }   
    }  
}          

"""

mike-lawrence · June 27, 2020, 2:13pm

Awesome, that helps. So, I could be still misunderstanding your model, but I actually don’t think you have any missingness to necessitate inference on at all. Data-wise, you seem to have multiple subjects each of which measured possibly multiple times at each of multiple time points, but with some subjects missing data for some time points. Data are 1s/0s so modelled as bernoulli outcomes with some auto-regressive stuff for the influence of time nested within each participant. Since the data are ragged (i.e. there’s NAs in some entries in y thanks to the some participants not having fully T_max trials per timepoint), you have what looks to be the beginnings of handling that data structure, but you haven’t actually completed that part yet (or perhaps you have, you just have Tr_nc where you should have Tr in the model block I think). Have you literally run the no-missingness-imputation model and encountered an issue? Because as it is written, even if a participant is missing data for a given timepoint entirely, the model still expresses a theta_pr for that participant-timepoint combination, and non-missing data plus your priors and the structure of your model will constrain that value.

nerpa · June 27, 2020, 2:33pm

You understood almost everything correctly! The no-missingness model runs fine; so does the missingness model. In the model with missing data I was not sure if to put priors on theta_obs and theta_mis, because I only use these parameters only for assignment to theta_pr (mis+obs).

mike-lawrence · June 27, 2020, 2:35pm

I’m saying you don’t need a model with theta_obs or theta_mis in it at all. Indeed, adding those parameters unnecessarily in the way you have can only introduce error.

nerpa · June 27, 2020, 2:39pm

Now I am confused. How would I assign missing parameters? Imagine one subject has times points [1 2 3 5] with 4 missing. I must impute the 4th time point.

Topic		Replies	Views
Missing response model (section 10.3 of Stan manual) Modeling	11	2448	May 24, 2017
Sampling efficiency when using a ragged array in a simplex-like way Modeling	6	919	July 19, 2017
How to circumvent defining a integer array in transformed parameter block Modeling specification , ecology , capture-recapture	3	4564	March 7, 2018
Missing data imputation in Stan General	3	1768	November 7, 2020
Time series with missing values that need truncated imputation Modeling	0	329	April 19, 2022

Missing parameters and priors

Related topics