Reduce_sum and missing values

Dear all,

I have longitudinal data of the type:

vector[k] observations[n,N];

with N subjects and n observations per subject at different times; k variables are measured at each time i (the time points are the same for all subjects).

Using the reduce_sum function, I can “slice” the computation at the subject level, but only in the case of no missing data. However, some of the variables are easy to measure, and I would like to fully exploit the possibility of measuring these variables more often than the others.
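
For reference, this is roughly the kind of subject-level slicing that works in the fully observed case (a minimal, self-contained sketch; the names obs, mu, sigma, and grainsize are placeholders, not the actual model):

functions {
  // Partial sum over a slice of subjects; y_slice holds the subjects assigned
  // to this slice, and start/end index those subjects in the full array.
  real partial_sum_lpdf(array[] vector y_slice, int start, int end,
                        array[] vector mu, real sigma) {
    real lp = 0;
    for (i in 1:(end - start + 1)) {
      lp += normal_lpdf(y_slice[i] | mu[start + i - 1], sigma);
    }
    return lp;
  }
}
data {
  int<lower=1> N;               // number of subjects
  int<lower=1> n;               // observations per subject
  array[N] vector[n] obs;       // fully observed data, sliced by subject
  int<lower=1> grainsize;
}
parameters {
  array[N] vector[n] mu;        // placeholder per-subject expectations
  real<lower=0> sigma;
}
model {
  sigma ~ normal(0, 1);
  for (j in 1:N) {
    mu[j] ~ normal(0, 1);
  }
  target += reduce_sum(partial_sum_lpdf, obs, grainsize, mu, sigma);
}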

I can figure out how to implement the model without using reduce_sum, but I would like to use this function because of the very long computation time.

Any suggestion?

Thanks in advance.

m

Often with missing values one is interested in what those values might have been had they been observed. That can be achieved by adding a parameter to the model for each missing value and merging those parameters into the data before the likelihood is computed, which in turn leaves you with a full, seemingly-no-missing-data scenario that you say you already have working with reduce_sum. See here.
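
In Stan code, that idea is roughly the following (a minimal sketch with a plain normal likelihood standing in for the real model; y_obs, y_mis, idx_obs, and idx_mis are hypothetical names):

data {
  int<lower=0> N_obs;                    // number of observed values
  int<lower=0> N_mis;                    // number of missing values
  array[N_obs] int<lower=1> idx_obs;     // positions of the observed values
  array[N_mis] int<lower=1> idx_mis;     // positions of the missing values
  vector[N_obs] y_obs;
}
parameters {
  vector[N_mis] y_mis;                   // one parameter per missing value
  real mu;
  real<lower=0> sigma;
}
model {
  vector[N_obs + N_mis] y;               // merged "complete" data
  y[idx_obs] = y_obs;
  y[idx_mis] = y_mis;
  // from here on the model sees complete data and can be sliced as usual
  y ~ normal(mu, sigma);
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
}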


Dear Mike,

Thanks for your reply and the video. Your approach is very interesting. However, in your case you have many samples from the same population, whereas I have, for a given subject at a given time, a single measurement of the observed variables.

Anyway, inspired by your suggestion, what I am looking for is something like:

// Function for reduce_sum
real partial_sum_lpdf(...) {
  // numerical integration
  vector[2] y[n] = ode_rk45_tol(...);

  
  // log-likelihood
  return normal_lpdf(to_vector(obs[indices_1,1]) | pow(to_vector(y[start:end,2]), (1/3.0)), sd_1)
         + normal_lpdf(to_vector(obs[indices_2,2]) | to_vector(y[start:end,2]) + to_vector(y[start:end,1]), sd_2);
}

The final goal is to estimate the parameters of the derivatives (inside the ode_rk45 function). This is very simplified, non-functional code; I have removed all the arguments of the functions for the sake of simplicity. The key point is that indices_1 indicates the times for which I have observations of the first observed variable (obs[indices_1,1]).

The full code compiles but does not run:
Chain 1 Unrecoverable error evaluating the log probability at the initial value.
Chain 1 Exception: Exception: normal_lpdf: Random variable has size = 3, but Location parameter has size 7; and they must be the same size. (in ‘C:/Users/palmer/AppData/Local/Temp/RtmpYpXvVm/model-32204c3535d1.stan’, line 135, column 4 to line 138, column 14) (in ‘C:/Users/palmer/AppData/Local/Temp/RtmpYpXvVm/model-32204c3535d1.stan’, line 221, column 4 to line 227, column 26)

As expected, the length of the expected values (y, the output of ode_rk45) is larger than the length of the observed values.

The same code runs when the two variables are fully observed.

I guess that reduce_sum selects some specific times (= the same times for all variables), so the strategy above may be impossible.

But, at the same time, I would like to compute the integral only once because this step is very time consuming.

m

This describes the scenario the video covers.

Mike,

Thank you!

After watching your video more carefully, I have understood the strategy.

In my case, the prior for each missing value can be assumed to be a log-normal distribution with informative hyperparameters. I have tested the trick using simulated data: the missing data and, more interestingly, the ODE-level parameters are estimated precisely and accurately.

My code is still dirty, but it may suggest a path for other Stan users trying to use reduce_sum in a similar context. This is non-functional code; for the sake of simplicity, only the parts relevant to the topic are included:


functions {
  // ODE system function goes here
  ...
  
  // Function for reduce_sum
  real partial_sum_lpdf(... arguments here ...) {
    // numerical integration
    vector[k] y[n] = ode_rk45_tol(... arguments here ...);
    
    // log-likelihood
    return normal_lpdf(to_vector(obs[,1]) | pow(to_vector(y[start:end,2]), (1/3.0)), sd_1)
           + normal_lpdf(to_vector(obs[,2]) | to_vector(y[start:end,2]) + to_vector(y[start:end,1]), sd_2);
  }
}

model {
  array[n] vector[2] obs;               // array with the full (= missing + observed) data ("obs" is the object sliced by reduce_sum)
  real length[n];                       // full (= missing + observed) lengths
  real length_inputed[n_mising_length]; // imputed (= missing) lengths
  
  // filling the "obs" object
  log_length_inputed ~ normal(2.7, .2);
  length_inputed = exp(log_length_inputed);
  length[indices_non_mising_length] = length_obs;
  length[indices_mising_length] = length_inputed;
  obs[1:n, 1] = length;
  obs[1:n, 2] = weight;
  
  // some code has been deleted here (priors for the ODE parameters)
  
  target += reduce_sum(partial_sum_lpdf,
                       // array to slice
                       obs,
                       // slicing grainsize
                       grainsize,
                       .... more arguments ...);
}

I’ve never imputed missing data, but can anyone enlighten me as to what the point of doing so is in this (or any) ODE context?

There are a few possible reasons. I believe the second bullet below corresponds to the context of this thread:

  • You want inference on the missing data values.
  • There is an elegant way to package the data/code that relies on an array of values, but some are missing. Reformatting the data to a structure that can deal with the raggedness more directly is cumbersome and possibly less efficient than just estimating the missing data and sticking with a “nice” data structure.
  • The missing data are predictors but the response is known. You believe that there’s enough information contained in non-missing predictors for the same observation that the observation can still be informative.

Well yes, but aren’t all of these points much better handled by estimating the ODE parameters directly, including the corresponding population parameters? Or is there any added value in first imputing the missing data via a worse model?

I might be missing something fundamental here, but I think we’re just caught in semantic confusion over what “imputation” means. I think your definition is more formally correct, but I think here it is being used simply to mean performing model-based estimation of the missing data, fit jointly in one step.

To be honest, that’s probably me!

So just for the record, I believe the best thing you can do for ODE models is to just fill the non-observed slots with NaNs and then filter them out when computing the log likelihood.

If you have a friendly, efficient, and easy way to do this filtering, then you’ve rendered the second bullet above moot. I’m much less expert than you are on ODE models, but I don’t see anything particularly special about the fact that it’s an ODE model. If I have a non-ragged dataset for which I’ve written an efficient Stan model, and then I change a handful of observations to NA, it might be easier from a coding perspective to convert those observations to parameters than to modify the script to avoid likelihood contributions from those observations (and then, if I want inference on those quantities, to additionally write out the necessary generated quantities code).

Well, for ODE models, where you usually have so many FLOPs per byte of data, you can probably loop manually over all observations and skip them (or not) via is_nan without any performance penalty at all.
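
A minimal sketch of that kind of loop, assuming the missing slots arrive in Stan encoded as NaN (or some other sentinel) and using hypothetical names (masked_normal_lpdf, y_hat):

functions {
  // Accumulate the log likelihood, skipping slots whose observation is NaN;
  // y_hat would be, e.g., the ODE solution evaluated at the observation times.
  real masked_normal_lpdf(vector y, vector y_hat, real sigma) {
    real lp = 0;
    for (i in 1:num_elements(y)) {
      if (!is_nan(y[i])) {
        lp += normal_lpdf(y[i] | y_hat[i], sigma);
      }
    }
    return lp;
  }
}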

Things might be very different for the models you usually look at, maybe being able to exploit vectorization will be much more efficient.


I still suspect that I’m revealing my ignorance here, but why can’t the likelihood conditional on the ODE parameters be expensive and involve a lot of data?

Well it can, you could have a costly-to-evaluate ODE for which you have a lot of measurements.

I guess what I meant is that the computation time is generally dominated by solving the ODE and is practically independent of the number of observations or whatever you do after solving the ODE (in my experience).

Will this strategy work with reduce_sum?

m

Sure (though I have also never used reduce_sum).

Bear in mind that, as I recall, you are not allowed to pass NA into Stan as data… but you can convert NA to Inf and use that to mark the missing values as a workaround.
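
For example, a partial-sum function for reduce_sum could skip the Inf-marked slots like this (a rough sketch under that convention; obs_slice and y_hat are hypothetical names, not the model from earlier in the thread):

functions {
  // Partial sum for reduce_sum that skips missing values, here marked as
  // positive infinity before being passed in as data (NA -> Inf beforehand).
  real partial_sum_lpdf(array[] vector obs_slice, int start, int end,
                        array[] vector y_hat, real sigma) {
    real lp = 0;
    for (i in 1:(end - start + 1)) {
      for (t in 1:num_elements(obs_slice[i])) {
        if (!is_inf(obs_slice[i][t])) {
          lp += normal_lpdf(obs_slice[i][t] | y_hat[start + i - 1][t], sigma);
        }
      }
    }
    return lp;
  }
}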

I’ve done this I believe. But I heard something similar in Mike’s video.