Data pre-processing and likelihood specification

Hi all,

I want to fit a state space model of positive outcomes. However, I am having trouble in the data pre-processing and subsequent model specification. The outcomes differ vastly on scale so I feel that it is best to standardise the data. From Standarizing predictors and outputs I know that I should subtract the mean and divide by the standard deviation (or by the difference between the max and the mean), however, this would imply that the values will be out of the support of a likelihood.

What is the most principled approach to follow in here? only dividing by the standard deviation? changing the problem definition? changing the likelihood of the observations? Any pointer to references will be highly appreciated.


Probably depends on more specifics.

What are the differences in scales?

What are the outcome distributions you’re considering?

@bbbales2 Thank you for your answer.

  1. Scales: There is a clear difference in the scales of the single time series. To give you an example, one individual series has minimum value 0 and maximum value 30 while other has minimum value 10 and maximum value 700.

  2. Model: My model is a hidden markov model where the latent space is AR(1) and the mean of the observations is a matrix multiplication between the latent space and some parameters specific to the individual series.

  3. Likelihood: For the likelihood I was thinking of a Gamma, but probably this is more of an ignorant guess.

  4. Code: I coded the model like this. It includes some code for missing values. It still has the normal likelihood for the observations but I have standardised my data (without subtracting the mean)

data {
  // Data matrix
  int<lower = 0>  N_row;
  int<lower = 0>  N_col;
  real            y_obs[N_row, N_col];

  // Missing data 
  int<lower = 0>  N_miss;
  int<lower = 0>  idx_row_miss[N_miss];
  int<lower = 0>  idx_col_miss[N_miss];

  // Model parameters
  int<lower = 0> size_z; // The size of the latent space
  // Transition
  vector[size_z]               alpha;   // Parameters from latent to latent
  vector<lower=0>[size_z]      tau;
  cholesky_factor_corr[size_z] L_omega;

  // Emission
  real<lower=0>   sigma[N_col];   // Variance of output         
  vector[size_z]  beta_c[N_col];  // Parameters from latent space to observational space
  real            beta_i[N_col];  // Intercept from latent space to observational space

  // States and missing values
  matrix[N_row, size_z] z;              // Hidden states
  real                  y_mis[N_miss];  // Missing values
transformed parameters{
  matrix[N_row,N_col] y;         // Data matrix
  matrix[size_z,size_z] L_sigma; // Lower triangular matrix

  // Observational and missing data
  for (r in 1:N_row){
      for (c in 1:N_col){
        for (i in 1:N_miss)
          if (r == idx_row_miss[i] && c == idx_col_miss[i])
            y[r,c] = y_mis[i];
            y[r,c] = y_obs[r,c];

  // LKJ correlation matrix  
  L_sigma = diag_pre_multiply(tau, L_omega);
  // Priors on transition
  alpha   ~ normal(0, 5);
  tau     ~ cauchy(0, 1);
  L_omega ~ lkj_corr_cholesky(1.);  // prior on the covariance

  // Priors on emission
  for (i in 1:size_z) 
    beta_c[i]  ~ normal(rep_vector(0, size_z), rep_vector(5, size_z));
  beta_i  ~ normal(0, 5);
  sigma ~ cauchy(0, 3);
  // Priors on initial values
  z[1,:] ~ multi_normal_cholesky(rep_vector(0, size_z), L_sigma); 
  // Model

  // Latent space
  for (t in 2:N_row)
    z[t,:] ~ multi_normal_cholesky(z[t-1,:] .*alpha', L_sigma); 

  // Observational space
  for (c in 1:N_col)
    y[:,c] ~ normal(beta_i[c] + z*beta_c[c,:], sigma[c]);