Hi all,
I want to fit a state space model of positive outcomes. However, I am having trouble in the data pre-processing and subsequent model specification. The outcomes differ vastly on scale so I feel that it is best to standardise the data. From Standarizing predictors and outputs I know that I should subtract the mean and divide by the standard deviation (or by the difference between the max and the mean), however, this would imply that the values will be out of the support of a likelihood.
What is the most principled approach to follow in here? only dividing by the standard deviation? changing the problem definition? changing the likelihood of the observations? Any pointer to references will be highly appreciated.
Best,
Sergio
Probably depends on more specifics.
What are the differences in scales?
What are the outcome distributions you’re considering?
@bbbales2 Thank you for your answer.
-
Scales: There is a clear difference in the scales of the single time series. To give you an example, one individual series has minimum value 0 and maximum value 30 while other has minimum value 10 and maximum value 700.
-
Model: My model is a hidden markov model where the latent space is AR(1) and the mean of the observations is a matrix multiplication between the latent space and some parameters specific to the individual series.
-
Likelihood: For the likelihood I was thinking of a Gamma, but probably this is more of an ignorant guess.
-
Code: I coded the model like this. It includes some code for missing values. It still has the normal likelihood for the observations but I have standardised my data (without subtracting the mean)
data {
// Data matrix
int<lower = 0> N_row;
int<lower = 0> N_col;
real y_obs[N_row, N_col];
// Missing data
int<lower = 0> N_miss;
int<lower = 0> idx_row_miss[N_miss];
int<lower = 0> idx_col_miss[N_miss];
// Model parameters
int<lower = 0> size_z; // The size of the latent space
}
parameters{
// Transition
vector[size_z] alpha; // Parameters from latent to latent
vector<lower=0>[size_z] tau;
cholesky_factor_corr[size_z] L_omega;
// Emission
real<lower=0> sigma[N_col]; // Variance of output
vector[size_z] beta_c[N_col]; // Parameters from latent space to observational space
real beta_i[N_col]; // Intercept from latent space to observational space
// States and missing values
matrix[N_row, size_z] z; // Hidden states
real y_mis[N_miss]; // Missing values
}
transformed parameters{
matrix[N_row,N_col] y; // Data matrix
matrix[size_z,size_z] L_sigma; // Lower triangular matrix
// Observational and missing data
for (r in 1:N_row){
for (c in 1:N_col){
for (i in 1:N_miss)
{
if (r == idx_row_miss[i] && c == idx_col_miss[i])
{
y[r,c] = y_mis[i];
break;
}
else
{
y[r,c] = y_obs[r,c];
}
}
}
}
// LKJ correlation matrix
L_sigma = diag_pre_multiply(tau, L_omega);
}
model{
// Priors on transition
alpha ~ normal(0, 5);
tau ~ cauchy(0, 1);
L_omega ~ lkj_corr_cholesky(1.); // prior on the covariance
// Priors on emission
for (i in 1:size_z)
{
beta_c[i] ~ normal(rep_vector(0, size_z), rep_vector(5, size_z));
}
beta_i ~ normal(0, 5);
sigma ~ cauchy(0, 3);
// Priors on initial values
z[1,:] ~ multi_normal_cholesky(rep_vector(0, size_z), L_sigma);
// Model
// Latent space
for (t in 2:N_row)
{
z[t,:] ~ multi_normal_cholesky(z[t-1,:] .*alpha', L_sigma);
}
// Observational space
for (c in 1:N_col)
{
y[:,c] ~ normal(beta_i[c] + z*beta_c[c,:], sigma[c]);
}
}