State-space model with latent variables

Hi everyone!

I tried to estimate State-space models with latent variables in r, but I failed. Does someone know how to do it better?

Let’s say we have two observable variables: gdp and inflation. We also know there are three unobservable variables: potential grow, potential gdp, output gap. We want to estimate unobservable variables and two coefficient (a1 and a2).

Our model

State Equations:

  • grow[t] = grow[t-1] + e
  • potential[t] = potential[t-1] + grow[t] + e
  • gap[t] = a1*gap[t-1] + e

Measurement Equations:

  • gdp[t] = potential[t] + gap[t]
  • inflation[t] = inflation[t-1] + a2*gap[t] + e

Here I generated data:

a2 <- 0.3

grow <- c()
potential <- c()
gap <- c()
gdp <- c()
inflation <- c()

grow[1] <- 2
potential[1] <- 10
gap[1] <- 0
gdp[1] <- potential[1] + gap[1]/100
inflation[1] <- 2

for (i in 2:100) {
  grow[i] <- grow[i-1] + rnorm(1, 0, 0.1)
  potential[i] <- potential[i-1] + grow[i]/100 + rnorm(1, 0, 0.1)
  gap[i] <- a1*gap[i-1] +  rnorm(1,0,0.1)
  gdp[i] <- potential[i] + gap[i]/100
  inflation[i] <- inflation[i-1] + a2*gap[i]   + rnorm(1,0,0.1)
}

And here is my rstan code:

  int T; // number of obs
  int P; //number of variables
  matrix[T,P] Y; //dataset of generated series
}

parameters {

  #Coefficients
  vector[1] alfa1; //ar gap
  vector[1] alfa2; //phillips curve

  #State Variables (unobserved economic variables)
  vector[T] gap; // output gap
  vector[T] potential; // potential output
  vector[T] grow; // growth of potential output

  #Innovations
  real<lower = 0> sigma_pot; // The scale of innovations to potential output
  real<lower = 0> sigma_grow; // The scale of innovations to growth in potential output
  real<lower = 0> sigma_gap; // The scale of innovations to output gap
  real<lower = 0> sigma_inf; // The scale of innovations to phillips curve
}


model {
  // priors

  //Innovations
  sigma_pot ~ cauchy(0.2,3);
  sigma_grow  ~ cauchy(0.3,3); 
  sigma_gap ~ cauchy(0.9,5); 
  sigma_inf ~ cauchy(2,5);

  //coefficients
  alfa1 ~ normal(0,1); 
  alfa2 ~ normal(0,1); 


  //Initialize State Equations
  potential[1] ~ normal(0,1);
  grow[1] ~ normal(0,1);
  gap[1] ~ normal(0,1);

  // State Equations
  for(t in 2:T) {
    grow[t] ~ normal(grow[t-1], sigma_grow);
    potential[t] ~ normal(potential[t-1] + grow[t], sigma_pot);
      gap[t] ~ normal( alfa1*gap[t-1], sigma_gap);
  }

  // Measurement Equations
  for(t in 1:T) {
   Y[t,1] = potential[t] + gap[t];
     Y[t,2] ~ normal(Y[t-1,2] + alfa1*gap[t],sigma_inf);
  }
}

Here I tried model

mvf_model <- stan(file = "newstan.stan" , 
                  data = list(T = nrow(data),
                              P = ncol(data),
                              Y = data),
                  chains = 4)

And here is the error I’ve got

Cannot assign to variable outside of declaration block; left-hand-side variable origin=data
Illegal statement beginning with non-void expression parsed as
  Y[[t, 1]]
Not a legal assignment, sampling, or function statement.  Note that
  * Assignment statements only allow variables (with optional indexes) on the left;
  * Sampling statements allow arbitrary value-denoting expressions on the left.
  * Functions used as statements must be declared to have void returns

 error in 'model16a4a72378d_newstan' at line 27, column 3
  -------------------------------------------------
    25: transformed parameters {
    26:  for(t in 1:T) {
    27:    Y[t,1] = potential[t] + gap[t];
          ^
    28:   }
  -------------------------------------------------

PARSER EXPECTED: "}"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'newstan' due to the above error.

Hi There! Welcome to the Stan forums.

The error you’re seeing is pretty clear; not every block allows you to assign values (i.e. using the equals sign ‘=’) to a variable, and (as far as I know) none of the blocks allow you to assign values to a variable that was declared in a different block.

There is one discrepancy between the error which refers to the transformed parameters block and the Stan code you posted, which doesn’t contain a transformed parameters block. I assume the error is from a slightly different version of the model, is that correct? Furthermore, some of your code got cut off so I assume the first block is the data block. if this is the case, it seems Y is a data variable, so all the elements of Y should be filled in from a data source (for instance, in R) and none of it should be assigned in Stan. In this case it seems Y should not actually contain data, so you can fix this problem by declaring it in the transformed parameters block and then assigning to it. I would personally separate gdp and inflation, instead of those being different columns of the matrix Y, because you want gdp to exactly be the sum of two other variables (which can be done in transformed parameters) and you want inflation to contain measurement error, so it should be in the model block.

transformed parameters {
  vector[T] gdp;
  gdp = potential + gap;
}

model {
  ... // mostly what you already had
  inflation[1] ~ ... // prior on what you expect inflation to be in the first period
  for (t in 2:T) 
    inflation[t] ~ normal(inflation[t - 1], ...);
}

Hope this helps!

2 Likes

Hi MauritsM!

Thank you a lot!
It’s unbelievable kind of you to spend your time and help me so much!
May I ask you to check if I correctly understand your ideas?

Here is my data (this time I wrote it more clearly)

a1 <- 0.7
a2 <- 0.3

grow <- c()
potential <- c()
gap <- c()
gdp <- c()
inflation <- c()

grow[1] <- 2
potential[1] <- 10
gap[1] <- 0
gdp[1] <- potential[1] + gap[1]/100
inflation[1] <- 2

for (i in 2:100) {
  grow[i] <- grow[i-1] + rnorm(1, 0, 0.1)
  potential[i] <- potential[i-1] + grow[i]/100 + rnorm(1, 0, 0.1)
  gap[i] <- a1*gap[i-1] +  rnorm(1,0,0.1)
  gdp[i] <- potential[i] + gap[i]/100
  inflation[i] <- inflation[i-1] + a2*gap[i]   + rnorm(1,0,0.1)
}

data <- cbind(gdp, inflation)

mvf_model <- stan(file = "newstan.stan" , 
                  data = list(T = nrow(data),
                              gdp = data[,1],
                              inflation = data[,3]),
                  chains = 4)

And here is my rstan code:

data {
  int T; // number of obs
  vector[T] inflation;//dataset of generated series
}
 
parameters {
  
  #Coefficients
  vector[1] alfa1; //ar gap
  vector[1] alfa2; //phillips curve

  #State Variables (unobserved economic variables)
  vector[T] gap; // output gap
  vector[T] potential; // potential output
  vector[T] grow; // growth of potential output

  #Innovations
  real<lower = 0> sigma_pot; // The scale of innovations to potential output
  real<lower = 0> sigma_grow; // The scale of innovations to growth in potential output
  real<lower = 0> sigma_gap; // The scale of innovations to output gap
  real<lower = 0> sigma_inf; // The scale of innovations to phillips curve
}

transformed parameters {
  vector[T] gdp;
  gdp = potential + gap;
}

model {
  // priors
  
  //Innovations
  sigma_pot ~ cauchy(0.2,3);
  sigma_grow  ~ cauchy(0.3,3); 
  sigma_gap ~ cauchy(0.9,5); 
  sigma_inf ~ cauchy(2,5);
  
  //coefficients
  alfa1 ~ normal(0,1); 
  alfa2 ~ normal(0,1); 


  //Initialize State Equations
  potential[1] ~ normal(0,1);
  grow[1] ~ normal(0,1);
  gap[1] ~ normal(0,1);

  // State Equations
  for(t in 2:T) {
    grow[t] ~ normal(grow[t-1], sigma_grow);
    potential[t] ~ normal(potential[t-1] + grow[t], sigma_pot);
	  gap[t] ~ normal( alfa1*gap[t-1], sigma_gap);
  }

  // Measurement Equations
  for(t in 2:T) {
	 inflation[t] ~ normal(inflation[t-1] + alfa1*gap[t],sigma_inf);
  }
}
1 Like

Hi @D11, no worries. I’ll have a quick look and post some comments below.

First of all, the items in the data block should correspond to the items you pass to the stan function in R via data = list(...). I see you have gdp in there, but your Stan code doesn’t have this in the data block. If gdp is an observed variable instead of a parameter, then you need to modify the model to take this into account.

I don’t know whether you should declare alpha as vector[1] or real (whether that makes any difference performance-wise).

I’m not sure about those priors on your sigma_* parameters. I’m not an expert here so I’ll gladly defer if someone else has an opinion. IIRC you’d probably want those to be centered at zero and perhaps not use a Cauchy distribution but something with thinner tails - as this seems to improve sampling. What kind of values would you find plausible for these parameters?

Other than that the model looks very reasonable. Please let me know whether this model gives plausible outcomes!