# How to make latent variable model more robust?

Hi everyone,

I’m still working on my latent variable model. An example would be the following:

Y_t = \alpha + Stock_t \beta + \epsilon_t, where Stock_t is a matrix of S latent variables defined as Stock_t = \lambda Stock_{t-1} + (1 - \lambda) Flow_t with Stock_{t=0} = Flow_{t=0} and 0 \leq \lambda \leq 0.99 .

The variables we observe are Y and Flow. We actually observe five flow variables (i.e. we have to deal with five latent stock variables).

The problem is that the estimates for \lambda and \beta are pretty ok, if \lambda is small but pretty off resp. unreliable for more extreme values of \lambda. Its a problem, because literature identifies values around 0.9 or even higher.

I generate the data as follows:

N <- 200 #number of observations
S <- 5 #number of latent stock variables

#true values
intercept <- 2
beta <- rep(0.5, S) # all betas 0.5 (we deal with elasticities)
sigma <- 0.5
lambda <- rep(0.5, S) # all lambda e.g. 0.5

flow <- matrix(rnorm(N*S),N,S) %>%
data.matrix() #this generates the observed flow variables (i.e. channel spendings)

stock <- matrix(NA,N,S) # latent stock variable (i.e. stock_t = lambda * stock_t-1 + (1 - lambda) * flow_t with stock_t=0 = flow_t and 0<lambda<0.99)

time <- as.numeric()
for (s in 1:S) {
for (k in 1:N) {
time[k] <- k
if(k==1){ ## k=1 actually t=0
stock[k,s] = flow[k,s]
} else {
stock[k,s]<- lambda[s] * stock[(k-1),s] + (1-lambda[s]) * flow[k,s]
}
}
}

y <- intercept + stock %*% beta + rnorm(N, 0, sigma)


The Stan program I run is:

code <- "data {
int <lower=1> N; // number of obs
int <lower=1> S; // number of stock variables
matrix[N,S] flow;//observed flow variable
vector[N] y;//observed outcome
}

parameters {

//coefficients
real alpha; //intercept
vector [S] beta; //slopes
vector <lower=0, upper=0.99> [S] lambda; // carryover coefficient

//scales
real<lower = 0> sigma_y; // scale for outcome
}

model {
matrix[N, S] stock; // stock

// state equation
for(s in 1:S) {
stock[1,s] = flow[1,s];
for(t in 2:N) {
stock[t,s] = lambda[s] * stock[t-1, s] + (1 - lambda[s]) * flow[t,s];
}}

//priors

//coefficients
alpha ~ normal(0,5);
beta ~ normal(0,5);

//scale
sigma_y ~ cauchy(0, 2.5);

//likelihood
y ~ normal(alpha + stock * beta, sigma_y) ;
}"

data = list(N = N,
S = S,
flow = flow,
y = as.numeric(y))


Now I have run some experiments. I vary the true values for \lambda from 0.1 to 0.9, run each model for 100 random samples and calculate the mean (standard deviation) across posterior means.

Experiment 1: \bf \lambda=0.1

\beta_1: 0.52 (0.05)
\lambda_1: 0.13 (0.06)

\beta_2: 0.52 (0.06)
\lambda_2: 0.13 (0.06)

\beta_3: 0.52 (0.04)
\lambda_3: 0.13 (0.06)

\beta_4: 0.51 (0.05)
\lambda_4: 0.12 (0.04)

\beta_5: 0.51 (0.05)
\lambda_5: 0.13 (0.06)

Experiment 2: \bf \lambda=0.2

\beta_1: 0.52 (0.06)
\lambda_1: 0.21 (0.07)

\beta_2: 0.51 (0.05)
\lambda_2: 0.21 (0.06)

\beta_3: 0.51 (0.06)
\lambda_3: 0.21 (0.08)

\beta_4: 0.50 (0.06)
\lambda_4: 0.21 (0.07)

\beta_5: 0.51 (0.06)
\lambda_5: 0.21 (0.07)

Experiment 3: \bf \lambda=0.3

\beta_1: 0.51 (0.07)
\lambda_1: 0.29 (0.07)

\beta_2: 0.52 (0.07)
\lambda_2: 0.31 (0.08)

\beta_3: 0.52 (0.07)
\lambda_3: 0.32 (0.09)

\beta_4: 0.51 (0.08)
\lambda_4: 0.31 (0.08)

\beta_5: 0.51 (0.06)
\lambda_5: 0.30 (0.08)

Experiment 4: \bf \lambda=0.4

\beta_1: 0.51 (0.07)
\lambda_1: 0.39 (0.08)

\beta_2: 0.50 (0.07)
\lambda_2: 0.39 (0.08)

\beta_3: 0.51 (0.07)
\lambda_3: 0.40 (0.09)

\beta_4: 0.50 (0.07)
\lambda_4: 0.40 (0.09)

\beta_5: 0.51 (0.09)
\lambda_5: 0.40 (0.10)

Experiment 5: \bf \lambda=0.5

\beta_1: 0.51 (0.08)
\lambda_1: 0.50 (0.09)

\beta_2: 0.51 (0.10)
\lambda_2: 0.50 (0.09)

\beta_3: 0.53 (0.09)
\lambda_3: 0.49 (0.09)

\beta_4: 0.51 (0.10)
\lambda_4: 0.49 (0.09)

\beta_5: 0.51 (0.09)
\lambda_5: 0.48 (0.09)

Experiment 6: \bf \lambda=0.6

\beta_1: 0.52 (0.1)
\lambda_1: 0.58 (0.09)

\beta_2: 0.50 (0.10)
\lambda_2: 0.57 (0.09)

\beta_3: 0.49 (0.12)
\lambda_3: 0.57 (0.10)

\beta_4: 0.51 (0.11)
\lambda_4: 0.58 (0.10)

\beta_5: 0.50 (0.10)
\lambda_5: 0.58 (0.09)

Experiment 7: \bf \lambda=0.7

\beta_1: 0.52 (0.13)
\lambda_1: 0.68 (0.09)

\beta_2: 0.51 (0.15)
\lambda_2: 0.66 (0.11)

\beta_3: 0.51 (0.15)
\lambda_3: 0.68 (0.09)

\beta_4: 0.49 (0.13)
\lambda_4: 0.67 (0.09)

\beta_5: 0.51 (0.12)
\lambda_5: 0.67 (0.10)

Experiment 8: \bf \lambda=0.8

\beta_1: 0.52 (0.21)
\lambda_1: 0.74 (0.09)

\beta_2: 0.53 (0.20)
\lambda_2: 0.75 (0.09)

\beta_3: 0.52 (0.19)
\lambda_3: 0.73 (0.10)

\beta_4: 0.51 (0.15)
\lambda_4: 0.76 (0.09)

\beta_5: 0.56 (0.27)
\lambda_5: 0.76 (0.08)

Experiment 9: \bf \lambda=0.9

\beta_1: 0.49 (0.39)
\lambda_1: 0.78 (0.10)

\beta_2: 0.52 (0.28)
\lambda_2: 0.81 (0.09)

\beta_3: 0.52 (0.28)
\lambda_3: 0.80 (0.11)

\beta_4: 0.49 (0.25)
\lambda_4: 0.79 (0.09)

\beta_5: 0.61 (0.38)
\lambda_5: 0.81 (0.10)

It is obvious that with a \lambda greater than 0.5, both the bias in \lambda and the standard deviation of \beta increase significantly. On average \beta does not seem to deviate much from the true value of 0.5 (except \beta_5). In individual cases, however, there are extreme deviations.

I wonder if there is a way to make the model more robust in terms of values close to 1 for \lambda.

Hello Dirk,

Could it be that it is just an artifact of having a relatively small sample? If I use your code and run the model with n=200 and n=2000 I for instance get the following estimates for \lambda_4:

The above one is for n=200. In that case the distribution is also skewed in the uncertainty towards lower values. This seems to make sense as values above 1 are not allowed. The mean estimate will therefore also be lower. You could consider looking at the median estimates. In this case the mean is 0.83 and the median 0.87.

In the case of n=2000, you have enough information such that you get a relatively nice symmetric distribution and the mean and median are both 0.89.

I hope this helps,

Best,

Duco

Edit:
I used set.seed(12345) for both data generations in R and in both cases stan(model_code = code, data = data, seed = 123, = list(adapt_delta = .95), iter = 2000) to estimate the model.

2 Likes

Thanks for you comment!

Yes, I know that the sample size in my example is rather small. I also tried it with higher sample sizes like 2000 and this could solve the problem. Unfortunately n = 200 reflects the actual number of observations in our data set.

Yesterday I generated the data in a slightly different way. Instead of …

Stock_{t}=\lambda Stock_{t-1} + (1 - \lambda) Flow_{t}

I used…

Stock_{t}=\lambda Stock_{t-1} + Flow_{t}

…which is also a common way in my field for defining stock variables.

With this adjustment the model runs much much better for all possible values of \lambda. Of course, also in this case the sample size still matters especially for very small values for \lambda e.g. \lambda=0.1 but the overall performance is much better even with n=200. Again I’ve run the model for each value of \lambda with 100 random samples and n=200.

\bf \lambda=0.1

          beta1    lambda1      beta2    lambda2      beta3    lambda3      beta4    lambda4      beta5    lambda5
mean 0.50303370 0.12895039 0.48915177 0.11948388 0.51113275 0.11118615 0.49335895 0.12136425 0.50115022 0.11943011
sd   0.03896781 0.05583114 0.03864411 0.05576133 0.03663838 0.04714694 0.03472243 0.04874717 0.03476376 0.04758323


\bf \lambda=0.5

          beta1    lambda1      beta2    lambda2      beta3    lambda3      beta4    lambda4      beta5    lambda5
mean 0.49732094 0.49153595 0.49894097 0.47976305 0.50452362 0.48904205 0.49852175 0.49263712 0.49322074 0.49483923
sd   0.03928572 0.04755773 0.04083128 0.04873066 0.03861157 0.04933745 0.03631945 0.04566225 0.03488969 0.04680824


\bf \lambda=0.9

         beta1     lambda1      beta2     lambda2      beta3     lambda3      beta4     lambda4      beta5     lambda5
mean 0.4986428 0.899196297 0.50157238 0.897524636 0.49772396 0.900451455 0.50234088 0.898290030 0.50121665 0.898455250
sd   0.0261708 0.009416562 0.02469753 0.008819268 0.02525131 0.008912518 0.02517476 0.009956857 0.02746949 0.009555688


So the problems seems to be clearly related to the (1-\lambda) in my initial post. Its absolutely ok for our project to use Stock_{t}=\lambda Stock_{t-1} + Flow_{t} but of course I’m still interested in why my original model leads to the biases in the initial post.

Thanks! :)

Hi! I’m by no means an expert on these kinds of models, but for me it kind of makes sense that the second model runs more smoothly. Re-write the first model (assuming S=1)

\begin{align} Y_t &= \alpha + \beta \text{Stock}_t + \epsilon_t \\ \text{Stock}_t &= \lambda \text{Stock}_{t-1} + (1 - \lambda) \text{Flow}_t \\ \leftrightarrow Y_t &= \alpha + \beta (\lambda \text{Stock}_{t-1} + (1 - \lambda) \text{Flow}_t) + \epsilon_t \\ &= \alpha + \beta \lambda \text{Stock}_{t-1} + \beta (1 - \lambda) \text{Flow}_t + \epsilon_t \\ &= \alpha + \beta \lambda \text{Stock}_{t-1} + (\beta - \beta\lambda) \text{Flow}_t + \epsilon_t \\ &= \alpha + \beta \lambda \text{Stock}_{t-1} + \beta\text{Flow}_t - \beta\lambda\text{Flow}_t + \epsilon_t, \end{align}

where the effect of Stock and Flow are both governed by \lambda and \beta. Whereas in the second model

\begin{align} Y_t &= \alpha + \beta \text{Stock}_t + \epsilon_t \\ \text{Stock}_t &= \lambda \text{Stock}_{t-1} + \text{Flow}_t \\ \leftrightarrow Y_t &= \alpha + \beta (\lambda \text{Stock}_{t-1} + \text{Flow}_t) + \epsilon_t \\ &= \alpha + \beta \lambda \text{Stock}_{t-1} + \beta \text{Flow}_t + \epsilon_t \\ \end{align}

the effect of Stock and Flow are entangled just by \beta and not by \lambda. My best guess would be that the second model has a much nicer geometry.

Edit: Just a random thought… Could you define \text{Stock}_t = \lambda (\text{Stock}_{t-1} - \text{Flow}_t) + \text{Flow}_t? This would imply

Y_t = \alpha + \beta \lambda (\text{Stock}_{t-1} - \text{Flow}_t) + \beta\text{Flow}_t + \epsilon_t,

which would be the first model. But maybe this one works better?! Be warned. I haven’t tried this and did not really think through this. ;)

2 Likes