Reducing dimensionality of stochastic local levels with explanatory variables and (too) many observations

I have a fairly simple situation that I’d like to fit with what I believe would be called a state space model. Basically, I have a few million observations of trade prices for a commodity sold on a marketplace over the course of a month. The commodity has several quality traits, measured as numerical values, that influence the price (my explanatory variables). Overall supply and demand fluctuations cause the base price to drift around over time (a stochastic local level). I want to separate out the overall price drift so that I can better understand the static effects the different quality traits have on the price.
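In case it helps, I think the model I have in mind is a local level with regression effects, written (in my own notation) as:

$$
y_t = \mu_t + x_t^\top \beta + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2),
$$
$$
\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2),
$$

where $y_t$ is the trade price, $x_t$ the vector of quality traits, $\beta$ the static trait effects I care about, and $\mu_t$ the slowly drifting base price.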

I think this is a pretty standard state space model that would be easy to fit with a few hundred observations, but I don’t see how it can handle millions. Because the stochastic drift is relatively slow-moving, I believe reducing the dimensionality makes sense. My first thought was to update the local level only at fixed intervals, say once per day.
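To make the once-per-day idea concrete, here is a rough sketch in PyMC. The arrays `y`, `X`, and `day_idx` are made-up placeholders for my real prices, trait values, and per-trade day index, so treat this as an illustration of the dimension reduction rather than a finished model.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Made-up placeholder data standing in for the real trades: `y` is the price,
# `X` the numerical quality traits, and `day_idx` maps each trade to its day.
rng = np.random.default_rng(0)
n_obs, n_days, n_traits = 10_000, 30, 3   # kept small; the real data has millions of rows
day_idx = rng.integers(0, n_days, size=n_obs)
X = rng.normal(size=(n_obs, n_traits))
y = rng.normal(size=n_obs)

with pm.Model() as model:
    # Static effects of the quality traits on price (what I actually care about).
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=n_traits)

    # One level per day instead of per trade: the random walk has only ~30
    # states, which is the dimension reduction I was describing.
    mu0 = pm.Normal("mu0", mu=0.0, sigma=10.0)
    sigma_level = pm.HalfNormal("sigma_level", sigma=1.0)
    innovations = pm.Normal("innovations", mu=0.0, sigma=1.0, shape=n_days)
    level = pm.Deterministic("level", mu0 + pt.cumsum(sigma_level * innovations))

    # Each trade sees the level of its day plus the trait effects.
    sigma_obs = pm.HalfNormal("sigma_obs", sigma=1.0)
    pm.Normal("obs",
              mu=level[day_idx] + pm.math.dot(X, beta),
              sigma=sigma_obs,
              observed=y)

    idata = pm.sample()
```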

My question: is there a more elegant or standard approach to what I’m sure must be a common problem in Bayesian inference?

Some estimation approaches for such models will easily handle millions of observations. I would guess it’s also within the realm of tractability for a full Bayesian approach, but it might require careful specification. It may be worth weighing the cost vs. benefit of a non-reduced maximum-likelihood form against a dimension-reduced Bayesian form; my preference would be for the former.
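For example, something along the lines of statsmodels’ `UnobservedComponents` (a local level plus regression effects, estimated by maximum likelihood via the Kalman filter, with one state per trade) could be a starting point for the non-reduced form. The data below are just placeholders for your prices and traits.

```python
import numpy as np
import statsmodels.api as sm

# Made-up placeholder data: `y` is the trade price in time order and `X` the
# quality-trait regressors; sizes kept modest so the example runs quickly.
rng = np.random.default_rng(0)
n_obs, n_traits = 50_000, 3
X = rng.normal(size=(n_obs, n_traits))
true_beta = np.array([1.0, -0.5, 0.2])
y = (np.cumsum(rng.normal(scale=0.01, size=n_obs))   # slow stochastic drift
     + X @ true_beta                                  # static trait effects
     + rng.normal(scale=0.1, size=n_obs))             # observation noise

# Local level (the stochastic price drift) plus static regression effects,
# fit by maximum likelihood using the Kalman filter.
model = sm.tsa.UnobservedComponents(y, level="local level", exog=X)
result = model.fit()
print(result.summary())
```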
