All,
I’m working with data on electric load – electricity consumption – by month, in a huge refrigerated warehouse. (I will be using this model for data from many warehouses). I want to fit a model to historical data and use it to make forecasts for the future. Someday I may have useful explanatory variables too but for now it’s just a time series model. I started by just using a SARIMA model but was unhappy with it various reasons, so I decided to use Stan and write a simple model that captures some real-world phenomena.
I currently have a Stan state space model that I’m fairly happy with. The reason I’m not entirely happy is that I want to apply constraints on both the lower and upper predicted value. These could be hard constraints – the load can’t possibly go below zero (in warehouses that don’t have solar generation), and can’t go above whatever load they would attain if they ran all their equipment on full blast all day and night. (Actually I’d rather have constraints that are firm but not sharp like that: there’s a very broad range where the value could be reasonable, so I don’t want to always pull things towards some central value, but once you start to approach one of the edges I’d like to start pushing back. Perhaps there’s some sort of flat-topped, curvy-edged plateau function I could apply, I dunno. But for now hard constraints would be fine).
The key features of the model are: (1) Some months have higher or lower load than other months, year after year. In some month they use “blast freezers” hour after hour because that’s when the strawberry harvest happens; then there are a couple months with lower consumption, and then the squid arrive in the area and suddenly they’re freezing squid hour after hour. (2) There are also multi-month excursions upwards or downwards: something happens in the economy so people are consuming more (or fewer) frozen meals, so the warehouse does more (or less) business, has the loading doors open more or less, etc. (3) Everything is constantly changing with time. In general terms – which months are the peaks and valleys – this year will probably look a lot like last year, but the details are different every year.
Before calling the function I standardize the data (subtract the mean, divide by the standard deviation).
As you can see in the code, the model has a ‘month effect’ and an effect that is local in time, and the prediction is the sum of these: ypred[t] = m[t] + v[t]. I assume m[t] is drawn from a distribution whose mean is the weighted mean of the month effects from one year ago and two years ago, with the weighting determined empirically (the ‘alpha’ and ‘beta’ parameters).
As I mentioned above, in this model I’m working with standardized data, so a hard constraint at 0 would turn into y_min = -(mean(y))/std(y), where y is the vector of historical data.
Here’s some R code to generate test data and run the model, and below is the Stan model itself (which should go into a file called ‘ts_model_scaled.stan’. I’d be happy to take advice on improving my model in any way at all, but the reason I’m here is to learn how I can apply constraints to ypred. Thanks very much for any help you can offer!
n_test = 36
y_test = 1000 + 500*cos((1:n_test)*pi/6) + rnorm(n = n_test, mean = 0, sd = 200)
mn = mean(y_test)
std = sd(y_test)
y_scaled = (y_test - mn)/std
#y_scaled_fake = rep(y_scaled[1:12], 5)
#y_scaled = y_scaled_fake + rnorm(n=length(y_scaled_fake), mean = 0, sd = 0.2) + (1:length(y_scaled_fake)) * 0.01
seasonality = 12
nT = length(y_scaled)
nP = nT + 2*seasonality # our choice: how far into the future do we want to forecast?
change_scale_local = sd(diff(y_scaled,1)) + 0.1
change_scale_seasonal = sd(diff(y_scaled,12)) + 0.1
input_list = list(s = seasonality, change_scale_local = change_scale_local,
change_scale_seasonal = change_scale_seasonal,
nT = nT, nP = nP,
component_max = 2,
y = y_scaled )
fit <- stan(file = 'ts_model_scaled.stan', data = input_list)
summary(fit)
vals = extract(fit)
ypred_mean = apply(vals$ypred, 2, mean)
ypred_min = apply(vals$ypred,2, min)
plot(vals$ypred[1,]*std + mn, type ='l', xlab = 'month', ylab = 'kWh', ylim = c(0,4500))
for (j in 2:200) {
lines(vals$ypred[j,]*std + mn)
}
lines(ypred_mean*std + mn, col = 'red', lwd = 2)
lines(y_scaled*std + mn, col = 'blue', lwd = 4)
Stan code follows:
// State state model:
// ypred = m[t-s] + v[t]
// (In my application: month effect + 'local' time effect)
//
data{
int s; // seasonality
int<lower=0> nT; // number of load data points
int<lower=0> nP; // number of points to predict & forecast, total.
real<lower = 0> change_scale_seasonal; // expected scale of seasonal changes
real<lower = 0> change_scale_local; // expected scale of local changes
real component_max;
real y[nT]; // load data
}
parameters {
vector<lower = -component_max, upper = component_max> [nP] m; // month effect
vector<lower = -component_max, upper = component_max> [nP] v; // local effect
real<lower = -1, upper = 1> alpha_local; // autoregressive multiplier for local changes.
real<lower = -1, upper = 1> beta_local; // autoregressive multiplier for local changes.
real<lower = -1, upper = 1> alpha_seas; // autoregressive multiplier for local changes.
real<lower = -1, upper = 1> beta_seas; // autoregressive multiplier for local changes.
real<lower = 0> sigma_level; // std of changes in local terms
real<lower = 0> sigma_seas; // standard deviation of changes in seasonal terms
real<lower = 0> sigma; // standard deviation of errors
}
transformed parameters{
vector [nP] ypred;
ypred = m + v;
}
model {
for (t in 3:nP)
v[t] ~ normal(alpha_local*(v[t-1] + beta_local*v[t-2]), sigma_level );
for (t in (2*s+1):nP)
m[t] ~ normal(alpha_seas*(m[t-s] + beta_seas*m[t-2*s]), sigma_seas );
for (t in 1:nT)
y[t] ~ normal(ypred[t], sigma);
alpha_local ~ normal(0, 0.5);
alpha_seas ~ normal(0, 0.5);
beta_local ~ normal(0,0.5);
beta_seas ~ normal(0,0.5);
sigma_level ~ normal(0,change_scale_local);
sigma_seas ~ normal(0,change_scale_seasonal);
sigma ~ normal(0, change_scale_local/12);
}