This is partly a Stan question and partly a general modeling question.
I’m working with data on time-of-use wholesale electricity prices as well as hedge prices. For now I’m looking at month-average prices. “Time-of-use” prices, also called “spot prices”, are what you pay if you buy as you go. An alternative is to buy a “hedge”: you can buy a set number of kilowatt-hours a day in advance, or a month in advance, or a year in advance, or whatever, at a price that is set by a trading market.
Month to month the mean spot price goes up and down. There’s a seasonal pattern but also a lot of bumpiness. There can be an expensive month or an expensive summer, or a cheap spring, or whatever. Of course there are also long-term trends and short-term trends and so on.
Let me take Texas as an example, and just consider hedges bought 6 months ahead. If you wanted to buy electricity six months in advance, for electricity to be consumed in summer 2016, you ended up paying way more than if you had bought on the spot market (by about a factor of two). Same for 2017. And in 2018 you paid waaaay more: the price for a six-month-ahead hedge was $175 per MWh but the mean spot price for the month ended up being only about $40. So, OK, why would anyone ever buy electricity six months ahead, if it’s always so much more expensive? Well, in summer 2019 there was a huge spike in electricity prices, which went up to $200, far higher than had ever been seen before. The 6-month-ahead price for summer 2019 was the same as for summer 2018, about $175, so if you had bought the hedge you came out ahead. Basically: “the market” realized that a hot summer with less-than-normal wind in Texas could lead to a big jump in electricity prices, and that was priced into the hedge price every summer…and then one summer it actually happened. You could say the market saw it coming.
There are also events that “the market” doesn’t see coming. Electricity prices in winter in Texas were always quite low, about $20 per MWh, and “the market” didn’t think that was likely to change…and then there was a huge ice storm in February 2021 and the electricity price went all the way to the regulatory limit of $9000 for a short time, and was high for two weeks. The monthly average ended up being almost $2000, something like 90x normal. (My client ran through their entire year’s energy budget in two weeks, in spite of taking energy-saving steps.)
I am working on a statistical model of prices. I’ve considered various long-tailed models and have settled, for now, on a mixture model. Let me start by describing it without any exogenous variables. It would just be a time series model: in a ‘normal’ month (a ‘component 1’ month) the expected price is predicted with a time series model that has monthly effects, a trend, some ‘noise’; the monthly effects can change from year to year but have expected mean of zero; etc. Very standard. But in a ‘component 2’ month the price has some big additional component added on, where that component is drawn from a wide distribution with a high mean value. ‘Component 2’ months are rare, with any month having something like a 5% chance of being a component 2 month (or maybe 1%, or maybe 8%). If you only have a few years of data, you might not have any component 2 months.
I know how to fit the model above in Stan, or at least I think I do. That’s not because I have a deep understanding of how to do it in Stan: I simply took the code at the bottom of the mixture model page in the Stan manual and added the terms that I need. For example, if I just want to include a month effect I have a line
lps[k] += normal_lpdf(y[n] | mu[k] + month_effect[n], sigma[k]);
where the original model didn’t have the month_effect term; and of course I set priors on the month effects.
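For concreteness, here is a minimal sketch of what a complete program along these lines might look like, built on the finite-mixture example in the Stan manual with a month effect added. Several things are assumptions made only for illustration: the month effect is indexed by calendar month through a hypothetical data array month_of_year (rather than directly by n as in the line above), tau is a hypothetical scale for the month effects, and every prior is a placeholder that would need to be matched to the price units. The trend and the year-to-year changes in the month effects are left out.

```stan
data {
  int<lower=1> N;                                 // number of months observed
  vector[N] y;                                    // month-average spot price
  array[N] int<lower=1, upper=12> month_of_year;  // calendar month of observation n (assumed input)
}
parameters {
  simplex[2] theta;          // mixing proportions: component 1 vs. component 2
  ordered[2] mu;             // component means, ordered so the labels are identified
  vector<lower=0>[2] sigma;  // component scales
  vector[12] month_effect;   // seasonal effect for each calendar month
  real<lower=0> tau;         // scale of the month effects (hypothetical)
}
model {
  vector[2] log_theta = log(theta);

  // Placeholder priors: assumptions, not recommendations
  mu ~ normal(0, 100);
  sigma ~ lognormal(0, 2);
  tau ~ normal(0, 10);
  month_effect ~ normal(0, tau);

  for (n in 1:N) {
    vector[2] lps = log_theta;
    for (k in 1:2) {
      lps[k] += normal_lpdf(y[n] | mu[k] + month_effect[month_of_year[n]], sigma[k]);
    }
    // Sum over the two possible components for month n on the log scale
    target += log_sum_exp(lps);
  }
}
```

Nothing in this program records which component a given month belongs to; the log_sum_exp line adds up both possibilities, weighted by theta.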
So far so good, but: I want to include the six-month-ahead hedge price as a predictive variable for the spot price. After all, this is what ‘the market’ has concluded is a fair price for buying electricity in advance. Perhaps it can be thought of as a prediction of the arithmetic mean of the spot price distribution in six months (plus a premium of unknown size). It includes the knowledge that in some months the distribution is much wider than in other months, i.e. that the month might turn out to be a ‘component 2’ month.
So I don’t want to say something like
lps[k] += normal_lpdf(y[n] | mu[k] + month_effect[n] + alpha*hedge_price[n], sigma[k]);
or at least I don’t think I do, because hedge_price[n] is a forecast of neither the ‘component 1’ price nor the ‘component 2’ price; instead, it is something closer to the weighted arithmetic mean of the two.
Hmm. OK, on the one hand I’m realizing that I can probably make more progress myself before coming here for help. On the other hand I’m realizing the depth of my confusion about what it means to “sum out the responsibility parameter.”
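For reference, this is what “summing out the responsibility parameter” amounts to in the two-component model with a month effect. The notation is introduced here for illustration: z_n is the unobserved component label for month n, theta_k is the mixing proportion, and beta_n stands for month_effect[n].

```latex
% Likelihood for month n with the discrete label z_n summed out
p(y_n \mid \theta, \mu, \sigma, \beta)
  = \sum_{k=1}^{2} \theta_k \,
    \mathrm{Normal}\!\left(y_n \mid \mu_k + \beta_n,\ \sigma_k\right)

% The same quantity on the log scale, which is what log_sum_exp(lps) computes
\log p(y_n \mid \theta, \mu, \sigma, \beta)
  = \log \sum_{k=1}^{2} \exp\!\Big( \log \theta_k
      + \log \mathrm{Normal}\!\left(y_n \mid \mu_k + \beta_n,\ \sigma_k\right) \Big)

% The label is not lost: its conditional probability (the "responsibility")
% can be recovered from the same terms
\Pr(z_n = k \mid y_n, \theta, \mu, \sigma, \beta)
  = \frac{\theta_k \, \mathrm{Normal}\!\left(y_n \mid \mu_k + \beta_n,\ \sigma_k\right)}
         {\sum_{j=1}^{2} \theta_j \, \mathrm{Normal}\!\left(y_n \mid \mu_j + \beta_n,\ \sigma_j\right)}
```

So the label z_n never appears as a parameter, but the probability that a given month was a component 2 month can still be computed from the fitted parameters.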
The hedge price for a given month reflects both the market’s belief in what the price will be if the month is a normal (component 1) month, and also the probability that the month will be a crazy (component 2) month. The probability changes from month to month. When a month has a really high hedge price, that’s almost certainly due to the market thinking that the probability is relatively high that it will be a crazy month.
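One way to make this concrete, as a rough sketch rather than a claim about how the market actually prices hedges: treat the hedge price as a probability-weighted mean of the two component means plus a premium. All symbols here are introduced for illustration only: h_n is the hedge price, pi_n the premium, lambda_n the market’s probability of a component 2 month, and mu_{1,n}, mu_{2,n} the expected spot prices under each component.

```latex
% Hedge price as premium plus probability-weighted mean (an assumed relationship)
h_n = \pi_n + (1 - \lambda_n)\,\mu_{1,n} + \lambda_n\,\mu_{2,n}

% Solving for the implied probability of a component 2 month
\lambda_n = \frac{h_n - \pi_n - \mu_{1,n}}{\mu_{2,n} - \mu_{1,n}}

% Purely hypothetical numbers: h_n = 175, \pi_n = 0, \mu_{1,n} = 40, \mu_{2,n} = 400
\lambda_n = \frac{175 - 0 - 40}{400 - 40} = \frac{135}{360} = 0.375
```

Under this reading, a high hedge price maps to a high implied lambda_n, but only if the premium and the component 2 mean are known or estimated; with both unknown, the hedge price alone does not pin the probability down.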
Let’s imagine that there are only two kinds of months: (a) months with a low six-month-ahead hedge price, which also turn out, six months later, to have a low spot price; and (b) months with a high six-month-ahead hedge price, which may turn out to have a low mean spot price or a high mean spot price.
I want to fit a model to historical data that has those characteristics, and then use it to make a forecast for a month six months from now, for which I have the six-month-ahead hedge price. If the hedge price is low then I can use the relationship between spot price and hedge price from just the months described in (a), and if the hedge price is high then…hmm, I’m not sure.
OK, I don’t have a question here, I guess I’m not ready for this forum. Posting this anyway just to establish the thread. I will be back with more. I guarantee that whatever I come up with as a model, I will need help with coding it in Stan. It is confusing to me to not be able to use a latent variable for whether a past or future month is a component 1 month.
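As a starting point for that discussion, here is one possible sketch, not a settled model, in which the hedge price shifts the probability that a month is a component 2 month instead of entering the mean directly. Everything beyond y and hedge_price is hypothetical: mu1, delta, a, b, and hedge_new are illustrative names, the logistic link is just one convenient choice, the priors are placeholders, and month effects, trend, and any direct effect of the hedge price on the component 1 mean are omitted.

```stan
data {
  int<lower=1> N;          // historical months
  vector[N] y;             // month-average spot price
  vector[N] hedge_price;   // six-month-ahead hedge price for each historical month
  real hedge_new;          // hedge price for the month to be forecast (hypothetical input)
}
parameters {
  real mu1;                  // level of a normal (component 1) month
  real<lower=0> delta;       // additional amount in a component 2 month
  vector<lower=0>[2] sigma;  // component scales
  real a;                    // intercept on the logit scale
  real b;                    // slope of the hedge price on the logit scale
}
model {
  // Placeholder priors: assumptions, not recommendations
  mu1 ~ normal(0, 100);
  delta ~ normal(0, 500);
  sigma ~ lognormal(0, 2);
  a ~ normal(-3, 2);
  b ~ normal(0, 1);

  for (n in 1:N) {
    real lambda = inv_logit(a + b * hedge_price[n]);  // P(month n is component 2)
    // log_mix weights its first density by lambda and its second by 1 - lambda
    target += log_mix(lambda,
                      normal_lpdf(y[n] | mu1 + delta, sigma[2]),
                      normal_lpdf(y[n] | mu1, sigma[1]));
  }
}
generated quantities {
  vector[N] p_comp2;  // probability that each past month was a component 2 month
  real lambda_new = inv_logit(a + b * hedge_new);
  int z_new = bernoulli_rng(lambda_new);  // simulated label for the future month
  real y_new = normal_rng(mu1 + z_new * delta, sigma[z_new + 1]);

  for (n in 1:N) {
    real eta = a + b * hedge_price[n];
    real lp1 = log1m_inv_logit(eta) + normal_lpdf(y[n] | mu1, sigma[1]);
    real lp2 = log_inv_logit(eta) + normal_lpdf(y[n] | mu1 + delta, sigma[2]);
    p_comp2[n] = exp(lp2 - log_sum_exp(lp1, lp2));
  }
}
```

The discrete label cannot be a parameter in the model block, but it can be handled in generated quantities: p_comp2 gives a per-month probability for past months, and z_new is drawn for the forecast month, so the draws of y_new mix the two cases in proportion to lambda_new. With raw dollar values the slope b may be poorly scaled, so standardizing hedge_price first is probably sensible.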