Hello both,
Thanks for your comments! For clarification, the GP is used as a surrogate model for my computationally expensive building physics code. I’m trying to calibrate some of the inputs of a model used to predict energy use in a building. Drawing from Higdon et al. (2004):
At various settings of x, observations y of the physical system are made; these are modelled statistically, using the simulator \eta(x, \theta) at the true calibration value \theta, according to:
y(x_i) = \eta(x_i,\theta) + \delta(x_i) + \epsilon(x_i)
where the stochastic term \delta(x_i) (modelled as a GP) accounts for the discrepancy between the simulator \eta(x_i,\theta) and reality \zeta(x_i), \epsilon(x_i) is the observation error, and \theta denotes the “true,” but unknown, setting of the calibration inputs t. For the following explanation, I’m ignoring the \delta(x_i) term for simplicity.
Quite often the computational demands of the simulator make it infeasible to run an MCMC-based estimation approach directly on the simulator, because of the number of simulations it would take to converge. Instead, a limited number of simulation runs may be used:
\eta(x^*_j,t^*_j), \, j=1,...,m.
- Treat \eta(x,t) as unknown for pairs of (x,t) that aren’t included in the m simulator runs.
- If x \in \mathbb{R}^p and t \in \mathbb{R}^l, then \eta(x,t) maps \mathbb{R}^{p+l} to \mathbb{R}.
- A standard prior model for an unknown function such as \eta(x,t) is a Gaussian process (GP).
Assume that the mean \mu(x,t) of the GP is constant and specify the covariance function:
cov((x,t),(x',t')) = \frac{1}{\lambda_\eta} \exp\left\{ -\sum^{p}_{k=1} \beta^\eta_k|x_{k} - x'_{k}|^{\alpha} -\sum^{l}_{k'=1} \beta^\eta_{p+k'}|t_{k'} - t'_{k'}|^{\alpha} \right\}
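For concreteness, my understanding of this in numpy terms (just a rough sketch; the function and argument names are my own, with the p x-inputs and l t-inputs stacked into one standardised (p+l)-vector per row):

```python
import numpy as np

def eta_cov(XT1, XT2, lambda_eta, beta_eta, alpha=2.0):
    """Power-exponential covariance of the GP emulator between two sets of
    standardised (x, t) points, each row being a (p+l)-vector in [0, 1]^{p+l}.

    beta_eta : length p+l vector (beta^eta_1, ..., beta^eta_{p+l})
    """
    # |u_k - u'_k|^alpha for every pair of rows and every input dimension
    diffs = np.abs(XT1[:, None, :] - XT2[None, :, :]) ** alpha
    # weighted sum over the p+l input dimensions, scaled by 1/lambda_eta
    return np.exp(-(diffs * beta_eta).sum(axis=-1)) / lambda_eta
```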
If we assume that:
- field observations: y = (y(x_1),...,y(x_n))^T
- simulation outcomes: \eta = (\eta(x_1^*,t_1^*),...,\eta(x_m^*,t_m^*))^T
we can now define a joint n+m vector z = (y^T,\eta^T)^T and the likelihood is:
L(z|\theta,\mu,\lambda_\eta,\beta^\eta,\Sigma_y) \propto |\Sigma_z|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(z-\mu\boldsymbol{1}_{n+m})^T \Sigma^{-1}_z (z-\mu\boldsymbol{1}_{n+m})\right\}
where \boldsymbol{1}_{n+m} is the n+m vector of 1s and
\Sigma_z = \Sigma_\eta + \begin{pmatrix} \Sigma_y & 0 \\ 0 & 0 \end{pmatrix}
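In numpy terms, and just to check my understanding, I read this as something like the following (a rough sketch reusing the eta_cov function above, ignoring \delta, with placeholder argument names):

```python
import numpy as np
from scipy.stats import multivariate_normal

# eta_cov(...) is the covariance sketch from above

def log_likelihood(y, eta_sim, X_field, theta, XT_sim,
                   mu, lambda_eta, beta_eta, Sigma_y):
    """Gaussian log-likelihood of z = (y, eta) with the discrepancy term dropped.

    X_field : (n, p)   standardised field-observation inputs
    theta   : (l,)     candidate calibration value on the [0, 1] scale
    XT_sim  : (m, p+l) standardised simulator design points (x*, t*)
    Sigma_y : (n, n)   observation-error covariance
    """
    n, m = len(y), len(eta_sim)
    # pair each field input x_i with the candidate theta: rows (x_i, theta)
    XT_field = np.hstack([X_field, np.tile(theta, (n, 1))])
    XT_all = np.vstack([XT_field, XT_sim])                   # (n+m, p+l)
    Sigma_z = eta_cov(XT_all, XT_all, lambda_eta, beta_eta)  # Sigma_eta
    Sigma_z[:n, :n] += Sigma_y                               # add the Sigma_y block
    Sigma_z[np.diag_indices(n + m)] += 1e-8                  # jitter for stability
    z = np.concatenate([y, eta_sim])
    return multivariate_normal.logpdf(z, mean=mu * np.ones(n + m), cov=Sigma_z)
```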
Conditioning on the augmented observation vector z results in the posterior:
\pi(\theta,\mu,\lambda_\eta,\beta^\eta|z) \propto L(z|\theta,\mu,\lambda_\eta,\beta^\eta,\Sigma_y)\pi(\theta)\pi(\mu)\pi(\lambda_\eta)\pi(\beta^\eta)
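which, continuing the sketch above, would be something like this (the specific prior densities below are placeholders of my own, not the choices recommended by Higdon et al.):

```python
from scipy.stats import norm, gamma, expon

def log_posterior(theta, mu, lambda_eta, beta_eta, data):
    """Unnormalised log-posterior: log-likelihood plus independent log-priors.
    The prior densities are placeholders for illustration only."""
    y, eta_sim, X_field, XT_sim, Sigma_y = data
    lp = log_likelihood(y, eta_sim, X_field, theta, XT_sim,
                        mu, lambda_eta, beta_eta, Sigma_y)
    lp += norm.logpdf(theta, 0.5, 0.5).sum()          # pi(theta), on the [0, 1] scale
    lp += norm.logpdf(mu, 0.0, 10.0)                  # pi(mu)
    lp += gamma.logpdf(lambda_eta, a=2.0, scale=1.0)  # pi(lambda_eta)
    lp += expon.logpdf(beta_eta, scale=1.0).sum()     # pi(beta_eta)
    return lp
```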
It is advised by Higdon et al. (2004) that (see the sketch after this list):
- the input points (x,t) are standardised to lie in [0,1]^{p+l}
- the data are transformed so that the simulation output \eta has mean 0 and variance 1
- independent priors are chosen for \mu, \lambda_{\eta} and \beta^{\eta}
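My reading of the standardisation step is roughly the following (a rough sketch; taking the min-max ranges from the simulator design, and applying the same output transform to y, are assumptions on my part):

```python
import numpy as np

def standardise(X_field, X_design, T_design, eta_sim, y_obs):
    """Min-max scale the inputs to [0, 1] and centre/scale the outputs using
    the mean and SD of the simulator output."""
    # ranges taken from the simulator design; the same ranges are applied to
    # the field inputs so that x and x* live on a common [0, 1] scale
    x_lo, x_hi = X_design.min(axis=0), X_design.max(axis=0)
    t_lo, t_hi = T_design.min(axis=0), T_design.max(axis=0)
    X_field_std = (X_field - x_lo) / (x_hi - x_lo)
    X_design_std = (X_design - x_lo) / (x_hi - x_lo)
    T_design_std = (T_design - t_lo) / (t_hi - t_lo)
    eta_mean, eta_sd = eta_sim.mean(), eta_sim.std()
    eta_std = (eta_sim - eta_mean) / eta_sd
    y_std = (y_obs - eta_mean) / eta_sd   # field data get the same output transform
    # (t_lo, t_hi) are kept because the prior on theta has to be mapped onto
    # the same [0, 1] scale, which is exactly my question below
    return X_field_std, X_design_std, T_design_std, eta_std, y_std, (t_lo, t_hi)
```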
I hope this short description of the background theory clarifies things. My question relates to the fact that, since t is standardised to [0, 1], \pi(\theta) will need to be on the same scale. Since the model represents a building, the inputs t have a physical meaning: \theta_1 might relate to the boiler efficiency, for example, and \theta_2 to the thermostat setpoint. If I have reason to believe that the thermostat setpoint can be represented by a Normal(24, 4.5) prior, I’ll then need to re-scale this to [0, 1].
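To make the re-scaling concrete: if the design range used to standardise the setpoint is [a, b], so that t_2 = (\theta_2 - a)/(b - a), and treating 4.5 as a standard deviation purely for illustration, the prior would map to
t_2 \sim \text{Normal}\left( \frac{24 - a}{b - a}, \frac{4.5}{b - a} \right)
since a linear transformation of a Normal random variable is still Normal.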
Would your advice be to re-scale this outside of .stan and then use the re-scaled prior, or would you do that within .stan? I realise that the standardisation is different from the min-max scaling. I just thought that, since both of them are linear transformations, the process wouldn’t be that different if it can be applied to the priors as well.
Many thanks,
Cali