How to do constrained optimization of the joint posterior

Hi I am a Stan newbie and I’ve been using the pyStan api.

I am trying to model the energy consumption in a house as a function of several components. Here is my model.

data {
    int<lower=0> N; // number of observations
    int<lower=0> H; // number of hours
    vector[N] x_hot_weather; // weather-dependent variable
    vector[N] x_cold_weather; // weather-dependent variable
    matrix[N, H] x_activity; // matrix indicator functions where each col is an hour of the day
    vector[N] y;

    int<lower=0> N_test;
    vector[N_test] x_hot_weather_test; // weather-dependent variable
    vector[N_test] x_cold_weather_test; // weather-dependent variable
    matrix[N_test, H] x_activity_test;
parameters {
    real<lower=0> baseload; // baseload consumption
    real<lower=0> beta_hot_weather; // coefficient for weather
    real<lower=0> beta_cold_weather; // coefficient for weather
    vector<lower=0>[H] beta_hod; // coefficients for hour of day
    real<lower=0,upper=100> sigma;

transformed parameters {
    vector[N] theta;
    theta = baseload + x_cold_weather * beta_cold_weather + x_hot_weather *
    beta_hot_weather + x_activity * beta_hod;
model {
    baseload ~ cauchy(0, 5);
    sigma ~ cauchy(0, 5);
    beta_cold_weather ~ student_t(5, 0, 2.5);
    beta_hot_weather ~ student_t(5, 0, 2.5);

    for (h in 1:H)
        beta_hod[h] ~ student_t(5, 0, 2.5);

    y ~ normal(theta, sigma); // likelihood

generated quantities {
  vector[N_test] y_test; // predictions
  real theta_test;

  vector[N] y_train; // predictions
  real theta_train;

  for (n in 1:N_test) {
    theta_test = baseload + x_cold_weather_test[n] * beta_cold_weather +
        x_hot_weather_test[n] * beta_hot_weather + x_activity_test[n] * beta_hod;
    y_test[n] = normal_rng(theta_test, sigma);

  for (n in 1:N) {
    theta_train = baseload + x_cold_weather[n] * beta_cold_weather +
        x_hot_weather[n] * beta_hot_weather + x_activity[n] * beta_hod;
    y_train[n] = normal_rng(theta_train, sigma);

The issue is during prediction, I actually don’t want to predict Y (although I am generating it up top). I actually know Y and I want to use it to formulate a constrained optimization where I maximize the joint posterior of all my params constrained on the equality:

baseload + x_{cold\_weather} * beta_{cold\_weather} + x_{hot\_weather} * beta_{hot\_weather} + x_{activity\_test}* beta_{hod} = Y

as well as a few other inequalities. Is it possible to do that in Stan?
thank you.

Hi, when you say predict y, do you mean the generated quantities block?

yes I mean y_{test}.
So I would learn the joint posteriors using y_{train} and the X’s. Then during prediction, I actually want to maximize the joint posterior given y_{test} instead of trying to predict y_{test}. Eventually , after the maximization, I would like to see the marginalized posteriors for each param and for each y_{test}.

I don’t quite get what you want to do.

So you do sampling -> 2000 draws -> for each draw i you want to optimize params i different timed based on y_test?

You know, I thought about it some more and realised that what I’m saying is: after I fit with y_{train}, i want to fit with $y_{test}.

Is it possible to fit incrementally. Sorry for my confusing question. Hopefully this is clearer.

There are some ways to do it, but it is not easy. Usually Stan models are fit in a one go.

Is there a reason you don’t include test data to train data?

But, I will let someone with more knowledge about incremental fitting to answer.

cc @martinmodrak do you know someone who could answer this

1 Like

I should not have called it y_{test} to begin with. The reason I don’t include it in the original training data is because I don’t have the data. The data is being collected incrementally.

Those questions on “incremental fit” arise from time to time. The short answer as far as I understand this topic is: there are many ways to approximate this, none of them particularly great. It is an open research problem.

I recently responded to a somewhat similar question: Estimating Subject Level Effects for New Subjects in IRT, or multilevel models in general Would that be a similar problem to what you are facing?

Best of luck with your model