Implementing model cuts in Stan

Unfortunately I don’t think this is what you want. You’re first sampling theta1, then sampling theta2 using the theta1 samples as data. But you’ve arbitrarily grabbed only N1 of the theta1 samples and associated each with a single observation of Y2. You’ve also included a bit in the second model where you’re looping over Y1 again, but because everything in that loop is data, it’s not affecting sampling whatsoever, just making your model slower.

Your model 1 is correct, mostly. I’m worried that you might intend for the theta1 ~ normal(0,lambda1^-1); to be evaluated on every loop of your for loop, which isn’t what your code will actual do (I believe Stan will only include the first statement after a for loop declaration as included in the loop if you don’t provide explicit {}'s) and certainly isn’t what you should do. For clarity, here is a more explicit model1:

data { 
  int N1; 
  vector[N1] Y1; 
  real lambda1; 
} 
parameters { 
  real theta1; 
} 
model { 
  theta1 ~ normal(0,lambda1^-1); 
  for(i in 1:N1){
    Y1[i] ~ normal(theta1, 1); 
  }
}

And in fact since the normal() function is vector-compatible, you don’t even need the loop and can use this as model 1 instead:

data { 
  int N1; 
  vector[N1] Y1; 
  real lambda1; 
} 
parameters { 
  real theta1; 
} 
model { 
  theta1 ~ normal(0,lambda1^-1); 
  Y1 ~ normal(theta1, 1); 
}

Once that model is sampled, you should have as many samples as you asked for, say 1000. Now, for each one of these samples, you need to run model 2 as:

data {
  int<lower=0>N2;
  vector[N2] Y2;
  real<lower=0> lambda2;
  real single_theta1_sample;
}
parameters {
    real theta2;
}
model {
  theta2 ~ normal(0, lambda2^-1); //you forgot to put a prior on theta2, so I added it
  Y2 ~ normal(single_theta1_sample + theta2, 1);
}

To reiterate: you will have to run this second model 1000 times if you’ve obtained 1000 samples of theta from model 1. Each time you should use multiple chains and let the sampler do it’s normal warmup, then you can just grab a hundred or so samples. You’ll obviously need to automate this, in which case you’ll want to run the diagnostics to ensure good rhats and no divergences, after which you can throw away all but one sample of theta2, that way at the end you have as many samples of theta2 as you have for theta1.

At least, that’s my best understanding of this whole model-cutting business. I still don’t see where down-weighting credibility in the theta1 samples comes in.

2 Likes

Hi Mike, sorry for the late reply, was working on my finals so i took a break from thesis for a while!

Thank you for your feedback and for your suggestion! I will take a look at it.