Implementing model cuts in Stan

Hi everyone, i just came across this post! Not sure if it is appropriate to ask it here but ill ask anyway. I am currently a university student working on one of Pierre’s Jacob’s paper “Better together? Statistical learning in models made of modules” for my thesis. I am attempting to code the cut posterior of the biased data on Stan. So far i have this and it doesnt seem to work! (My prof die die wants me to use it on Stan)

library(StanHeaders) 
library(Rcpp) 
library(ggplot2) 
library(rstan) 
library(withr) 
options(mc.cores = parallel::detectCores()) 
rstan_options(auto_write = TRUE)
 N1 <-100 
N2 <-1000 
Y1 <- rnorm(N1, mean = 0, sd = 1) 
Y2 <- rnorm(N2, mean = 1, sd = 1) 
lambda1 <- 1 
lambda2 <- 100 
model_data1 <- list(N1=N1,Y1 = Y1,lambda1 = lambda1, lambda2 = lambda2) 
fit2 <-stan(file = 'model1.stan', data = model_data1) 
//model1.stan 
data 
{ int N1; 
vector[N1] Y1; 
real lambda1; 
} 

parameters { 
real theta1; 
} 

model { 
for(i in 1:N1) 
Y1[i] ~ normal(theta1, 1); theta1 ~ normal(0,lambda1^-1); 
} 
my_samples2 <- extract(fit2) 
theta1samples <- my_samples2$theta1[2001:2100] 
model_data4<-list(N1=N1,N2=N2,Y2 = Y2,theta1samples=theta1samples,lambda2 = lambda2) 
fit5 <- stan(file = 'cut1ci.stan', data = model_data4) 
my_samples4 <- extract(fit5) 
//cut1ci.stan 
data { 
int N1; 
intN2; 
vector[N2] Y2; 
real lambda2; 
vector[N1] theta1samples; 
} 
parameters { 
real theta2; 
} 

model { 
for(i in 1:N1) 
Y2[i] ~ normal(theta1samples[i] + theta2, 1); 
theta2 ~ normal(0, sqrt(inv(lambda2))); 
} 

Then, if both draws are correct, how do i put both draws together? Thank you for your help!

ADMIN EDIT: Moved to a new thread and added code formatting by @martinmodrak

I think we will need a bit improved description of your goals and issues to help you effectively. I admit I am not familiar with the paper you refer to (and I didn’t attempt to find it and read it). Based on the original topic you posted to, I have a vague understanding you somehow want to combine multiple separate models and use this as an approximation to a full model that would use all the data. But I don’t understand any of the details of your approach - could you clarify what would be the full model you are trying to approximate?

Also it appears you are inconsistently working precision/sd in the normal distribution (Stan uses sd as the second parameter) as once you compute inverse and once you compute square root of the inverse.

Best of luck with the thesis!

Thank you martin for your reply! Sorry i am new and inexperienced with stan and dont really know where to get help or find questions!

I am trying to implement the cut algorithm using this sampler:

  1. for s = 1, . . . , N1,
    Sample θ1(s) ~ p(y1|θ1) p(θ1)
  2. For s = 1,…,N1,
    Sample θ2(s) ~ p(θ2|θ1(s) ,Y2)

Return (θ1(s) * θ2(s))

This is the cut algorithm:
image
These are the various likelihood distributions for θ1, θ2, Y1, Y2

Y1 = (Y1.1,Y1.2 . . . , Y1.n1).
Y1.i ∼ N (θ1, 1) for all 1 ≤ i ≤ n1.
A prior distribution for θ1 is N (0, λ1^-1).
Y1 = (Y2.1,Y2.2 . . . , Y2.n2). Y2.i~ N (θ1 + θ2, 1) for all 1 ≤ i ≤ n2.
The prior distribution on θ2 is N (0, λ2^-1).
The value of θ1* is set as 0, the value of θ2* is set as 1.
n1 is set as 100 and n2 = 1000,
Furthermore, we use λ1 = 1 and λ2 = 100.

This is the same model you posted about earlier, right? You really have to take my advice there and code it in a single model. Second, you’ve made some progress in coding things, but you you need to put the priors outside the for loops (and you actually don’t need the loops at all, Stan will see that the Ys are vectors and loop over their elements implicitly.

Yes sorry!! I have watched your lectures and sorry I don’t see the link between the two-group model and my model. I saw a post with regards to cut posterior post and mentioned the writer of the paper that i was doing, so I thought that it was also good to reply the post there as well, but my post was moved to a new post.

Alright okay, ill try to code it in a single model, thank you!

1 Like

So I skimmed the paper you referred to, as well as one of the authors’ blog posts about it, and I’ll say first that it will not be possible to implement their scheme in Stan in any kind of straightforward way.

Second, the entire idea of their approach is really only applicable for cases where you have un-explorable/un-mitigatable model misspecification, which is a scenario that I think rarely (if ever) holds. I may be failing to consider the range of possible real-world scenarios, but if you know that one aspect of your model is more likely than others to be misspecified, it very sketchy to me to simply attempt to limit/bias the means by which information from that part of the model affects other parts of the model and push on with inference from the latter. Best practice that seems to be the growing culture in the Stan-iverse at least is to explore misspecification when you encounter it and add structure as necessary.

Oh, and now I see that my characterization of the Stan culture was overly homogeneous. Discussion here from some folks supporting the general idea (if not specific implementation).

Hi Mike, thank you for your reply!

Yes i have been reading comments and feedback that this cut posterior is very difficult/near impossible to implement on Stan, and it is much easier to use BUGS/JAGS, however my supervising professor has been insisting for me to use Stan. I have sent him my Stan code but he mentioned that he isn’t very proficient in Stan too hence i have to hunt online for help, and this seems to be the only place where I can ask for help.

My prof suggested this nested MCMC algorithm from this paper : https://arxiv.org/pdf/2003.06804.pdf

Thats why i wrote the sampler that i mentioned above:

  1. for s = 1, . . . , N1,
    Sample θ1(s) ~ p(y1|θ1) p(θ1)
  2. For s = 1,…,N1,
    Sample θ2(s) ~ p(θ2|θ1(s) ,Y2)

And my prof also mentioned treating θ1(s) as data that’s why I came up with this 2 model thingy.

But anyways thanks so much for your help/input and for your suggestions i really appreciate it! Sorry if i am annoying or inconveniencing anyone with my posts and questions!

1 Like

Ok, well, first you can only ever express a single model in Stan. You can never express a sampler. Stan takes care of the “combining a model with data” part using it’s own sampler.

Second, in the various places you’ve expressed your model, you have sometimes implied that your thetas are single-value reals, and at other times implied that your thetas are vectors of reals with a length that is the same length as the data. Can you clarify which you intend to implement? I’ll say that it is unlikely that it makes any sense to model the thetas as being vectors, at least in the model/data scenarios you’ve described so far.

1 Like

I see, alright! I will try to implement everything in a single model!

Sorry for the confusion regarding the theta, okay i will use real theta1 and real theta2!

Hi Mike, this is what i can think of so far but still doesn’t seem accurate, am i in the right direction or need to try a completely different approach?

data {
int<lower=0>N1;
int<lower=0>N2;
vector[N1] Y1;
vector[N2] Y2;
real<lower=0> lambda1;
real<lower=0> lambda2;
}

parameters {
real theta1;
real theta2;
}

transformed parameters{
vector[N1] theta1samples;
theta1samples = theta1*Y1;
}

model {
theta1 ~ normal(0, sqrt(inv(lambda1)));
Y1 ~ normal(theta1,1);
theta2 ~ normal(0, sqrt(inv(lambda2)));
for(i in 1:N1)
Y2[i] ~ normal(theta1samples[i] + theta2, 1);

}

What’s your motivation for the “theta1samples” variable? Remember, you don’t need to do anything special in a model to express sampling of a parameter, Stan will do that for you by sampling all parameters at once.

Thank you for your reply, Mike!

My intention of using the theta1samples is to get the draws from θ1|Y1 and use it for the θ2 samples

Yes, that’s what I thought. Stan handles “getting draws” for parameters for you; you simply have to express what the model does with a given set of parameter values. In your case you want to express that Y2 ~ normal( theta1 + theta2 , 1)

Ah okay, so if

Y2 ~ normal( theta1 + theta2 , 1)

that means i have to express theta1 in the model in such a way that theta1 = θ1|Y1?

but models have to be in a distribution form and i cant put an = sign on it.

Nope, you don’t need to do anything more with theta1 than you’ve already done. You have already expressed that theta1 has a prior, you have expressed that Y1 has a likelihood conditional on theta1, you have a prior for theta2, and the Y2 ~ normal(theta1+theta2,1) expresses a liklihood for Y2 that is contingent on both theta1 and theta2. That’s all you need.

Hi mike, Is this what you mean?

model {
theta1 ~ normal(0, sqrt(inv(lambda1)));
Y1 ~ normal(theta1,1);
theta2 ~ normal(0, sqrt(inv(lambda2)));
Y2~ normal(theta1+theta2,1);
}

When i do this though, i get the full posterior p(theta1 | Y1, Y2) p(theta2|Y2,theta1) rather than the cut posterior p(theta1 | Y1) p(theta2|Y2,theta1) that i want to calculate.

Ah, sorry, I could have been clearer. When you said “my supervising professor has been insisting for me to use Stan”, I thought you meant that you had accepted that Stan can’t do the cut prior and wanted to just do a standard model, so I was giving input on how to achieve that.

@Bob_Carpenter has anyone actually achieved an implementation of the model cutting thing? Is there an example we can point folks to?

1 Like

Ahh okay no worries, thank you so much for your help and input either way, i appreciate it!

Hi Mike (and everyone else), I think i got the solution! Looks very similiar to the graph from the main paper so i thought that i will share my answer with everyone and can be used as reference in the future!

library(StanHeaders)
library(Rcpp)
library(ggplot2)
library(rstan)
library(withr)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

N1 <-100
N2 <-1000
Y1 <- rnorm(N1, mean = 0, sd = 1)
Y2 <- rnorm(N2, mean = 1, sd = 1)
lambda1 <- 1
lambda2 <- 100
model_data1 <- list(N1=N1,Y1 = Y1,lambda1 = lambda1, lambda2 = lambda2)

fit2 <-stan(file = 'model1.stan', data = model_data1)
my_samples2 <- extract(fit2)
//model1.stan 
data 
{ int N1; 
vector[N1] Y1; 
real lambda1; 
} 

parameters { 
real theta1; 
} 

model { 
for(i in 1:N1) 
Y1[i] ~ normal(theta1, 1); theta1 ~ normal(0,lambda1^-1); 
}

This is to extract θ1 samples from θ1|Y1

theta1samples <- my_samples2$theta1[2001:2100] //burn in first 2000, extract N1 samples
model_data4<-list(N1=N1,N2=N2,Y1=Y1,Y2 = Y2,theta1samples=theta1samples,lambda2 = lambda2)
fit5 <- stan(file = 'cut1.stan', data = model_data4)
cut_samples <- extract(fit5)
//cut1.stan
data {
  int<lower=0>N1;
  int<lower=0>N2;
  vector[N1] Y1;
  vector[N2] Y2;
  real<lower=0> lambda2;
  vector[N1] theta1samples;
}

parameters {
    real theta2;
}

model {
  for(i in 1:N1)
  Y1[i] ~ normal(theta1samples[i], 1);
  for(i in 1:N1)
  Y2[i] ~ normal(theta1samples[i] + theta2, 1);
  
}

The cut_samples drawn is the final solution.

EDIT: @maxbiostat has edited this post for syntax highlighting.

1 Like