Implementing model cuts in Stan

blurpig_43515 · November 12, 2020, 10:17am

Hi everyone, i just came across this post! Not sure if it is appropriate to ask it here but ill ask anyway. I am currently a university student working on one of Pierre’s Jacob’s paper “Better together? Statistical learning in models made of modules” for my thesis. I am attempting to code the cut posterior of the biased data on Stan. So far i have this and it doesnt seem to work! (My prof die die wants me to use it on Stan)

library(StanHeaders) 
library(Rcpp) 
library(ggplot2) 
library(rstan) 
library(withr) 
options(mc.cores = parallel::detectCores()) 
rstan_options(auto_write = TRUE)
 N1 <-100 
N2 <-1000 
Y1 <- rnorm(N1, mean = 0, sd = 1) 
Y2 <- rnorm(N2, mean = 1, sd = 1) 
lambda1 <- 1 
lambda2 <- 100 
model_data1 <- list(N1=N1,Y1 = Y1,lambda1 = lambda1, lambda2 = lambda2) 
fit2 <-stan(file = 'model1.stan', data = model_data1)

//model1.stan 
data 
{ int N1; 
vector[N1] Y1; 
real lambda1; 
} 

parameters { 
real theta1; 
} 

model { 
for(i in 1:N1) 
Y1[i] ~ normal(theta1, 1); theta1 ~ normal(0,lambda1^-1); 
}

my_samples2 <- extract(fit2) 
theta1samples <- my_samples2$theta1[2001:2100] 
model_data4<-list(N1=N1,N2=N2,Y2 = Y2,theta1samples=theta1samples,lambda2 = lambda2) 
fit5 <- stan(file = 'cut1ci.stan', data = model_data4) 
my_samples4 <- extract(fit5)

//cut1ci.stan 
data { 
int N1; 
intN2; 
vector[N2] Y2; 
real lambda2; 
vector[N1] theta1samples; 
} 
parameters { 
real theta2; 
} 

model { 
for(i in 1:N1) 
Y2[i] ~ normal(theta1samples[i] + theta2, 1); 
theta2 ~ normal(0, sqrt(inv(lambda2))); 
}

Then, if both draws are correct, how do i put both draws together? Thank you for your help!

ADMIN EDIT: Moved to a new thread and added code formatting by @martinmodrak

martinmodrak · November 12, 2020, 10:43am

I think we will need a bit improved description of your goals and issues to help you effectively. I admit I am not familiar with the paper you refer to (and I didn’t attempt to find it and read it). Based on the original topic you posted to, I have a vague understanding you somehow want to combine multiple separate models and use this as an approximation to a full model that would use all the data. But I don’t understand any of the details of your approach - could you clarify what would be the full model you are trying to approximate?

Also it appears you are inconsistently working precision/sd in the normal distribution (Stan uses sd as the second parameter) as once you compute inverse and once you compute square root of the inverse.

Best of luck with the thesis!

blurpig_43515 · November 12, 2020, 10:51am

Thank you martin for your reply! Sorry i am new and inexperienced with stan and dont really know where to get help or find questions!

I am trying to implement the cut algorithm using this sampler:

for s = 1, . . . , N1,
Sample θ1(s) ~ p(y1|θ1) p(θ1)
For s = 1,…,N1,
Sample θ2(s) ~ p(θ2|θ1(s) ,Y2)

Return (θ1(s) * θ2(s))

This is the cut algorithm:

These are the various likelihood distributions for θ1, θ2, Y1, Y2

Y1 = (Y1.1,Y1.2 . . . , Y1.n1).
Y1.i ∼ N (θ1, 1) for all 1 ≤ i ≤ n1.
A prior distribution for θ1 is N (0, λ1^-1).
Y1 = (Y2.1,Y2.2 . . . , Y2.n2). Y2.i~ N (θ1 + θ2, 1) for all 1 ≤ i ≤ n2.
The prior distribution on θ2 is N (0, λ2^-1).
The value of θ1* is set as 0, the value of θ2* is set as 1.
n1 is set as 100 and n2 = 1000,
Furthermore, we use λ1 = 1 and λ2 = 100.

mike-lawrence · November 12, 2020, 12:07pm

This is the same model you posted about earlier, right? You really have to take my advice there and code it in a single model. Second, you’ve made some progress in coding things, but you you need to put the priors outside the for loops (and you actually don’t need the loops at all, Stan will see that the Ys are vectors and loop over their elements implicitly.

blurpig_43515 · November 12, 2020, 12:47pm

Yes sorry!! I have watched your lectures and sorry I don’t see the link between the two-group model and my model. I saw a post with regards to cut posterior post and mentioned the writer of the paper that i was doing, so I thought that it was also good to reply the post there as well, but my post was moved to a new post.

Alright okay, ill try to code it in a single model, thank you!

mike-lawrence · November 12, 2020, 1:16pm

So I skimmed the paper you referred to, as well as one of the authors’ blog posts about it, and I’ll say first that it will not be possible to implement their scheme in Stan in any kind of straightforward way.

Second, the entire idea of their approach is really only applicable for cases where you have un-explorable/un-mitigatable model misspecification, which is a scenario that I think rarely (if ever) holds. I may be failing to consider the range of possible real-world scenarios, but if you know that one aspect of your model is more likely than others to be misspecified, it very sketchy to me to simply attempt to limit/bias the means by which information from that part of the model affects other parts of the model and push on with inference from the latter. Best practice that seems to be the growing culture in the Stan-iverse at least is to explore misspecification when you encounter it and add structure as necessary.

mike-lawrence · November 12, 2020, 1:23pm

Oh, and now I see that my characterization of the Stan culture was overly homogeneous. Discussion here from some folks supporting the general idea (if not specific implementation).

blurpig_43515 · November 12, 2020, 1:52pm

Hi Mike, thank you for your reply!

Yes i have been reading comments and feedback that this cut posterior is very difficult/near impossible to implement on Stan, and it is much easier to use BUGS/JAGS, however my supervising professor has been insisting for me to use Stan. I have sent him my Stan code but he mentioned that he isn’t very proficient in Stan too hence i have to hunt online for help, and this seems to be the only place where I can ask for help.

My prof suggested this nested MCMC algorithm from this paper : https://arxiv.org/pdf/2003.06804.pdf

Thats why i wrote the sampler that i mentioned above:

for s = 1, . . . , N1,
Sample θ1(s) ~ p(y1|θ1) p(θ1)
For s = 1,…,N1,
Sample θ2(s) ~ p(θ2|θ1(s) ,Y2)

And my prof also mentioned treating θ1(s) as data that’s why I came up with this 2 model thingy.

But anyways thanks so much for your help/input and for your suggestions i really appreciate it! Sorry if i am annoying or inconveniencing anyone with my posts and questions!

mike-lawrence · November 12, 2020, 1:59pm

Ok, well, first you can only ever express a single model in Stan. You can never express a sampler. Stan takes care of the “combining a model with data” part using it’s own sampler.

Second, in the various places you’ve expressed your model, you have sometimes implied that your thetas are single-value reals, and at other times implied that your thetas are vectors of reals with a length that is the same length as the data. Can you clarify which you intend to implement? I’ll say that it is unlikely that it makes any sense to model the thetas as being vectors, at least in the model/data scenarios you’ve described so far.

blurpig_43515 · November 12, 2020, 2:58pm

I see, alright! I will try to implement everything in a single model!

Sorry for the confusion regarding the theta, okay i will use real theta1 and real theta2!

blurpig_43515 · November 14, 2020, 2:06pm

Hi Mike, this is what i can think of so far but still doesn’t seem accurate, am i in the right direction or need to try a completely different approach?

data {
int<lower=0>N1;
int<lower=0>N2;
vector[N1] Y1;
vector[N2] Y2;
real<lower=0> lambda1;
real<lower=0> lambda2;
}

parameters {
real theta1;
real theta2;
}

transformed parameters{
vector[N1] theta1samples;
theta1samples = theta1*Y1;
}

model {
theta1 ~ normal(0, sqrt(inv(lambda1)));
Y1 ~ normal(theta1,1);
theta2 ~ normal(0, sqrt(inv(lambda2)));
for(i in 1:N1)
Y2[i] ~ normal(theta1samples[i] + theta2, 1);

}

mike-lawrence · November 14, 2020, 2:13pm

What’s your motivation for the “theta1samples” variable? Remember, you don’t need to do anything special in a model to express sampling of a parameter, Stan will do that for you by sampling all parameters at once.

blurpig_43515 · November 14, 2020, 2:20pm

Thank you for your reply, Mike!

My intention of using the theta1samples is to get the draws from θ1|Y1 and use it for the θ2 samples

mike-lawrence · November 14, 2020, 2:24pm

Yes, that’s what I thought. Stan handles “getting draws” for parameters for you; you simply have to express what the model does with a given set of parameter values. In your case you want to express that Y2 ~ normal( theta1 + theta2 , 1)

blurpig_43515 · November 14, 2020, 2:45pm

Ah okay, so if

Y2 ~ normal( theta1 + theta2 , 1)

that means i have to express theta1 in the model in such a way that theta1 = θ1|Y1?

but models have to be in a distribution form and i cant put an = sign on it.

mike-lawrence · November 14, 2020, 2:50pm

Nope, you don’t need to do anything more with theta1 than you’ve already done. You have already expressed that theta1 has a prior, you have expressed that Y1 has a likelihood conditional on theta1, you have a prior for theta2, and the Y2 ~ normal(theta1+theta2,1) expresses a liklihood for Y2 that is contingent on both theta1 and theta2. That’s all you need.

blurpig_43515 · November 14, 2020, 2:55pm

Hi mike, Is this what you mean?

model {
theta1 ~ normal(0, sqrt(inv(lambda1)));
Y1 ~ normal(theta1,1);
theta2 ~ normal(0, sqrt(inv(lambda2)));
Y2~ normal(theta1+theta2,1);
}

When i do this though, i get the full posterior p(theta1 | Y1, Y2) p(theta2|Y2,theta1) rather than the cut posterior p(theta1 | Y1) p(theta2|Y2,theta1) that i want to calculate.

mike-lawrence · November 14, 2020, 3:04pm

Ah, sorry, I could have been clearer. When you said “my supervising professor has been insisting for me to use Stan”, I thought you meant that you had accepted that Stan can’t do the cut prior and wanted to just do a standard model, so I was giving input on how to achieve that.

@Bob_Carpenter has anyone actually achieved an implementation of the model cutting thing? Is there an example we can point folks to?

blurpig_43515 · November 14, 2020, 3:08pm

Ahh okay no worries, thank you so much for your help and input either way, i appreciate it!

blurpig_43515 · November 15, 2020, 9:52am

Hi Mike (and everyone else), I think i got the solution! Looks very similiar to the graph from the main paper so i thought that i will share my answer with everyone and can be used as reference in the future!

library(StanHeaders)
library(Rcpp)
library(ggplot2)
library(rstan)
library(withr)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

N1 <-100
N2 <-1000
Y1 <- rnorm(N1, mean = 0, sd = 1)
Y2 <- rnorm(N2, mean = 1, sd = 1)
lambda1 <- 1
lambda2 <- 100
model_data1 <- list(N1=N1,Y1 = Y1,lambda1 = lambda1, lambda2 = lambda2)

fit2 <-stan(file = 'model1.stan', data = model_data1)
my_samples2 <- extract(fit2)

//model1.stan 
data 
{ int N1; 
vector[N1] Y1; 
real lambda1; 
} 

parameters { 
real theta1; 
} 

model { 
for(i in 1:N1) 
Y1[i] ~ normal(theta1, 1); theta1 ~ normal(0,lambda1^-1); 
}

This is to extract θ1 samples from θ1|Y1

theta1samples <- my_samples2$theta1[2001:2100] //burn in first 2000, extract N1 samples
model_data4<-list(N1=N1,N2=N2,Y1=Y1,Y2 = Y2,theta1samples=theta1samples,lambda2 = lambda2)
fit5 <- stan(file = 'cut1.stan', data = model_data4)
cut_samples <- extract(fit5)

//cut1.stan
data {
  int<lower=0>N1;
  int<lower=0>N2;
  vector[N1] Y1;
  vector[N2] Y2;
  real<lower=0> lambda2;
  vector[N1] theta1samples;
}

parameters {
    real theta2;
}

model {
  for(i in 1:N1)
  Y1[i] ~ normal(theta1samples[i], 1);
  for(i in 1:N1)
  Y2[i] ~ normal(theta1samples[i] + theta2, 1);
  
}

The cut_samples drawn is the final solution.

EDIT: @maxbiostat has edited this post for syntax highlighting.

Topic		Replies	Views
Posteriors too wide but no obvious problems - fitting a latent normal-multinomial model Modeling fitting-issues	2	492	June 9, 2018
Moon-shaped Posterior Modeling	2	358	September 7, 2020
How to realize the R code in Stan Modeling rstan	3	756	July 31, 2021
Mixed Posterior Sampling with Known and Unknown Conditional Posteriors Modeling rstan	1	50	October 29, 2024
Help on the initialization error of my stan program Modeling rstan , fitting-issues	2	468	October 30, 2022

Implementing model cuts in Stan

Related topics