I have some messy data that I'm modeling: a continuous outcome, a continuous predictor, and a categorical grouping variable. My real model will have other predictors, but for this demonstration I include only these two predictors and the weights, because they capture my problem.
The goal of this analysis is to make inferences about the group variable and to visualize the relationships.
My two questions are about weights and a constraint:
- The outcome needs to be weighted. This can be handled with lm() or lmer() (from lme4) in R, but how can weights be included in the Stan multilevel code below? If I had a weights vector, would I just change y[i] ~ normal(mu, sigma); to y[i] ~ normal(mu, sigma/weights[i]);?
- The outcome, y, is bounded by the x predictor: y >= -x. Not accounting for this constraint produces implausible predictions. You can see this if you run the R code below: because the lm() model does not account for the constraint, its predictions fall outside the bound (see the 2nd R plot below). I haven't yet found a way to model the bound in a non-Bayesian way with lm() or lmer(). Is this a case for gradient descent regression with a custom loss function (roughly what I mean by that is sketched with optim() after the prediction code below)? Any ideas? Since I haven't figured out a non-Bayesian approach to handle the constraint, I'm here to ask whether the constraint y >= -x can somehow be incorporated into the Stan multilevel model code below. A rough sketch of what I have in mind for both questions follows this list.
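To make both questions concrete, here is the kind of change to the likelihood I have in mind (assuming a weights vector were added to the data block). I don't know whether either piece is legitimate Stan practice, which is part of what I'm asking:
// sketch only -- this is what I am asking about, not something I know to be correct
for(i in 1:N){
  mu = intercept + intercept_x2[ GROUP_ID[i] ] + beta*x[i];
  // weights: scale sigma by 1/sqrt(weights[i]), mirroring lm()'s inverse-variance weights?
  // bound:   truncate the normal below at -x[i]?
  y[i] ~ normal(mu, sigma / sqrt(weights[i])) T[-x[i], ];
}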
HERE IS R CODE that uses lm() just for comparison
set.seed(1)  # make the simulated data reproducible
x = rep(seq(-10, 10, 1), each = 5)
y = rep(0, length(x))
weights = sample(seq(1, 1000, 1), length(x))
groups = rep(letters[1:5], times = length(x)/5)
## the outcome y has the relationship y >= -x
for(i in 1:length(x)){
  y[i] = sample( seq(-x[i],
                     -x[i] + ifelse(x[i] < 0,
                                    sample(seq(1, 10, .1), 1),
                                    sample(seq(5, 40, .1), 1)),
                     1),
                 1)
}
dat = data.frame(x = x, y = y, weights = weights, group = factor(groups))
model = lm(y ~ x * group, data = dat, weights = weights)
dat$FIT = predict(model, newdata = dat)
library(ggplot2)
## you can see the fit of the model for each group
ggplot(dat, aes(x = x, y = y, color = group)) +
  geom_point(aes(size = weights)) +
  geom_line(aes(y = FIT))
# data for new predictions
newdata = data.frame(x = seq(-100, 100, 1),
                     group = "c",
                     bound = -seq(-100, 100, 1))
## when predicting new data the outcome y should satisfy y >= -x,
## but the lm() predictions fall below the bound: the fitted (red) line drops under the green bound line to the left of zero
newdata$new_preds = predict(model, newdata = newdata)
ggplot(newdata, aes(x = x)) +
  geom_line(aes(y = new_preds), color = "red") +
  geom_line(aes(y = bound), color = "green")
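For reference, this is roughly what I mean by "gradient descent regression with a custom loss function": minimize the weighted squared error plus a penalty whenever a fitted value drops below -x. The penalty term and the optim() call are my own untested sketch to illustrate the idea, not an established method.
## custom-loss sketch: weighted least squares plus a penalty for fitted values below -x
## (my own untested idea, included only to illustrate the question)
X = model.matrix(~ x * group, data = dat)         # same design as the lm() fit
loss = function(b, penalty = 1e6){                # penalty = 1e6 is an arbitrary large value
  fitted = as.vector(X %*% b)
  sum(dat$weights * (dat$y - fitted)^2) +
    penalty * sum(pmax(-dat$x - fitted, 0)^2)     # penalize any fitted value below -x
}
opt = optim(par = rep(0, ncol(X)), fn = loss, method = "BFGS")
opt$par                                           # coefficients under the penalized loss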
STAN CODE WITH MULTILEVEL MODEL (weights and the constraint are not yet built in)
# below is a multilevel model using Stan; can weights and the constraint be built in to avoid the erroneous predictions?
library(rstan)
head(dat)
# create a numeric vector to indicate the categorical groups
dat$GROUP_ID = as.integer(dat$group)
standat <- list(
  N = nrow(dat),
  y = dat$y,
  x = dat$x,
  GROUP_ID = dat$GROUP_ID,
  nGROUPS = nlevels(dat$group)
)
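If the weights do end up in the model, I assume the data list would also need to carry them, with a matching declaration in the Stan data block; something along these lines (again, part of what I'm asking):
## presumably the weights would also need to be passed in:
standat$weights = dat$weights
## ...with a matching declaration in the data block, e.g.:  vector<lower=0>[N] weights;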
stanmodelcode = '
data {
  int<lower=1> N;
  int nGROUPS;
  real y[N];
  real x[N];
  int<lower=1, upper=nGROUPS> GROUP_ID[N];
}
transformed data {
}
parameters {
  real intercept;
  vector[nGROUPS] intercept_x2;   // group-level intercept offsets
  real beta;
  real<lower=0> sigma;
  real<lower=0> sigma_2;          // sd of the group offsets
}
transformed parameters { // none needed
}
model {
  real mu;
  // priors
  intercept ~ normal(0, 10);
  intercept_x2 ~ normal(0, sigma_2);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);
  sigma_2 ~ normal(0, 10);
  // likelihood
  for(i in 1:N){
    mu = intercept + intercept_x2[ GROUP_ID[i] ] + beta*x[i];
    y[i] ~ normal(mu, sigma);
  }
}
'
fit22 = stan(model_code=stanmodelcode, data=standat, iter=5000, warmup=500, chains = 1)
fit22