How to Optimize a Function of X in Stan?

I have been trying to figure out how to include a function f of X in a regression model Y ~ f(X) with Stan (specifically through R), and I think I found something that works (in the sense that it doesn't fail :)).

In this case, the adstock function is a transformation of X with a single parameter, called 'rate' below. Is "transformed parameters" the proper place for it within the Stan program?

data {
  int<lower=0> N;   // number of observations
  real x[N];        // predictor
  real Y[N];        // outcome
}

parameters {
  real<lower=0> b0;               // intercept
  real b1;                        // slope on adstocked x
  real<lower=0> sigma;            // sd of error
  real<lower=0, upper=1> rate;    // adstock rate
}

transformed parameters {
  vector[N] adstock_x;  // adstock-transformed x
  vector[N] mu;         // linear predictor

  // adstock transformation: x plus a geometrically decayed carry-over
  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];

  // linear predictor
  mu = b0 + b1 * adstock_x;
}

model {
  // likelihood
  Y ~ normal(mu, sigma);

  // priors
  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
  rate ~ uniform(0, 1);
}

Either there or in the model block, depending on whether you want the transformed values saved in the output.
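For reference, a minimal sketch of the model-block variant (same fit; adstock_x and mu become local variables and are not saved in the output):

model {
  vector[N] adstock_x;  // local variable: recomputed each iteration, not stored
  vector[N] mu;

  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];
  mu = b0 + b1 * adstock_x;

  // likelihood
  Y ~ normal(mu, sigma);

  // priors (rate gets an implicit uniform(0,1) prior from its bounds)
  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
}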

Great, thanks for confirming. I wonder if I can ask a follow-up… since I just started watching your video lectures :)

If I fit this model Y ~ B0 + B1*adstock(X, rate) with nonlinear least squares, I get quite different results than with the code above in RStan, even though I am setting what I think are priors that leave plenty of room to settle on the same estimates as the nls fit.

In sample, at least, the fit is better with nls than with RStan. This led me to question whether my code was correct.

Is it unreasonable to think that the results should be roughly the same? If not, should the discrepancy be attributed to differences between the algorithms?

I can post the full code and the different results if helpful.

Thanks!

The least squares fit is an optimization: by definition it finds the fit that minimizes the squared residual.

Any given sample from the posterior must by definition have the same or greater squared residual. The posterior shows the whole range of things that are plausible based on your model; least squares shows the one point where the in-sample squared residual is lowest.

So unless you’re seeing wildly different results, it probably is normal.
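If you want a point estimate directly comparable to nls, you can also run Stan's optimizer on the same model. A minimal sketch, assuming the model string stan_str_ad and data frame dat from the full code below:

library(rstan)

# optimizing() returns the posterior mode (a penalized MLE). With flat
# priors this coincides with the nls solution for b0, b1, and rate;
# with informative priors it will still be pulled away from it.
m   <- stan_model(model_code = stan_str_ad)
opt <- optimizing(m, data = list(N = nrow(dat), Y = dat$Y, x = dat$x))
opt$par[c("rate", "b0", "b1", "sigma")]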

I am comparing the mean from the Stan fit to the least squares optimization. You can see that the rate parameter, for example, is drastically different. This made me wonder whether I was actually fitting the model incorrectly in Stan.

The least squares fit looks like this:

# Adstock: cumulative, geometrically decayed carry-over of x
adstock_loop <- function(x, rate) {
  adstock_x <- vector(mode = "numeric", length = length(x))
  adstock_x[1] <- x[1]

  for (i in 2:length(x)) {
    adstock_x[i] <- x[i] + rate * adstock_x[i - 1]
  }

  return(adstock_x)
}


summary(
  nls(Y ~ b0 + b1 * adstock_loop(x, rate), data = dat,
      start   = c(b0 = 1, b1 = 1, rate = 0.1),
      control = list(maxiter = 500))
)

Results:

Parameters:
      Estimate Std. Error t value Pr(>|t|)    
b0   1399.2043   443.4172   3.156  0.00274 ** 
b1      3.4488     1.6995   2.029  0.04788 *  
rate    0.8196     0.1051   7.800 3.86e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1597 on 49 degrees of freedom

The RStan model:

stan_str_ad = '
data {
  int<lower=0> N;   // number of observations
  real x[N];        // predictor
  real Y[N];        // outcome
}

parameters {
  real<lower=0> b0;               // intercept
  real b1;                        // slope on adstocked x
  real<lower=0> sigma;            // sd of error
  real<lower=0, upper=1> rate;    // adstock rate
}

transformed parameters {
  vector[N] adstock_x;  // adstock-transformed x
  vector[N] mu;         // linear predictor

  // adstock transformation
  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];

  // linear predictor
  mu = b0 + b1 * adstock_x;
}

model {
  // likelihood
  Y ~ normal(mu, sigma);

  // priors (earlier attempts left commented out)
  //b0 ~ student_t(3,0,1);
  //b1 ~ student_t(3,0,1);
  //sigma ~ student_t(3,0,1);

  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
  rate ~ uniform(0, 1);
}
'
fit2 <- stan(model_code = stan_str_ad,
             data = list(N = nrow(dat), Y = dat$Y, x = dat$x),
             warmup = 1000, iter = 2000, chains = 4, cores = 4, thin = 1)


summary(fit2, pars = c('rate', 'b0', 'b1', 'sigma'))$summary

And these results: the posterior for rate has a mean of 0.656 and an sd of 0.202.

Well, hard to say whether that is so dramatically different or not. The posterior mean is 0.656 and the sd is 0.202, whereas least squares says 0.82 ± 0.1:

0.656 + 0.202 = 0.858 ≈ 0.820,

so the nls estimate is within about one posterior standard deviation of the posterior mean. Given that your priors bias b0 upwards and b1 downwards, and given the size of the sd in the posterior and the standard error in the least squares estimate, it looks more like imprecision than something going wrong.
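One quick check, sketched against the fit2 object from the code above, is to see where the nls estimate of rate falls within the posterior:

# If the nls estimate (0.82) lies inside the central 95% posterior
# interval, the two fits disagree only up to posterior uncertainty.
post <- rstan::extract(fit2)
quantile(post$rate, probs = c(0.025, 0.5, 0.975))
mean(post$rate >= 0.82)  # posterior probability that rate exceeds the nls estimate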

This seems like it is just adding an interaction between x and its lag, in which case I would QR the design matrix; other than that, it seems like it should be doable.

It's a cumulative decayed function: adstock.

What do you think about the estimate from Stan versus nls? Is it not a sign of an issue if you can't get closer on average (and the difference is just a matter of precision)?

Any idea how to code it for panel data?
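One way to sketch it (an assumption on my part, not something settled in this thread): stack the data long with a unit index g, sorted by unit and time-ordered within unit, and restart the adstock recursion at each unit boundary.

data {
  int<lower=0> N;              // total observations across all units
  int<lower=1> G;              // number of units (panels)
  int<lower=1, upper=G> g[N];  // unit index per observation (assumed grouped)
  real x[N];
  real Y[N];
}

// parameters and model blocks unchanged from the single-series version

transformed parameters {
  vector[N] adstock_x;
  adstock_x[1] = x[1];
  for (n in 2:N) {
    if (g[n] != g[n-1])
      adstock_x[n] = x[n];     // new unit: restart the adstock recursion
    else
      adstock_x[n] = x[n] + rate * adstock_x[n-1];
  }
}

A hierarchical version could replace the single shared rate with per-unit rates drawn from a common distribution.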