How to Optimize a Function of X in Stan?

I have been trying to figure out how to include a function f of X in a regression model Y ~ f(X) with Stan (specifically through R), and I think I found something that works (in the sense that it doesn't fail :)).

In this case, the adstock function is a transformation of X with a single parameter, called 'rate' below. Is "transformed parameters" the proper place for it within the Stan program?

data {
  int<lower=0> N;   // number of observations
  real x[N];        // predictor
  real Y[N];        // outcome
}

parameters {
  real<lower=0> b0;               // intercept
  real b1;                        // slope on adstocked x
  real<lower=0> sigma;            // sd of error
  real<lower=0, upper=1> rate;    // adstock rate
}

transformed parameters {
  vector[N] adstock_x;  // adstock-transformed x
  vector[N] mu;         // linear predictor

  // adstock transformation: x plus a geometrically decayed carry-over
  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];

  // linear predictor
  mu = b0 + b1 * adstock_x;
}

model {
  // likelihood
  Y ~ normal(mu, sigma);

  // priors
  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
  rate ~ uniform(0, 1);
}

Either there or in the model block, depending on whether you want the transformed values saved in the output.
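For reference, a minimal sketch of the model-block variant (same fit; adstock_x and mu become local variables and are not saved in the output):

model {
  vector[N] adstock_x;  // local variable: recomputed each iteration, not stored
  vector[N] mu;

  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];
  mu = b0 + b1 * adstock_x;

  // likelihood
  Y ~ normal(mu, sigma);

  // priors (rate gets an implicit uniform(0,1) prior from its bounds)
  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
}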

Great, thanks for confirming. I wonder if I can ask a follow-up… since I just started watching your video lectures :)

If I fit this model Y ~ B0 + B1*adstock(X, rate) with nonlinear least squares, I get quite different results than with the code above in RStan, even though I am setting what I think are priors that leave plenty of room to settle on the same estimates as the nls fit.

In sample, at least, the fit is better with nls than with RStan. This led me to question whether my code was correct.

Is it unreasonable to think that the results should be roughly the same? If not, should the discrepancy be attributed to differences between the algorithms?

I can post the full code and the different results if helpful.

Thanks!

The least squares fit is an optimization: by definition it finds the fit that minimizes the squared residual.

Any given sample from the posterior must by definition have the same or greater squared residual. The posterior shows the whole range of things that are plausible based on your model; least squares shows the one point where the in-sample squared residual is lowest.

So unless you’re seeing wildly different results, it probably is normal.
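If you want a point estimate directly comparable to nls, you can also run Stan's optimizer on the same model. A minimal sketch, assuming the model string stan_str_ad and data frame dat from the full code below:

library(rstan)

# optimizing() returns the posterior mode (a penalized MLE). With flat
# priors this coincides with the nls solution for b0, b1, and rate;
# with informative priors it will still be pulled away from it.
m   <- stan_model(model_code = stan_str_ad)
opt <- optimizing(m, data = list(N = nrow(dat), Y = dat$Y, x = dat$x))
opt$par[c("rate", "b0", "b1", "sigma")]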

I am comparing the mean from the Stan fit to the least squares optimization. You can see that the rate parameter, for example, is drastically different. This made me wonder whether I was actually fitting the model incorrectly in Stan.

The least squares fit looks like this:

# Adstock: cumulative, geometrically decayed carry-over of x
adstock_loop <- function(x, rate) {
  adstock_x <- vector(mode = "numeric", length = length(x))
  adstock_x[1] <- x[1]

  for (i in 2:length(x)) {
    adstock_x[i] <- x[i] + rate * adstock_x[i - 1]
  }

  return(adstock_x)
}


summary(
  nls(Y ~ b0 + b1 * adstock_loop(x, rate), data = dat,
      start   = c(b0 = 1, b1 = 1, rate = 0.1),
      control = list(maxiter = 500))
)

Results:

Parameters:
      Estimate Std. Error t value Pr(>|t|)    
b0   1399.2043   443.4172   3.156  0.00274 ** 
b1      3.4488     1.6995   2.029  0.04788 *  
rate    0.8196     0.1051   7.800 3.86e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1597 on 49 degrees of freedom

The RStan model:

stan_str_ad = '
data {
  int<lower=0> N;   // number of observations
  real x[N];        // predictor
  real Y[N];        // outcome
}

parameters {
  real<lower=0> b0;               // intercept
  real b1;                        // slope on adstocked x
  real<lower=0> sigma;            // sd of error
  real<lower=0, upper=1> rate;    // adstock rate
}

transformed parameters {
  vector[N] adstock_x;  // adstock-transformed x
  vector[N] mu;         // linear predictor

  // adstock transformation
  adstock_x[1] = x[1];
  for (i in 2:N)
    adstock_x[i] = x[i] + rate * adstock_x[i-1];

  // linear predictor
  mu = b0 + b1 * adstock_x;
}

model {
  // likelihood
  Y ~ normal(mu, sigma);

  // priors (earlier attempts left commented out)
  //b0 ~ student_t(3,0,1);
  //b1 ~ student_t(3,0,1);
  //sigma ~ student_t(3,0,1);

  b0 ~ normal(1500, 500);
  b1 ~ normal(0, 30);
  sigma ~ normal(1500, 1000);
  rate ~ uniform(0, 1);
}
'
fit2 <- stan(model_code = stan_str_ad,
             data = list(N = nrow(dat), Y = dat$Y, x = dat$x),
             warmup = 1000, iter = 2000, chains = 4, cores = 4, thin = 1)


summary(fit2, pars = c('rate', 'b0', 'b1', 'sigma'))$summary

And these results: the posterior for rate has a mean of 0.656 and an sd of 0.202.

Well, hard to say whether that is so dramatically different or not. The posterior mean is 0.656 and the sd is 0.202, whereas least squares says 0.82 ± 0.1:

0.656 + 0.202 = 0.858 ≈ 0.820,

so the nls estimate is within about one posterior standard deviation of the posterior mean. Given that your priors bias b0 upwards and b1 downwards, and given the size of the sd in the posterior and the standard error in the least squares estimate, it looks more like imprecision than something going wrong.
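One quick check, sketched against the fit2 object from the code above, is to see where the nls estimate of rate falls within the posterior:

# If the nls estimate (0.82) lies inside the central 95% posterior
# interval, the two fits disagree only up to posterior uncertainty.
post <- rstan::extract(fit2)
quantile(post$rate, probs = c(0.025, 0.5, 0.975))
mean(post$rate >= 0.82)  # posterior probability that rate exceeds the nls estimate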

This seems like it is just adding an interaction between x and its lag, in which case I would QR the design matrix; other than that, it seems like it should be doable.

It's a cumulative decayed function: adstock.

What do you think about the estimate from Stan versus nls? Is it not a sign of an issue if you can't get closer on average (and the difference is just a matter of precision)?

Any idea how to code it for panel data?
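One way to sketch it (an assumption on my part, not something settled in this thread): stack the data long with a unit index g, sorted by unit and time-ordered within unit, and restart the adstock recursion at each unit boundary.

data {
  int<lower=0> N;              // total observations across all units
  int<lower=1> G;              // number of units (panels)
  int<lower=1, upper=G> g[N];  // unit index per observation (assumed grouped)
  real x[N];
  real Y[N];
}

// parameters and model blocks unchanged from the single-series version

transformed parameters {
  vector[N] adstock_x;
  adstock_x[1] = x[1];
  for (n in 2:N) {
    if (g[n] != g[n-1])
      adstock_x[n] = x[n];     // new unit: restart the adstock recursion
    else
      adstock_x[n] = x[n] + rate * adstock_x[n-1];
  }
}

A hierarchical version could replace the single shared rate with per-unit rates drawn from a common distribution.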