Is it valid to define likelihood on partial data?

I have implemented the double exponential smoothing model:

data {
  int<lower=3> n;
  vector[n] y;
  int<lower=0> h;
}
parameters {
  real<lower=0, upper=1> alpha;
  real<lower=0, upper=1> beta;
  real<lower=0> sigma;
}
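transformed parameters {
  real l;        // current level
  real lp;       // previous level
  real b;        // current trend
  vector[n] mu;  // one-step-ahead predictions

  // initialize level and trend from the first two observations
  lp = y[1];
  l = lp;
  b = y[2] - y[1];

  // no prediction exists for the first two points;
  // these entries are not used in the model block
  mu[1] = y[1];
  mu[2] = y[2];

  for (t in 2:(n-1)) {
    l = alpha * y[t] + (1 - alpha) * (lp + b);  // level update
    b = beta * (l - lp) + (1 - beta) * b;       // trend update
    mu[t+1] = l + b;                            // prediction for y[t+1]
    lp = l;
  }
}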
model {
  for (t in 3:n)
    y[t] ~ normal(mu[t], sigma);
}

Is the definition of likelihood based just on part of the data valid? It seems to work smoothly, but I’d like to know if it won’t lead to any possible problems on Stan’s side?

Hmm, doesn’t have to. But through the recursive definitions of l and b, every mu[t] from t = 3 on depends on the earlier ys, so y[1] and y[2] are technically being used.

Just looks like the model here doesn’t do anything about estimating the means of y[1] and y[2], which seems perfectly reasonable. (edit: removed estimating, since mu aren’t themselves parameters exactly)
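To make that dependence concrete, here is the recursion from the transformed parameters block written out. The subscripted symbols are just notation for the program's l, b and mu; nothing here is new modelling.

$$
\begin{aligned}
l_1 &= y_1, \qquad b_1 = y_2 - y_1, \\
l_t &= \alpha\, y_t + (1 - \alpha)(l_{t-1} + b_{t-1}), \\
b_t &= \beta\,(l_t - l_{t-1}) + (1 - \beta)\, b_{t-1}, \\
\mu_{t+1} &= l_t + b_t, \qquad t = 2, \dots, n - 1.
\end{aligned}
$$

For t = 2 this gives l_2 = y_2 and b_2 = y_2 - y_1, so mu_3 = 2 y_2 - y_1 regardless of alpha and beta. The first two observations therefore enter the model through the predictions even though they never appear on the left of a sampling statement.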

Your Stan program defines the likelihood p(y[3:n] | alpha, beta, sigma).

A typical thing to do here is to have parameters for mu[1:2]:

parameters {
  vector[2] mu12;
  ...
transformed parameters {
  mu[1:2] = mu12;
  ...
model {
  y ~ normal(mu, sigma);
}

That will define a likelihood p(y[1:n] | mu12, alpha, beta, sigma). Like the rest of your model, this has no explicit priors, so mu12 gets an improper uniform prior. Stan is OK with that as long as the posterior is proper, but we still usually recommend at least weakly informative priors to avoid tail probability mass that’s inconsistent with knowledge of the problem.
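Filled out, the mu12 version with weakly informative priors might look like the sketch below. The prior scales (10 and 5) are placeholders I made up and should be set from the scale of y. Moving l, lp and b into a local block is also my choice, and it means they are no longer saved in the output. alpha and beta are left with their implicit uniform(0, 1) priors, which are proper.

```stan
data {
  int<lower=3> n;
  vector[n] y;
}
parameters {
  real<lower=0, upper=1> alpha;
  real<lower=0, upper=1> beta;
  real<lower=0> sigma;
  vector[2] mu12;  // means of the first two observations
}
transformed parameters {
  vector[n] mu;
  {
    // local scratch variables (assumption: not needed in the output)
    real lp = y[1];
    real l = lp;
    real b = y[2] - y[1];

    mu[1:2] = mu12;

    for (t in 2:(n - 1)) {
      l = alpha * y[t] + (1 - alpha) * (lp + b);
      b = beta * (l - lp) + (1 - beta) * b;
      mu[t + 1] = l + b;
      lp = l;
    }
  }
}
model {
  // hypothetical weakly informative priors; the scales are placeholders
  mu12 ~ normal(0, 10);
  sigma ~ normal(0, 5);  // half-normal, since sigma is constrained positive

  y ~ normal(mu, sigma);
}
```

If y is far from zero or on a very different scale, centering and scaling the mu12 prior accordingly would be the first thing to adjust.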

Thanks, I like this solution. But do I understand correctly that my solution is formally correct, and what you suggest is just more elegant?

Your method defines the likelihood for some of the data conditional on other data (actually, I should’ve put y[1:2] on the right of the vertical bar in what I wrote). That’s OK. What I wrote is something different that also provides a likelihood for the whole input (after you marginalize out mu[1:2]).
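Written out, the two likelihoods are roughly as follows. In both, mu_t for t >= 3 is a deterministic function of y[1:(t-1)], alpha and beta, and the subscripts on mu12 are just my notation for its two elements.

$$
p(y_{3:n} \mid y_{1:2}, \alpha, \beta, \sigma) = \prod_{t=3}^{n} \mathrm{Normal}(y_t \mid \mu_t, \sigma)
$$

$$
p(y_{1:n} \mid \mu_{12}, \alpha, \beta, \sigma) = \mathrm{Normal}(y_1 \mid \mu_{12,1}, \sigma)\; \mathrm{Normal}(y_2 \mid \mu_{12,2}, \sigma) \prod_{t=3}^{n} \mathrm{Normal}(y_t \mid \mu_t, \sigma)
$$

The first treats y[1:2] as fixed inputs. The second adds two factors so that every observation has a density.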

It’s OK (and common) to do what you do.