I am sure this question has already been asked, but a search didn’t turn it up, so apologies. Anyway, my basic problem is that I want to fit a regression line (in this case, a series of connected straight-line segments) to a curve so that the ends match up with the observed data.
If the usual construction would be
Y ~ normal(X * beta, sigma)
then (at the moment) I have
Y ~ normal(X * beta, E * sigma)
where E is a vector along X that is 1 in the middle and goes to 0 at the ends.
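For concreteness, a minimal sketch of what that setup might look like as a full Stan program (the data names here are hypothetical); note that normal() rejects a scale of exactly zero, so E has to stay strictly positive:

data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] Y;
  vector<lower=0>[N] E;  // 1 in the middle, tapering towards 0 at the ends
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // the residual scale shrinks near the ends; normal() rejects a scale of
  // exactly 0, so E must remain strictly positive everywhere
  Y ~ normal(X * beta, E * sigma);
}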
This does work, but not quite perfectly (Stan starts to complain, and the closer the endpoints come to matching, the more it complains), so I was wondering if there is a better, canonical way to do this.
A possible trick might be to treat the unobserved latent states as parameters: declare the interior points as parameters, pass the boundary values as data, and use the transformed parameters block to append the boundary values to the interior ones.
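Something along these lines, perhaps (a rough sketch; the names and the placeholder priors are only illustrative):

data {
  int<lower=3> N;
  vector[N] y;        // observations, ordered along x
  real mu_left;       // known boundary value at the first point
  real mu_right;      // known boundary value at the last point
}
parameters {
  vector[N - 2] mu_interior;   // unobserved latent values at the interior points
  real<lower=0> sigma;
}
transformed parameters {
  // append the fixed boundary values around the free interior values
  vector[N] mu = append_row(mu_left, append_row(mu_interior, mu_right));
}
model {
  mu_interior ~ normal(0, 5);  // placeholder prior; replace with something problem-specific
  sigma ~ normal(0, 1);
  y ~ normal(mu, sigma);
}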
Though, philosophically, this raises the question: if theory strongly predicts that the curve must perfectly match some boundary values and the observed data does not agree, are hard constraints really necessary?
I’ve never seen anyone do this, so I doubt there’s a canonical way. There’s a simple exact answer here. If you have n data points and sort them by x, then you achieve your stated goal exactly by taking the regression coefficients to be a slope of
\beta = (y_n - y_1) / (x_n - x_1)
and an intercept of
\alpha = y_1 - \beta \cdot x_1.
This exactly fits the endpoints of the data, but it ignores all the points in between.
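In Stan terms (purely as illustration, assuming x is sorted in increasing order), that amounts to nothing more than:

data {
  int<lower=2> N;
  vector[N] x;   // assumed sorted in increasing order
  vector[N] y;
}
transformed data {
  real beta = (y[N] - y[1]) / (x[N] - x[1]);  // slope through the two endpoints
  real alpha = y[1] - beta * x[1];            // intercept through the first point
}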
May I ask why you want to specify an exact match to the observed data at the ends? For example, how is the data being generated? Is there some kind of measurement process that’s more accurate at the extremes?
I figured out a method; maybe not optimal, but it works.
But I owe you an explanation, I think. The problem I have is as follows: I am trying to fit a line that is almost, but not quite, straight, and it has to end at a given final point. Essentially I want to model the residuals from the first-order linear approximation. These residuals necessarily go to zero at the endpoints (which I know); I want (need) to model the detailed structure of what comes in between.
That explanation made it clear for me. In that case I would use a Gaussian process and, depending on the number of observations and the data model, either the exact GP or the Hilbert space approximation. I’m not going into details as you already solved it.
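For reference, a minimal exact-GP sketch in Stan might look something like the following (the hyperparameter priors and names are only illustrative, and it does not build in the go-to-zero-at-the-endpoints constraint):

data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}
parameters {
  real<lower=0> rho;      // length scale
  real<lower=0> alpha;    // marginal standard deviation
  real<lower=0> sigma;    // observation noise
}
model {
  // squared-exponential covariance plus observation noise on the diagonal
  matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] L = cholesky_decompose(K);
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  y ~ multi_normal_cholesky(rep_vector(0, N), L);
}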