# Normal_sufficient signature

Since normal_sufficient has already been merged to stan-dev, I think now is a good time to talk about this.

The thing is that normal_sufficient has the following signature:

``````normal_sufficient_lpdf(const T_y& y_bar, const T_s& s_squared,
const T_n& n_obs, const T_loc& mu,
const T_scale& sigma)
``````

So, the straightforward way is to expose it as:

``````y_bar ~ normal_sufficient(s_squared, n_obs, mu, sigma);
normal_sufficient_lpdf(y_bar | s_squared, n_obs, mu, sigma);
``````

But the problem with that is that s_squared and n_obs are also data, so it would be better to have it as something like:

``````(y_bar, s_squared, n_obs) ~ normal_sufficient(mu, sigma);
normal_sufficient_lpdf(y_bar, s_squared, n_obs | mu, sigma);
``````

Has this problem happened before with another distribution? How feasible is this in terms of parser and stuff?

It’s not so much about data vs parameters, but rather on what are you conditioning. The “straightforward” way that you note is the correct statistical description – `y_bar` follows a Gaussian distribution conditioned on `s_squared`, `n_obs`, `mu`, and `sigma`.

But isn’t more accurate to say that

(y_bar, s_squared, n_obs) follows a Gaussian distribution conditioned on mu, and sigma_

since the whole (the sufficient statistic) vector (y_bar, s_squared, n_obs) is a function of data (y_1, y_2, …, y_n)?

Yes, it’s reasonable for `s_squared` to be on the left, but `n_obs` is definitely not data as you know how many elements there are ahead of time! The challenge with putting more than one variable on the left of the bar is that we don’t have any convention for something like `(y_bar, s_squared) ~ sufficient_normal(mu, sigma, N);`.

We could write this:

``````[y_bar, s_squared]' ~ sufficient_normal(mu, sigma, N);
``````

if we assume the signature is

``````sufficient_normal_lpdf(vector, real, real, int);
``````

I think we may need to rename this, too. We’ve gone with distribution first and parameterization later. So shouldn’t this be `normal_sufficient_lpdf` rather than `sufficient_normal_lpdf`? Or is it more like `lognormal_lpdf`?

@betanalpha Yes, you are right, `n_obs` is not part of statistic, just `(y_bar, s_squared)`.

@Bob_Carpenter This would be a possible solution, although it makes it a little more complex on the side the user, specially for when `y_bar` and `s_squared` already are vector (probably very unlikely to happen with `sufficient_normal`), but that’s why we have `append_col` anyway. But then, would we document the distribution as multivariate?

Also note that the current name already is `normal_sufficient_lpdf`.

It’d have to be multivariate if that were the case. The real question is what normalizes to a proper density? That’ll determine when users need to apply jacobians and should also detemrine how we decide what goes to the left of the vertical bar.

Yes, since `[Y_bar, S_squared]` is a statistic, therefore a function of data `[Y_1, Y_2, ..., Y_n]`, then they must integrate to 1 given the true parameters `mu` and `sigma` (obviously provided we adjust the jacobian of the lpdf just like any change of variable of we do).

But then, I think it’s not worth it to vectorize `[y_bar, s_squared]` like was done for `multi_normal` (what I mean is, the function should accept only a single vector `[y_bar, s_squared]`, not a std::vector of Eigen vector like `multi_normal`), since if that’s the case the user can (and should) add more samples using some straigtfoward algebra like `y_bar_new = (y_bar_1*n1 + y_bar2*n2)/(n1+n2)`.