# Problem With Partial Sum Function

Hi,

This post is a continuation of a previous post: Reduce_sum question - General - The Stan Forums (mc-stan.org).

Following the advice of @wds15, I am trying to use a `partial_sum` function to evaluate a density of the form $\prod_{t=1}^{T} p(y_t \mid \mu_{y,t})\, p(x_t \mid \mu_{x,t})$, with $\mu_{y,t} = F x_t$ and $\mu_{x,t} = G x_{t-1}$.

I have two Stan programs that ought to perform the same analysis; the only difference is that one uses `partial_sum` to evaluate the above density in 4 chunks, whereas the other evaluates it in one shot. Below is how I evaluate the density in one go, without `partial_sum`:

```stan
model {
  // Pre-compute arrays of mean vectors
  array[t_max] vector[p] mu_y;
  array[t_max] vector[p] mu_x;

  for (t in 1:t_max) {
    if (t == 1) {
      mu_x[t] = diag_matrix(Gmat) * x0;
    } else {
      mu_x[t] = diag_matrix(Gmat) * x[t - 1];
    }
    mu_y[t] = Fmat * x[t];
  }

  target += MVPE_lpdf(x | mu_x, sigma_s, beta_s)
          + MVPE_lpdf(y | mu_y, diag_matrix(V), beta_o);
}
```


And below is how I apply `partial_sum` to evaluate the same density:

```stan
functions {
  real partial_sum(array[] vector y_slice,
                   int start,
                   int end,
                   array[] vector x_slice,
                   matrix G,
                   matrix F,
                   matrix sigma_s,
                   matrix sigma_o,
                   real beta_s,
                   real beta_o,
                   vector x0) {
    // Pre-compute arrays of linear predictors
    array[end - start + 1] vector[cols(F)] mu_y;
    array[end - start + 1] vector[cols(G)] mu_x;

    int i = 1;
    for (t in start:end) {
      if (t == 1) {
        mu_x[i] = G * x0;
      } else {
        mu_x[i] = G * x_slice[t - 1];
      }
      mu_y[i] = F * x_slice[t];
      i += 1;
    }

    return MVPE_lpdf(x_slice[start:end] | mu_x, sigma_s, beta_s)
         + MVPE_lpdf(y_slice[start:end] | mu_y, sigma_o, beta_o);
  }
}
```

(not showing everything, for clarity)
```stan
model {
  target += partial_sum(y, 1, 125, x,
                        diag_matrix(Gmat), Fmat, sigma_s, diag_matrix(V),
                        beta_s, beta_o, x0);

  target += partial_sum(y, 126, 250, x,
                        diag_matrix(Gmat), Fmat, sigma_s, diag_matrix(V),
                        beta_s, beta_o, x0);

  target += partial_sum(y, 251, 375, x,
                        diag_matrix(Gmat), Fmat, sigma_s, diag_matrix(V),
                        beta_s, beta_o, x0);

  target += partial_sum(y, 376, 500, x,
                        diag_matrix(Gmat), Fmat, sigma_s, diag_matrix(V),
                        beta_s, beta_o, x0);
}
```
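Incidentally, a `partial_sum` function with this signature (slice, `start`, `end`, then shared arguments) is exactly what Stan's built-in `reduce_sum` / `reduce_sum_static` expect, so the chunking could be delegated to Stan instead of being written out by hand. A sketch, assuming the `partial_sum` function shown earlier, with `grainsize = 125` to mimic the four 125-element chunks; one caveat is that `reduce_sum` passes the first argument already sliced, so the function body would need to index `y_slice` directly rather than as `y_slice[start:end]` (the shared argument `x_slice`, indexed by the global `t`, would be unaffected):

```stan
model {
  // grainsize = 125 reproduces the four 125-element chunks above;
  // reduce_sum_static keeps the slice boundaries deterministic,
  // whereas plain reduce_sum may vary them between runs.
  // Caveat: under reduce_sum the first argument arrives already
  // sliced, so partial_sum must use y_slice as-is instead of
  // y_slice[start:end].
  target += reduce_sum_static(partial_sum, y, 125,
                              x, diag_matrix(Gmat), Fmat,
                              sigma_s, diag_matrix(V),
                              beta_s, beta_o, x0);
}
```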


Since the last post, I have caught and corrected a discrepancy between my two Stan programs. After comparing the two .stan files in WinDiff, I can say with confidence that there is no difference between them other than how they evaluate the above density.

This leads me to think that I'm making some kind of mistake in the `partial_sum` code chunk above, causing the two programs to produce different results for the same density, though I fail to see how they could differ.

For anyone so inclined, attached is an R file that produces fake data and fits both models to the same data; it relies on two packages, one of which is cmdstanr. The samplers use the same seeds as well, yet the results are not the same. Also attached are the .stan files: NG_SSM.stan is the model that evaluates the density all at once, and NG_SSM_redsum.stan is the model that uses `partial_sum`.
Simulation_and_ReduceSum_Comparison.R (2.9 KB)
NG_SSM_redsum.stan (4.9 KB)
NG_SSM.stan (3.4 KB)

I would greatly appreciate it if someone could point out any mistake(s) in the code chunk above, or assist me in troubleshooting.

Other details:
I'm using cmdstanr 0.6.0, CmdStan 2.32.2, and Windows 10 (64-bit). `check_cmdstan_toolchain()` returns "The C++ toolchain required for CmdStan is setup properly!"


UPDATE:

I’ve tried to get a bit creative with comparing the two models I have.

First, I used `log_prob()` and `grad_log_prob()` to compare the gradients and log-probabilities of the two models, given the same set of unconstrained parameters. I believe the gradients of the two models are the same, within a certain degree of precision: of the 2546 parameters in my model, there are 5 points of disagreement where the gradients are not the same, although these differences are quite small.

There is also a small difference between the log-probabilities of the two models.

Next, I looked at the chain of one parameter from each sampler, starting at warmup, to see when and where the two samplers begin to diverge. The red crosshairs are the chain coming from the NG_SSM_redsum sampler (the one using `partial_sum`). The point at which the two samplers diverge is the 27th iteration.

Lastly, I looked at MAP estimates from each sampler for three different sampler settings: iter = warmup = 100, 500, 1000. This was done for only a subset of parameters.

The MAP estimates appear to agree more closely as more warmup iterations/samples are drawn.

From these results I have three questions:

1. Are the differences in the gradient and log-probability something that can be waved away, i.e., attributed to inaccuracies in how computers represent large/small numbers?

2. Is this a valid way of determining whether my two Stan models are equivalent?

3. Most importantly: do these results indicate that my two Stan programs are calculating the above density (from my first post) the same way?


Hi,
I think this is to be expected. It is good to check the original model against its partial-sum version, but the two versions should generally only provide the same values (within Monte Carlo error) for expectations over the posterior (means, quantiles, ...), not numerically identical values for all the samples, even when starting from the same point.

Between the original model and the partial-sum model, the order of summation of the individual elements changes: the original model adds the contributions of each data point sequentially, whereas the partial-sum model first computes the partial sums and then adds those together. In exact arithmetic this makes no difference, but floating-point addition is not associative, so equality does not hold in general, especially if the individual contributions have different orders of magnitude.

A quick example in R:

```r
options(digits = 22)
K <- 10
set.seed(556255)
x <- sort(rt(2 * K, df = 3))
sum(x)
#  -1.581380674831653232459
sum(x[1:K]) + sum(x[(K + 1):(2 * K)])
#  -1.581380674831653010415
```


The second effect is that once you get even a single-bit discrepancy between the chains, it will tend to grow over time as the chains forget their (shared) initial state.

So to repeat: you should check that the partial-sum model gives the same answers to questions about the model, not that it produces bit-for-bit identical results.

Does that make sense?

Best of luck with your model!


Hi,

Thank you very much for sharing your thoughts and example with me. A crystal-clear example; much obliged.

What I've gathered from your post is that, in the end, it's more important that we see similar MAP estimates and credible intervals with a high degree of overlap. In light of your example and explanation, this makes a lot more sense now.

Thank you again!
