# Reduce_sum error?

Hi. I am getting strange `reduce_sum` errors in a very simple model running cmdstanpy.

Here is a pared down model that reproduces the error:

```stan
functions {
  real reducesum_lpmf(int[] y, vector omega, int[] m) {
    int num_categories = size(y);
    real second_part = 1.0;
    print(num_categories, " ", size(m));
    return second_part;
  }

  real partial_sum(int[ , ] selected_slice,
                   int start,
                   int end,
                   data int[ , ] incount,
                   vector omega) {
    real ret_sum = 0;
    print("start = ", start, " end = ", end);
    for (n in start:end) {
      print(size(selected_slice[n]), " ", size(incount[n]));
      ret_sum += reducesum_lpmf(selected_slice[n] | omega, incount[n]);
    }
    return ret_sum;
  }
}

data {
  int<lower=1> num_obs;                           // number of observations
  int<lower=1> num_categories;                    // number of categories
  int<lower=0> selected[num_obs, num_categories]; // number selected
  int<lower=0> incount[num_obs, num_categories];  // how many of each color
  int<lower=0, upper=1> run_estimation;
}

parameters {
  simplex[num_categories] omega;  // the inferred weights of the Multivariate Wallenius Non-Central Hypergeometric Distribution
}

model {
  int grainsize = 1;
  omega ~ dirichlet(rep_vector(1.0 / num_categories, num_categories));

  if (run_estimation == 1) {
    target += reduce_sum(partial_sum, selected, grainsize, incount, omega);
  }
}
```

(This is a cut-down version of another model where I initially saw the error.)

The issue appears when I compile the above model with:

```python
from cmdstanpy import CmdStanModel
rs_model = CmdStanModel(stan_file="test_rs.stan")
```

and run it with

```python
rs_fit = rs_model.sample(
    chains=2,
    iter_warmup=250,
    iter_sampling=250,
    data=stan_data,
    show_progress='notebook',
    output_dir='./data',
)
```

I get nonsensical values for my `selected_slice` sizes.

An example subset of the `2-stdout.txt` output:

```
method = sample (Default)
sample
num_samples = 250
num_warmup = 250
save_warmup = 0 (Default)
thin = 1 (Default)
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.90000000000000002
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
metric_file =  (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 2
data
file = /var/folders/77/nm2qxy_j7z90dhx6bhhvw47h0000gn/T/tmpdkku523n/90okg_sp.json
init = 2 (Default)
random
seed = 98851
output
file = test_rs-202009150959-2.csv
diagnostic_file =  (Default)
refresh = 100 (Default)

start = 1 end = 1
10 10
10 10
start = 2 end = 2
1828068222379318548 10
-602317548 10
start = 3 end = 3
16349936537123897489 10
-1552727919 10
start = 4 end = 4
0 10
0 10
start = 5 end = 5
18446744073709551536 10
-80 10
start = 6 end = 6
1120149264 10
1120149264 10
start = 7 end = 7
18446744073709542896 10
-8720 10
start = 8 end = 8
18446744072635809792 10
-1073741824 10
start = 9 end = 9
18446744073709551552 10
-64 10
start = 10 end = 10
18446744073172811776 10
-536739840 10
start = 11 end = 11
3518391462204210
-457466790 10
start = 12 end = 12
18446708889794561840 10
457099056 10
start = 13 end = 13
0 10
0 10
start = 14 end = 14
853600405611371915 10
-593994357 10
start = 15 end = 15
16601505965085912400 10
2038000976 10
```

It appears that the indexing into `reduce_sum` is wrong.

Any ideas?

I am running macOS 10.15.6 with CmdStan 2.24.1 and cmdstanpy 0.9.63.

I didn't look closely at your script.

It looks like you sent your post directly to me instead of to the forum? Why?

Hi,

`selected_slice` is indexed from 1 to (end - start + 1), because each thread only gets a slice (hence the slice argument); `incount`, on the other hand, is indexed from 1 to `num_obs`.

OOOOOOH! Okay, so

`incount` is indexed from `start:end`, but `selected_slice` is indexed from `1:(end - start + 1)`.

THANK YOU!

(this has been driving me nuts for a week)

Hey Joshua, I don't think I sent it directly to you?

Not exactly. Let's say both `incount` and `selected_slice` have size N. Each thread always gets the entire `incount` array, so you index it from 1 to N; `incount` behaves the same way you would expect in a regular user-defined function.

`selected_slice` is a bit different. The slice argument is sliced, meaning that each call to the user-defined function only gets a chunk of M elements of the `selected_slice` array. Inside the function those elements are indexed from 1 to M (not 1 to N), because you are working with the chunk, not the entire array; this is the problem you are experiencing.

Start and end define the start and end index of the chunk of selected_slice each thread received.

Here is a quick sketch that shows slices for two threads.
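The same idea can be shown as a minimal Python simulation (the function and variable names here are illustrative, and the sequential loop only mimics how the scheduler partitions the work, not the actual threading):

```python
def partial_sum(selected_slice, start, end, incount):
    # selected_slice holds ONLY this chunk: valid indices are 0..(end - start)
    # (1..M in Stan's 1-based terms). incount is the FULL array, indexed globally.
    total = 0
    for n in range(start, end + 1):          # global 1-based indices, as in Stan
        local = n - start                    # chunk-local index (n - start + 1 in Stan)
        total += sum(selected_slice[local]) + sum(incount[n - 1])
    return total

def reduce_sum_sim(selected, incount, grainsize):
    # Partition `selected` into chunks; each "thread" sees only its chunk,
    # while `incount` is passed through whole.
    N = len(selected)
    total = 0
    for start in range(1, N + 1, grainsize):
        end = min(start + grainsize - 1, N)
        chunk = selected[start - 1:end]      # the sliced argument
        total += partial_sum(chunk, start, end, incount)
    return total
```

Whatever the grainsize, the total comes out the same, but note that each `partial_sum` call only ever sees `len(chunk)` rows of `selected`, while it can index all of `incount`.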

Hopefully that clears it up a bit.

Maybe have a look at

for explanations and some strategies for getting to a working program step by step.

So to clarify, if I were to cast the example problem from https://mc-stan.org/docs/2_23/stan-users-guide/reduce-sum.html:

```stan
functions {
  real partial_sum(int[] y_slice,
                   int start, int end,
                   vector x,
                   vector beta) {
    return bernoulli_logit_lpmf(y_slice | beta[1] + beta[2] * x[start:end]);
  }
}
```

into a for loop, would I write it like:

```stan
functions {
  real partial_sum(int[] y_slice,
                   int start, int end,
                   vector x,
                   vector beta) {
    real ret_sum = 0;
    for (n in start:end) {
      ret_sum += bernoulli_logit_lpmf(y_slice[n - start + 1] | beta[1] + beta[2] * x[n]);
    }
    return ret_sum;
  }
}
```

right?
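(As a sanity check on that offset, here is a quick Python sketch with made-up data, mapping Stan's 1-based `y_slice[n - start + 1]` onto Python's 0-based indexing:)

```python
# Toy data (made up): verify that the chunk-local index n - start + 1 (1-based)
# picks out the same element as y[n] does in the full array.
y = [0, 1, 1, 0, 1, 0, 1, 1]
start, end = 3, 6                  # a chunk covering global 1-based indices 3..6
y_slice = y[start - 1:end]         # what reduce_sum would pass to partial_sum

for n in range(start, end + 1):
    # Stan: y_slice[n - start + 1] vs y[n]; Python is 0-based, hence the -1 shifts
    assert y_slice[n - start] == y[n - 1]
```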

I am not sure I follow here.

In any case, if your model can be cast as a brms model, then maybe just do that and grab the reduce_sum branch from the brms GitHub repository. It will generate a working reduce_sum Stan model for you, which you can modify further if needed. Be warned that this is ongoing development and not yet released.

Hi Daniel,

Yep, that's the correct indexing for looping within `reduce_sum`.

Cheers,
Andrew

I suggest adding this indexed example to the docs. There are a lot of likelihoods that are not vectorized.