I believe the plan is that the next version of rstan after 2.21 will be 2.24 and that will have stanc3.
Oh sorry, I'm talking about code like
vector[N] mu = Intercept + r_1_1[J_1] .* Z_1_1;
That in the C++ gets translated to
assign(mu, nil_index_list(),
       add(Intercept,
           elt_multiply(
               rvalue(r_1_1, cons_list(index_multi(J_1), nil_index_list()), "r_1_1"),
               Z_1_1)),
       "assigning variable mu");
One of the more expensive parts of that is the line
rvalue(r_1_1, cons_list(index_multi(J_1), nil_index_list()), "r_1_1")
Since J_1 is a vector of integers, that's essentially a random-access copy operation. If we are lucky that index is like
{1, 2, 3, 4, 5, 6}
but unless it's the data's primary index it will probably be something like
{1, 2, 12, 14, 43, 44, ...}
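For intuition, here is a minimal C++ sketch (a hypothetical helper in plain Eigen, none of Stan's actual machinery) of what that multi-index rvalue amounts to: a loop that gathers entries of r_1_1 into a freshly allocated temporary on every evaluation.

```cpp
#include <Eigen/Dense>
#include <vector>

// Hypothetical stand-in for rvalue(r_1_1, index_multi(J_1)): gather the
// entries of r_1_1 at positions J_1 into a brand-new vector. Every element
// is a potentially cache-unfriendly random access, and the temporary is
// reallocated on every leapfrog step.
Eigen::VectorXd gather(const Eigen::VectorXd& r_1_1,
                       const std::vector<int>& J_1) {
  Eigen::VectorXd out(J_1.size());
  for (std::size_t n = 0; n < J_1.size(); ++n) {
    out[n] = r_1_1[J_1[n] - 1];  // Stan indices are 1-based
  }
  return out;
}
```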
This is one place sparse matrices would be very nice, since then we could just have Z_1_1 be a
sparse_matrix[N, K]
and we would just do an efficient sparse matrix product
vector[N] mu = Z_1_1 * r_1_1;
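As a rough sketch of that replacement in plain Eigen (not Stan's actual codegen; the function name and signature are illustrative): the gather-then-elementwise-multiply collapses into a single sparse product.

```cpp
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Illustrative only: if Z is an N x K sparse matrix whose row n holds the old
// Z_1_1[n] in column J_1[n] - 1, then Intercept + Z * r_1_1 computes the same
// thing as Intercept + r_1_1[J_1] .* Z_1_1, with no per-iteration gather.
Eigen::VectorXd mu_sparse(double Intercept,
                          const Eigen::SparseMatrix<double>& Z,
                          const Eigen::VectorXd& r_1_1) {
  return Eigen::VectorXd::Constant(Z.rows(), Intercept) + Z * r_1_1;
}
```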

and we would just do an efficient sparse matrix product
Oh okay, the thing to keep in mind is that they are very similar. I'm not sure there is a way to reorder these matrices outside of the unique groupings for the intercept we talked about previously.
So this:
Z_1_1 .* r_1_1[idxs]
is a sparse matrix multiply where we know each row of the sparse matrix has exactly one element.
So maybe the variables look like this:
idxs = { 0, 1, 1, 1, 2 }
Z_1_1 = { 1.0, 2.2, 5.5, 7.7, 8.8 }
r_1_1 = { 3.1, 2.0, 1.7 }
r_1_1 is just a vector, but the sparse matrix representation for idxs/Z_1_1 is (using the notation from Wikipedia):
V = { 1.0, 2.2, 5.5, 7.7, 8.8 }
COL_INDEX = [ 0, 1, 1, 1, 2 ]
ROW_INDEX = [ 0, 1, 2, 3, 4, 5 ]
So the sparse version of this isn't giving us much other than an extra array of integers (ROW_INDEX).
I hope I got that right, but if I didn't, here is the dense form of the matrix:
[ 1.0,   0,   0,
    0, 2.2,   0,
    0, 5.5,   0,
    0, 7.7,   0,
    0,   0, 8.8 ]
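To make the CSR bookkeeping concrete, here is a small self-contained check in plain Eigen (variable names follow the post; everything else is illustrative, not Stan code) that wraps V / COL_INDEX / ROW_INDEX as a row-major sparse matrix and confirms the product matches Z_1_1 .* r_1_1[idxs]:

```cpp
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <iostream>

int main() {
  // CSR arrays exactly as in the post.
  double V[] = {1.0, 2.2, 5.5, 7.7, 8.8};
  int COL_INDEX[] = {0, 1, 1, 1, 2};
  int ROW_INDEX[] = {0, 1, 2, 3, 4, 5};  // rows + 1 entries

  // View the raw CSR arrays as a 5 x 3 row-major sparse matrix (no copy).
  Eigen::Map<const Eigen::SparseMatrix<double, Eigen::RowMajor>> Z(
      5, 3, 5, ROW_INDEX, COL_INDEX, V);

  Eigen::VectorXd r_1_1(3);
  r_1_1 << 3.1, 2.0, 1.7;

  // The sparse product...
  Eigen::VectorXd via_sparse = Z * r_1_1;

  // ...versus the gather + elementwise multiply the generated code does today.
  int idxs[] = {0, 1, 1, 1, 2};
  Eigen::VectorXd via_gather(5);
  for (int n = 0; n < 5; ++n) via_gather[n] = V[n] * r_1_1[idxs[n]];

  std::cout << (via_sparse - via_gather).norm() << "\n";  // prints 0
}
```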
If you target brms… then maybe also consider the recent changes due to reduce_sum, which hopefully play well with the things you outline. Basically, this means that working on slices of the data and parameters should also be fast.

r_1_1 is just a vector, but the sparse matrix representation for idxs/Z_1_1 is (using the notation from Wikipedia): V = { 1.0, 2.2, 5.5, 7.7, 8.8 }, COL_INDEX = [ 0, 1, 1, 1, 2 ], ROW_INDEX = [ 0, 1, 2, 3, 4, 5 ]
So the sparse version of this isn't giving us much other than an extra array of integers
Yes, but for the data, having it as a sparse matrix is a construction cost we pay only once, whereas for
Z_1_1 .* r_1_1[idxs]
we make that temporary r_1_1[idxs] vector in every iteration of the model (and have to pay the random-access copy cost). If we had sparse data and did
data {
  int K;
  int n_nz[K];
  int m_nz[K];
  // size N x M with nonzeros at (n_nz, m_nz)
  sparse_matrix[N, M, n_nz, m_nz] Z_1_1;
}
//....
mu += Z_1_1 * r_1_1;
That's just a sparse-times-dense multiply, which should be more efficient (and Eigen also has an OpenMP backend that it can use for sparse-dense multiplies that we could use).
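For reference, a minimal sketch of what that backend buys: Eigen multi-threads row-major sparse times dense products when compiled with OpenMP (e.g. -fopenmp). This is plain Eigen, not anything Stan currently emits.

```cpp
#include <Eigen/Dense>
#include <Eigen/Sparse>

// With OpenMP enabled at compile time (-fopenmp), Eigen parallelizes
// row-major sparse * dense products; the call below just caps the thread
// count instead of using the OpenMP default.
Eigen::VectorXd sparse_dense_multiply(
    const Eigen::SparseMatrix<double, Eigen::RowMajor>& Z,
    const Eigen::VectorXd& r) {
  Eigen::setNbThreads(4);
  return Z * r;
}
```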
do we have a specific release date, code freeze, RC etc?
for my part, I'd like to do as much as possible towards documenting how to troubleshoot the install/upgrade process by adding stuff to the online CmdStan docs that can be linked to in the release notes, etc. I opened an issue in August - feel free to keep contributing suggestions: https://github.com/stan-dev/docs/issues/268
The plan is to tag the RC today.

do we have a specific release date, code freeze, RC etc?
Yes, everything as usual and according to the dates mentioned above.
There was not much action in Stan/cmdstan this cycle (6 non-CI PRs altogether), so we only have a Math release issue: https://github.com/stan-dev/math/issues/2128 and a stanc3 issue.
If there are no objections, @serban-nicusor can start making RCs tomorrow, CET time.
Hey, I've done the RCs for math, stan and cmdstan (including stanc3 nightly binaries).
You can find them here: math, stan, cmdstan.
I did not include the release notes on GitHub so we don't have them again on the release next week.
Thanks!
@rok_cesnovar … will you announce on the forum so that people can test?
I did run my usual test model (mixture logistic regression) on macOS. Compared to 2.24.1 I am getting a ~12% slowdown with 2.25.0rc1, which reduces to less than 1% if I turn on STAN_COMPILER_OPTIMS=true.
Here are the Stan model and data:
blrm.stan (28.4 KB) combo3.data.R (3.9 KB) test.R (1.4 KB)
It's still odd to require the optimisations for good performance given this model, but ok, whatever.

@rok_cesnovar … will you announce on the forum so that people can test?
Steve volunteered to do the RC forum post this time. @stevebronder, are you still up for that? Else I can do it if you are busy. Let me know. It would be great to get that out today. Tuesday will be here soon.

Compared to 2.24.1 I am getting a ~12% slowdown with 2.25.0rc1, which reduces to less than 1% if I turn on STAN_COMPILER_OPTIMS=true.
Yeah, we can still rethink turning this on by default. The freeze period is meant just for that. We have a week just for testing.
We kind of decided this in a hurry, based on make issues that turned out to be something small (thanks for debugging that, Sebastian).
Could stanc3 receive an RC tag as well? httpstan needs a specific version of stanc3 to download.
Also, it doesn't seem like cmdstan's version is particularly well defined since it doesn't download a specific version of stanc3. It downloads whatever stanc3 is tagged "nightly", I think. That's a moving target. (See https://github.com/stan-dev/cmdstan/issues/923 for some background.)

Could stanc3 receive an RC tag as well? httpstan needs a specific version of stanc3 to download.
No need to make new cmdstan tarballs I think.

Also, it doesn't seem like cmdstan's version is particularly well defined since it doesn't download a specific version of stanc3.
This is not completely correct.
Release cmdstan tarballs come with the stanc3 binaries and thus do not download any stanc3 binary on build. They will use the release stanc3 no matter how many times the user builds/cleans.
The nightly binary is downloaded only if you use a clone of cmdstan. This is fine, as develop is always a moving target and should thus always use the latest version of stanc3, same as it always uses the latest Math.
The only other way is if you go and manually delete the stanc3 binaries in the release (make clean-all is not enough, you have to really know which files to remove manually). This is not something one would do normally or at all.

See https://github.com/stan-dev/cmdstan/issues/923 for some background.
This issue fixed a different problem. If you went back in git commits to, for example, 2.23, you had no way of downloading the 2.23 stanc3 (besides manually downloading it). You can now with @syclik's fix.
Hey, here you can find stanc3 v2.25.0-rc1. As soon as the Jenkins build finishes, you will also find all the binaries attached on the release page.

Release cmdstan tarballs come with the stanc3 binaries and thus do not download any stanc3 binary on build. They will use the release stanc3 no matter how many times the user builds/cleans.
I didn't know this was the case. Is there a different tar.gz for each platform (i.e., with platform-specific stanc3), or do the different stanc3s all come in the same tarball?
For now they all come in the same tarball. It's only a few MBs wasted.
We might have to consider splitting this once macOS on ARM becomes popular, as that will require separate binaries for macOS ARM and macOS x86.
We technically should already do that for Linux ARM and Windows ARM systems, though those are not even remotely as widespread (so probably not worth it) as the new MacBooks will be next year.
Got it. So the way one learns which version of stanc3 is associated with a specific release of cmdstan is to execute the binary and discover the version? The version number is not contained anywhere in the release tarball nor in the repository tree associated with the tagged commit?
To get the stanc3 version you need to do
make build
./bin/stanc --version
The version is also in the tarball name.
It's still odd to require the optimisations for good performance given this model, but ok, whatever.
### 2.25 without optims
real 186.59
user 185.90
sys 0.55
### 2.25 with optims
real 174.98
user 174.32
sys 0.50
### 2.24.1
real 171.72
user 171.04
sys 0.56
Edit: Never mind about the native thing, I didn't recompile the model after running the one with optimizations.