I tag here the authors of the inherent paper@paul.buerkner@avehtari and @gabriuma.
I struggle building the matrix \mathbb{S} of the D-tuples as in Section 3.2 of the article (page 5), equation 10.
I would do it internally in the transformed data section because the number of dimension D is fixed by the input data and I define Ma priori.
For example when D=2 (2D gaussian process) and M=15 (number of basis functions), I wrote \mathbb{S} as:

array[to_int(M^2), 2] int S;
for (i in 1:M) {
S[((i-1)*M+1):M*i, 1] = rep_array(i, M);
S[((i-1)*M+1):M*i, 2] = linspaced_int_array(M, 1, M);
}

however I can’t get a rationale for generalize the loop for D dimensions.
Moreover, when is it useful to have a different number of basis function {m_1, m2, \dots, m_D } for each dimension?

Just a note that brms will generate code that might be instructive. Use brms::make_stancode on any model including a term gp(x1, x2, ..., xn, k, c) and supplying the arguments k and c to gp(). To minimize complexity, it might also be helpful to specify the argument scale = FALSE.

thanks @jsocolar , but brms gave me no insight because the \mathbb{S} matrix is passed to Stan as data, with the Xgp_1 matrix being pre-built. For my model I would like to take control of the different lengthscales with, consequently a different number of optimal basis functions along each dimension (exactly as reported in the paper).
So, long story short: as usual, I post here the code I wrote to reproduce the example of the HSGP paper in section 3.2. I hope this will be useful to anyone interested.

int D = 3;
array[D] int m = {2, 2, 3};
array[D] int position = rep_array(1, D);
array[prod(m), D] int S; // D-tuples
for (i in 1:prod(m)) {
for (j in 1:D) {
S[i, j] = position[j];
}
position[D] += 1;
for (j in 1:(D-1)) {
int reverse_j = D - j + 1;
if (position[reverse_j] > m[reverse_j]) {
position[reverse_j] = 1;
position[reverse_j - 1] += 1;
}
}
}