Linear, parallell regression

Ole_Petter_Hansen · September 2, 2018, 1:20pm

Do you mean e.g. in a regression context, providing number of shards equal to number of observations? And then automatically combining them into the right size?

If that is possible, users might not need to worry about padding out missing values in shards. That will make map_rect quite a bit easier to use with real data.

wds15 · September 2, 2018, 9:37pm

Something like that. If the users gives to stan a lot of small jobs, we should be able to pack these cleverly together. I have never done this, but it sounds like a doable task to me.

jroon · September 2, 2018, 10:24pm

So I re-ran the linear model with a more granular spacing of shards to try to characterise the curves better. Here is the table of results and a plot:


time_in_ms	order	shards	time_linear	ratio

3	4	50	42,401	0.0001
24,834	4	100	42,401	0.586
18,222	4	200	42,401	0.430
23,384	4	300	42,401	0.551
20,959	4	400	42,401	0.494
47,334	4	500	42,401	1.116
60,061	4	600	42,401	1.416
22,590	4	700	42,401	0.533
31,672	4	800	42,401	0.747
33,686	4	900	42,401	0.794
61,607	4	1,000	42,401	1.453
34,010	4	1,100	42,401	0.802
26,374	4	1,200	42,401	0.622
83,719	4	1,300	42,401	1.974
35,303	4	1,400	42,401	0.833
37,058	4	1,500	42,401	0.874
56,689	4	2,000	42,401	1.337
31,483	4	2,500	42,401	0.743
95,567	4	3,000	42,401	2.254
88,839	4	4,000	42,401	2.095
262,206	4	5,000	42,401	6.184
333,721	5	50	646,038	0.517
442,397	5	100	646,038	0.685
82,083	5	200	646,038	0.127
201,167	5	300	646,038	0.311
113,416	5	400	646,038	0.176
683,850	5	500	646,038	1.059
85,219	5	600	646,038	0.132
589,495	5	700	646,038	0.912
94,406	5	800	646,038	0.146
481,697	5	900	646,038	0.746
379,561	5	1,000	646,038	0.588
86,849	5	1,100	646,038	0.134
194,543	5	1,200	646,038	0.301
498,565	5	1,300	646,038	0.772
185,828	5	1,400	646,038	0.288
467,989	5	1,500	646,038	0.724
131,470	5	2,000	646,038	0.204
122,695	5	2,500	646,038	0.190
661,582	5	3,000	646,038	1.024
159,023	5	4,000	646,038	0.246
141,625	5	5,000	646,038	0.219

Time vs shards:

Ratio vs shards:
test
There is alot of variation it seems

jroon · September 4, 2018, 9:42am

@Ole_Petter_Hansen - the tutorials from StanCon Helsinkin are online. The ODE tutorial included map_rect example:

Edit: on my laptop the demo ODE takes 499seconds in normal execution - 94 seconds in parallel.

avehtari · September 4, 2018, 10:15am

Thanks for reporting this!

jroon · September 4, 2018, 10:43am

Ran the ODE on the desktop just now also:
Serial model: 290 sec
Map_rect using only 1 logical core: 227 sec
Map_rect using all logical cores (24): 51 sec

jroon · September 7, 2018, 9:42pm

So guys, I’ve had a go at my own map_rect understanding the code a bit better. Alas I don’t understand it quite well enough. I have a multivariate normal linear model and since I want to work it on a big dataset with lots of X’s and Y’s, so I’m trying to map_rect it. Here is the model along with a script to generate some toy data and a script to run it:
gen_test_data_y2_x3.R (532 Bytes)

run_test_model.R (378 Bytes)

linear_multivariate_new.stan (1.1 KB)

The first point of confusion - it has a for loop in the model block and the transformed data block already so I was unsure which one to apply map_rect to. Because I thought it might be easier I tried to apply it to the model block. I also decided to simply call the map_rect for every row of data to avoid dealing with indexing and because for loops lend themselves to it. However, I’m now stuck on how to define x_r and x_i and confused. My best effort is here:

functions {
    vector mult_norm_par(vector mu_L, vector theta, real[] x_r, int[] x_i) {
        vector[NvarsY] = mu_L[1];
        matrix[NvarsY, NvarsY] L_Sigma = mu_L[2];
    
        real lp = multi_normal_cholesky_lpdf(y | mu, L_Sigma)
        return [lp]';
    }
}
data{
    int<lower=0> NvarsX; // num independent variables
    int<lower=0> NvarsY; // num dependent variables
    int<lower=0> N;     //Total number of rows

    matrix[N, NvarsX] x;
    vector[NvarsY] y [N];
}

parameters{
    cholesky_factor_corr[NvarsY] L_Omega; //For the correlations between outcome variables
    vector<lower=0>[NvarsY] L_sigma; //Residual SD of outcome variables

    vector[NvarsY] Beta0;         // Intercepts
    vector[NvarsX] Beta [NvarsY]; // Slopes
}

transformed parameters{
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma;

    L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

    for (i in 1:N){
        for (dv in 1:NvarsY){
            mu[i, dv] = Beta0[dv] + dot_product(to_vector(Beta[dv, 1:NvarsX]), x[i, 1:NvarsX]);
        }
    }
    
    // vars for map_rect
    vector theta[N];
    // x_r ??? confused
    // x_i ??? confused
}

model{
    //Priors
    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ cauchy(0,5);

    //for (i in 1:N){
    //    y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);
    //}
    target +=sum(map_rect(  mult_norm_par,
                            append_row(mu, L_Sigma)),
	                        theta,
		                    x_r,
		                    x_i) );
}

generated quantities{
    matrix[NvarsY, NvarsY] Omega;
    matrix[NvarsY, NvarsY] Sigma;
    Omega = multiply_lower_tri_self_transpose(L_Omega);
    Sigma = quad_form_diag(L_Omega, L_sigma);
}

linear_multivariate_map_rect.stan (1.7 KB)

Any help appreciated!

jroon · September 8, 2018, 12:05am

Well I tried to fill in some of the missing bits but I’m having PARSER EXPECTED compile errors at line 4. Can’t figure out the solution.

functions {
    vector mult_norm_par(vector mu_L, vector theta, real[] x_r, int[] x_i) {
        int NvarsY = x_i[2];
        real[NvarsY] mu = mu_L[1];
        matrix[NvarsY, NvarsY] L_Sigma = mu_L[2];

        real lp = multi_normal_cholesky_lpdf(x_r | mu, L_Sigma)
        return [lp]';
    }
}
data{
    int<lower=0> NvarsX; // num independent variables
    int<lower=0> NvarsY; // num dependent variables
    int<lower=0> N;     //Total number of rows

    matrix[N, NvarsX] x;
    vector[NvarsY] y [N];
}

parameters{
    cholesky_factor_corr[NvarsY] L_Omega; //For the correlations between outcome variables
    vector<lower=0>[NvarsY] L_sigma; //Residual SD of outcome variables

    vector[NvarsY] Beta0;         // Intercepts
    vector[NvarsX] Beta [NvarsY]; // Slopes
}

transformed parameters{
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma;

    L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

    for (i in 1:N){
        for (dv in 1:NvarsY){
            mu[i, dv] = Beta0[dv] + dot_product(to_vector(Beta[dv, 1:NvarsX]), x[i, 1:NvarsX]);
        }
    }
    
    // vars for map_rect
    vector theta[N];
    real x_r[N, NvarsX];
    int x_i[N, 2];
    {
        for (n in 1:N) {
          x_r[n] = to_array_1d(y[n]);
          x_i[n, 1] = n;
          x_i[n, 2] = NvarsY;
        }
    }
}

model{
    //Priors
    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ cauchy(0,5);

    //for (i in 1:N){
    //    y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);
    //}
    target +=sum(map_rect( mult_norm_par, append_row(mu, L_Sigma)), theta, x_r, x_i) );
}

generated quantities{
    matrix[NvarsY, NvarsY] Omega;
    matrix[NvarsY, NvarsY] Sigma;
    Omega = multiply_lower_tri_self_transpose(L_Omega);
    Sigma = quad_form_diag(L_Omega, L_sigma);
}

stijn · September 8, 2018, 4:53am

This line needs a ;

jroon · September 8, 2018, 7:28am

Thanks @stijn! Fixed that now but it doesn’t fix my error :(

Bob_Carpenter · September 10, 2018, 12:59pm

I don’t think that’ll ever be the case unless the model is very simple. The samplers don’t do much and the time to calculate the log density always dominates almot all of the computation time.

I wouldn’t expect that to work well.

To keep the processors busy, communication time can’t dominate. So 100K shards for 100K data points is not ideal. Also, just the number of CPU shards may not be ideal if they take different amounts of time to execute, as everything’s synchronized on getting the last result back.

There may also be page sizes having to do with memory communication, too.

There will be lots of hardware specific issues, like how fast memory and transfer is and how fast the CPUs are. With modern CPUs, if the whole CPU isn’t under a lot of load, they overclock them relative to multi-core performance (I think it’s called “turbo boost” :-))

Each level up you have to go the data (memory is beyond L3), means more latency getting into the CPU. I’m not sure what the ratio of transfer speeds are, but the modern CPUs are good at streaming data through if it’s memory local in memory. If we start paging in and out of cache in a way that’s not streaming memory local, then that slows down processors, too.

It’s going to depend on the model in terms of how much computation is done per shard. It’s keeping all the pipelines full that’s the key part. CPUs can’t do anything without data to work on.

Are those plots just for single runs?

We’re applying for some Amazon AWS grants to see just how well we can scale for various models. And we have some algorithm evals to do on hundreds of models, to boot.

Nice. Thanks for reporting. This is the use case we really built MPI for—large computations relative to the amount of data going back and forth.

jroon · September 10, 2018, 1:11pm

Yeah single runs. I have some deadlines coming up so I didnt’ get change to do multiple runs but I can do so if you wish later in the month.

No problem. I think I said somewhere above - I’m a novice at stan and more-so at map_rect - I’ve have difficulty to follow the models. But I have a 12core/24thread desktop that I’m generally happy to run some benchmarks on if its helpful to the effort and I’m not using it for work stuff.
I’m trying to learn this stuff but as I work on it alone its a tough learning curve. i’d be grateful for any advice re my model in post 67! :)

jroon · September 10, 2018, 4:35pm

So I’ve made some changes but I’m still running into problems with passing the variables into the mult_norm_par function. My code now looks like this:

functions {
    vector mult_norm_par(vector mu_L, vector theta, real[] x_r, int[] x_i) {
        int NvarsY = x_i[2];
        vector[NvarsY] mu = mu_L[1];
        matrix[NvarsY, NvarsY] L_Sigma = mu_L[2];

        real lp = multi_normal_cholesky_lpdf(x_r | mu, L_Sigma);
        return [lp]';
    }
}
data{
    int<lower=0> NvarsX; // num independent variables
    int<lower=0> NvarsY; // num dependent variables
    int<lower=0> N;     //Total number of rows

    matrix[N, NvarsX] x;
    vector[NvarsY] y [N];
}

parameters{
    cholesky_factor_corr[NvarsY] L_Omega; //For the correlations between outcome variables
    vector<lower=0>[NvarsY] L_sigma; //Residual SD of outcome variables

    vector[NvarsY] Beta0;         // Intercepts
    vector[NvarsX] Beta [NvarsY]; // Slopes
}

transformed parameters{
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma;

    L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

    for (i in 1:N){
        for (dv in 1:NvarsY){
            mu[i, dv] = Beta0[dv] + dot_product(to_vector(Beta[dv, 1:NvarsX]), x[i, 1:NvarsX]);
        }
    }
    
    // vars for map_rect
    vector theta[N];
    real x_r[N, NvarsX];
    int x_i[N, 2];
    {
        for (n in 1:N) {
          x_r[n] = to_array_1d(y[n]);
          x_i[n, 1] = n;
          x_i[n, 2] = NvarsY;
        }
    }
}

model{
    //Priors
    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ cauchy(0,5);

    //for (i in 1:N){
    //    y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);
    //}
    target +=sum(map_rect( mult_norm_par, append_row(to_vector(mu[, 1:NvarsY]), L_Sigma)), theta, x_r, x_i) );
}

generated quantities{
    matrix[NvarsY, NvarsY] Omega;
    matrix[NvarsY, NvarsY] Sigma;
    Omega = multiply_lower_tri_self_transpose(L_Omega);
    Sigma = quad_form_diag(L_Omega, L_sigma);
}

When I run this I’m getting the error:

variable definition base type mismatch, variable declared as base type: vector variable definition has base: real error in ‘examples/map_rect/linear_multivariate_map_rect.stan’ at line 4, column 36

 2:     vector mult_norm_par(vector mu_L, vector theta, real[] x_r, int[] x_i) {
 3:         int NvarsY = x_i[2];
 4:         vector[NvarsY] mu = mu_L[1];
                                       ^
 5:         matrix[NvarsY, NvarsY] L_Sigma = mu_L[2];

I’'ve tried a few variations, but each time I’m running into a problem unpacking elements of mu_L

Bob_Carpenter · September 14, 2018, 2:55pm

mu_L is declared to be a vector. So mu_L[1] is a real value. You are trying to assign it to mu, which is delcared to be a vector. That’s not going to work. Same for the matrix. I don’t know what your intent is, though, so it’s hard to advise further.

jroon · September 14, 2018, 3:03pm

Hi Bob, thanks for your reply.

I’m trying to pass 1 row of mu from the transformed parameters block, and the L_Sigma for the multi_normal_choleskpy_lpdf function in the map_rect function. I’m trying to do this by calling map_rect with append_row(to_vector(mu[, 1:NvarsY]), L_Sigma)) assigned as the mu_L variable in the mult_norm_par function. I was trying to emulate the approach taken by Ole in his earlier code: Linear, parallell regression

Have I misunderstood something?

Bob_Carpenter · September 14, 2018, 3:39pm

I can’t speak to that, but the code you pasted has the problem I mentioned—you can’t assign mu_L[1] (a scalar) to mu (a vector). Stan requires vectors to be assigned to vectors, scalars to scalars, etc.

jroon · September 14, 2018, 3:46pm

Ok. So then my point of misunderstanding here is the mu_L[1] is a scalar. I thought it should be a vector. So I’m calling making assigning mu_L as append_row(to_vector(mu[, 1:NvarsY]), L_Sigma)).

Is mu_L[1] not equal to the first element of that-i.e.: to_vector(mu[, 1:NvarsY]). This is what I imagined I was doing in any case.

Bob_Carpenter · September 14, 2018, 4:09pm

mu_L[1] is the first element of the vector mu_L. When you declare vector mu_L, it’s saying that mu_L is a vector. Did you mean to delcare mu_L as an array of vectors? That’s vector[].

jroon · September 14, 2018, 4:29pm

Aha! Yes ok that was my point of confusion so! Thank you very much. So that fixed the error on that line.

If I might get your help to clarify this point too think I will be ok -> now I’m getting an error on the next line:
matrix[NvarsY, NvarsY] L_Sigma = mu_L[2];

So this is throwing an error because I am attempting to pass a matrix to a vector (this error I was expecting). L_sigma is a triangular matrix. How can I collapse that to a vector to pass into map_rect, and then reconstruct into the correct triangular matrix ?

I can guess that I could do to_vector(L_sigma) in the call to map_rect, but how then do I correctly reconstruct L_sigma from a vector in my user defined function? NvarsY, the dimensions of L_sigma, is already passed to the function. I don’t know two things here:

If I do to_vector does that collapse a matrix by row or by column?
If I extract that vector from `mu_L[2]’, how to I reshape it back into L_sigma ?

jroon · September 14, 2018, 4:50pm

Actually - I think I figured it out @Bob_Carpenter. I found the function reference manual and learned that to_vector is column-wise, and I can use the function to_matrix to rebuild a matrix, columnwise.

Thanks for your help. I’m off to figure out the next error on my own if I can :)

Topic		Replies	Views
MPI shard scaling General	5	1133	May 28, 2019
Parallel Options for Independent Regressions Modeling	4	537	April 28, 2021
Help with multi-threading a linear regression model and slicing the design matrix Modeling specification , matrix , paralellization	11	1542	November 20, 2021
Parallelising matrix multiplications Modeling	10	512	May 4, 2023
Computational Efficiency: advanced techniques? Modeling	1	365	October 4, 2022

Linear, parallell regression

Related topics