Linear, parallel regression

ermeel · September 15, 2018, 5:29am

Thanks. @Ole_Petter_Hansen and/or @wds15, what would you do if N in the bl_glm example above is not divisible by the number of shards?

Ole_Petter_Hansen · September 15, 2018, 6:45am

Then you need to explicitly allow for that in the code.

If you have a vector x=c(1,2,3,4,5) and you want two shards, you pad out the last shard with something (e.g. x1=c(1,2,3), x2=c(4,5,99)).

In order to keep track of real data in each shard you need to pass an int var to the shards. E.g. n_obs1=3, n_obs2=2, so you only use the relevant portions of the data. This is described (without an example, though) in the manual.

Easy peasy with only a vector, but it requires some housekeeping when there are lots of data/parameter sizes. @wds15 mentioned Stan could maybe help out a bit with this housekeeping in the future.
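The padding idea above can be sketched as a shard function. This is only an illustration under stated assumptions: the function name shard_loglik, the normal likelihood, and the layout (phi[1] as mean, phi[2] as a positive standard deviation, x_i[1] as the count of real observations) are hypothetical and not taken from the bl_glm example. It uses the array syntax of the Stan version in this thread (2.18); newer releases write these types as array[] real and array[] int.

```stan
functions {
  // Hypothetical shard function for map_rect.
  // x_r holds the padded data for one shard, e.g. {4, 5, 99}.
  // x_i[1] holds the number of real observations, e.g. 2.
  vector shard_loglik(vector phi, vector theta, real[] x_r, int[] x_i) {
    int n_obs = x_i[1];
    // Only the first n_obs entries are read, so the padding value is never used.
    real lp = normal_lpdf(x_r[1:n_obs] | phi[1], phi[2]);
    return rep_vector(lp, 1);
  }
}
```

With x1=c(1,2,3) and x2=c(4,5,99), the two shards would receive x_i values of 3 and 2, so the 99 never enters the likelihood.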

ermeel · September 15, 2018, 6:49am

Thanks. I was also thinking to actually do the remaining instances/rows without threading or MPI directly “inline” in the model block instead of using map_rect. Would that cause any huge performance issue? These should be at most shards-1 rows…

wds15 · September 15, 2018, 7:46am

If the number of jobs does not divide evenly across the nodes, map_rect will handle that for you. It places the excess jobs onto the second node onwards, which means that the root node will have one job less than the others. It is better for the root node to do a little less, since it is then available to handle the incoming results from all workers.

jroon · September 15, 2018, 11:06am

Is there a way to flatten an array of vectors into a single vector?

This line:

target +=sum(map_rect( mult_norm_par, append_row(to_vector(mu[, 1:NvarsY]), to_vector(L_Sigma)), theta, x_r, x_i) );

is giving me this error:

No matches for: 

  to_vector(vector[])
Available argument signatures for to_vector:

  to_vector(matrix)
  to_vector(vector)
  to_vector(row vector)
  to_vector(real[])
  to_vector(int[])

  error in 'examples/map_rect/linear_multivariate_map_rect.stan' at line 65, column 79
  -------------------------------------------------
    63:     //    y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);
    64:     //}
    65:     target +=sum(map_rect( mult_norm_par, append_row(to_vector(mu[, 1:NvarsY]), to_vector(L_Sigma)), theta, x_r, x_i) );
                                                                                      ^
    66: }
  -------------------------------------------------

Thanks

Bob_Carpenter · September 19, 2018, 11:57am

Not built in. You’d need to write your own function for that.

jroon · September 19, 2018, 4:47pm

Thanks Bob. I have tried this and failed:

    vector re_pack(vector[] mu, int NvarsY){
        vector[1] newvec[NvarsY];
        for(y in 1:NvarsY){
            newvec[1, y] = mu[y];
        }
        return[newvec];
    }

Produces the error:

  error in 'examples/map_rect/linear_multivariate_map_rect.stan' at line 14, column 13
  -------------------------------------------------
    12:         vector[1] newvec[NvarsY];
    13:         for(y in 1:NvarsY){
    14:             newvec[1, y] = mu[y];
                    ^
    15:         }
  -------------------------------------------------

PARSER EXPECTED: "}"

I can’t figure out why this error happens. The ‘}’ look balanced to me!?!

Bob_Carpenter · September 20, 2018, 11:40am

The key thing that’s going wrong in all of this is the assignment of mismatched sizes. Each index knocks a size/dimension off.

Because mu is declared as vector[] mu, it’s an array of vectors. So mu[y] is a vector.

Because newvec is declared as an array of vectors, newvec[1, y] is a scalar.
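The rule that each index removes one dimension can be seen in a small sketch. The function name first_element is hypothetical, and the code assumes mu is non-empty; it uses the 2.18-era array syntax of this thread.

```stan
functions {
  // Hypothetical demo of how indexing changes the type of an expression.
  real first_element(vector[] mu) {
    // One index removes the array dimension: mu[1] is a vector.
    vector[rows(mu[1])] v = mu[1];
    // Two indices remove both dimensions: mu[1, 1] is a real.
    real x = mu[1, 1];
    // v[1] and x are the same element, so this returns 0.
    return x - v[1];
  }
}
```

An assignment only type-checks when both sides have the same type after indexing, which is why a real on the left cannot receive a vector on the right.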

That error message should be saying that the assignment is wrong and not that it expected a }. Can you post your whole program? It may be a bug in our error reporting.

In general, you probably shouldn’t be making vector[1] data types. That’s just a vector with a single element, which is probably better off being a scalar.

I can’t tell what you’re trying to accomplish with this code, but vector[1] is almost never a good thing—it just adds a layer of container for a one-element vector.

Also, you can’t return[newvec] — the square brackets are for array indexing, so it’s like you’re trying to treat return as some kind of container. This should probably be return newvec.

When I run with the latest RStan, the error I see is the right one:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Base type mismatch in assignment; variable name = newvec, type = real; right-hand side type=vector
  error in 'model14c18563e2e9d_foo' at line 5, column 21
  -------------------------------------------------
     3:     vector[1] newvec[NvarsY];
     4:     for(y in 1:NvarsY){
     5:       newvec[1, y] = mu[y];
                            ^
     6:     }
  -------------------------------------------------

PARSER EXPECTED: <expression assignable to left-hand side>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'foo' due to the above error.
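For the original question of flattening an array of vectors, a user-defined function along the following lines is one possible sketch. The name flatten_vectors is hypothetical, and the code assumes the array is non-empty and that all vectors have the same length; it is not the code from the attached model. It uses the 2.18-era array syntax of this thread.

```stan
functions {
  // Hypothetical helper: concatenates an array of equally sized vectors
  // into one long vector, in array order.
  vector flatten_vectors(vector[] vs) {
    int n = size(vs);      // number of vectors in the array
    int k = rows(vs[1]);   // length of each vector (assumes n >= 1)
    vector[n * k] out;
    for (i in 1:n) {
      // Vector i fills positions (i-1)*k + 1 through i*k.
      out[((i - 1) * k + 1):(i * k)] = vs[i];
    }
    return out;
  }
}
```

The result is a plain vector, so it could be passed to append_row together with to_vector(L_Sigma).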

jroon · September 20, 2018, 11:58am

Hi Bob. Thanks for coming back to me. I’m the first to admit I’m confused here, but I’m doing my best. Stan is hard to learn!

FYI - I’m working in CmdStan 2.18.

To bring things back into focus, because the thread might be confusing by now: I’m trying to reformulate this linear multivariate model to run with map_rect:
linear_multivariate_new.stan (1.1 KB)
This will generate some toy data: gen_test_data_y2_x3.R (532 Bytes)
This will run the model: run_test_model.R (378 Bytes)

This is my current map_rect version from which the vector code above comes: linear_multivariate_map_rect.stan (2.0 KB)

There likely are several problems here - I’ve been trying to work out the kinks one at a time.
The reason I wrote the function re_pack is that the map_rect line quoted earlier has been giving me the error:

No matches for: 

  to_vector(vector[])

So I am trying to package one row of mu together with the vectorised L_Sigma to pass into the map_rect function.

FYI - I’m sure the map_rect function probably also has issues, but so far I’ve not been able to run it to find out what they are!

Bob_Carpenter · September 20, 2018, 12:08pm

You might want to try some more basic programming first, then. It’s hard to jump into the deep end of a complicated project and it looks like you’re having a lot of trouble with the basic typing concepts. Each variable in Stan has a shape and only things of like shapes can be assigned. Every function has a signature saying what shape of arguments it accepts.

Only functions listed in the appendix of the manual exist. That’s why you’re not finding to_vector(vector[]). If you want that function, you’ll need to implement it yourself.

I can’t understand the goal of the function you’re trying to write, so I can’t suggest the right way to code it. I’m afraid I don’t have time to debug your program from scratch.
