How to set the number of shards as an input?

ignacio · March 5, 2019, 6:05pm

Richard McElreath’s multithreading and map-reduce tutorial has the number of shards fixed to 7 (similar to the example in section 22.3 of the Stan user guide). That is:


functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = size(xr);
    int y[n] = xi[1:n];
    int m[n] = xi[(n+1):(2*n)];
    real lp = binomial_logit_lpmf( y | m , beta[1] + to_vector(xr) * beta[2] );
    return [lp]';
  }
} 

data {
  int N;
  int n_redcards[N];
  int n_games[N];
  real rating[N];
}

transformed data {
  // 7 shards
  // M = N/7 = 124621/7 = 17803
  int n_shards = 7;
  int M = N/n_shards;
  int xi[n_shards, 2*M];  // 2M because two variables, and they get stacked in array
  real xr[n_shards, M];
  // an empty set of per-shard parameters
  vector[0] theta[n_shards];
  // split into shards
  
  for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    xi[i,1:M] = n_redcards[ j:k ];
    xi[i,(M+1):(2*M)] = n_games[ j:k ];
    xr[i] = rating[j:k];
  }
}

parameters {
  vector[2] beta;
}

model {
  beta ~ normal(0,1);
  target += sum( map_rect( lp_reduce , beta , theta , xr , xi ) );
}

I’m trying to rewrite his example so that n_shards is an input you can pass in the data block. How do you do that so that, if N is not a multiple of n_shards, the first (or last) shard gets the extra observations?

stemangiola · March 6, 2019, 1:03am

I ended up:

defining a matrix whose row length equals the maximum number of observations per shard
replacing the NAs (the padding) with 0s
creating an array with the number of real observations per shard (e.g., c(10, 10, 10, 9, 9, 9))
passing that array to map_rect and using it to read only the data you actually have

Pretty laborious, but now I don’t have to worry anymore. I hope that was not too confusing.
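For readers who want to see the steps above spelled out, here is a minimal sketch in Python of building the padded, rectangular shard data outside Stan. It is not the code used in this thread: the function name `pad_shards` and the choice to give the extra observations to the first shards are assumptions made for illustration.

```python
def pad_shards(values, n_shards, pad_value=0):
    """Split `values` into n_shards equal-width rows, padding the short rows.

    Returns (rows, counts): `rows` is rectangular, and counts[i] is the
    number of real (non-padding) entries at the start of row i.
    Hypothetical helper, not part of Stan or rstan.
    """
    base, extra = divmod(len(values), n_shards)
    # The first `extra` shards get one more observation than the rest.
    counts = [base + 1 if i < extra else base for i in range(n_shards)]
    width = max(counts)
    rows, pos = [], 0
    for c in counts:
        chunk = list(values[pos:pos + c])
        rows.append(chunk + [pad_value] * (width - c))  # pad up to width
        pos += c
    return rows, counts


rows, counts = pad_shards(list(range(1, 12)), n_shards=3)
print(rows)    # three rows of width 4, the last one padded with a single 0
print(counts)  # number of real observations per shard
```

The `counts` array is what would be passed alongside the padded data (for example as the first element of each shard's integer array) so that the mapped function can slice off the padding before evaluating the likelihood.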

ignacio · March 6, 2019, 1:08am

@stemangiola do you have code that you can share? Do you do all this within Stan?

stemangiola · March 6, 2019, 1:43am

functions{

	 vector lp_reduce( vector global_parameters , vector local_parameters , real[] xr , int[] xi ) {
	 	int M = xi[1];
	 	int N = xi[2];
	 	int S = xi[3];
	 	int G_per_shard = xi[4];
	 	int symbol_start[M+1] = xi[(4+1):(4+1+M)];
	 	int sample_idx[N] = xi[(4+1+M+1):(4+1+M+1+N-1)];
	 	int counts[N] = xi[(4+1+M+1+N):size(xi)];


	 	vector[G_per_shard] lambda_MPI = local_parameters[1:G_per_shard];
	 	vector[G_per_shard] sigma_MPI = local_parameters[(G_per_shard+1):(G_per_shard*2)];
	 	vector[S] exposure_rate = local_parameters[((M*2)+1):rows(local_parameters)];

	 	vector[G_per_shard] lp;


	 for(g in 1:G_per_shard){
	 	lp[g] =  neg_binomial_2_log_lpmf(
	 	  counts[symbol_start[g]:symbol_start[g+1]-1] |
	 	  exposure_rate[sample_idx[symbol_start[g]:symbol_start[g+1]-1]] +
	 	  lambda_MPI[g],
	 	  sigma_MPI[g]
	 	 );
	 }


    return [sum(lp)]';
  }

}
[...]

transformed data {
  vector[0] global_parameters;
  real xr[n_shards, 0];

  int<lower=0> int_MPI[n_shards, 4+(M+1)+N+N];

  // Shards - MPI
  for ( i in 1:n_shards ) {
  int M_N_Gps[4];
  M_N_Gps[1] = M;
  M_N_Gps[2] = N;
  M_N_Gps[3] = S;
  M_N_Gps[4] = G_per_shard[i];

  int_MPI[i,] = append_array(append_array(append_array(M_N_Gps, symbol_start[i]), sample_idx[i]), counts[i]);

  }
}

The above might be confusing because my dataset was “non-square” in both dimensions, so this is a “tidy” data frame that can handle all kinds of data-availability combinations. In your case, the number of elements per shard is the only non-square structure.

I decided to create the main data frame outside Stan and append to it the indexes I needed inside.

ignacio · March 12, 2019, 2:33pm

I wrote the following code with the idea that the last shard will have more observations and the other shards will be padded with zeros. Alas, I’m getting some syntax errors that I’m not sure how to solve:

functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = size(xr);
    int y[n] = xi[1:n];
    int m[n] = xi[(n+1):(2*n)];
    real lp = binomial_logit_lpmf( y | m , beta[1] + to_vector(xr) * beta[2] );
    return [lp]';
  }
} 

data {
  int N;
  int n_redcards[N];
  int n_games[N];
  real rating[N];
  int n_shards;
}

transformed data {
  int modulo = N % n_shards;
  int n_per_shard = N/n_shards;
  int n_per_shard_plus_modulo = n_per_shard + modulo;
  int xi[n_shards, 2*n_per_shard_plus_modulo];  
  real xr[n_shards, n_per_shard_plus_modulo];
  // an empty set of per-shard parameters
  vector[0] theta[n_shards];
  // split into shards
  {
   int pos= 1;
   //Shards 1 to n_shards -1 (these ones are padded with zeros)
   for ( i in 1:(n_shards-1) ) {
    int end = pos + n_per_shard - 1;
    xr[i,1:end] = rating[pos:end];
    xr[i,((end+1):(end+modulo))] = rep_array(0.0, modulo); 
    
    xi[i,1:end] = n_redcards[pos:end];
    xi[i,((end+1):(end+modulo))] = rep_array(0.0, modulo);
    
    xi[i,((end+modulo+1):(2*end))] = n_games[pos:end];
    xi[i,((2*end+1):(2*end+modulo))] = rep_array(0.0, modulo);
    pos += end
    }
    // last shard (this one has no padding)
    end = pos + n_per_shard - 1;
    xr[n_shards,(1:(end+modulo))] = rating[pos:end];
    xi[n_shards,(1:(end+modulo))] = n_redcards[pos:end];
    xi[n_shards,((end+modulo+1):(2*(end+modulo)))] = n_games[pos:end];
  }
}

parameters {
  vector[2] beta;
}

model {
  beta ~ normal(0,1);
  target += sum( map_rect( lp_reduce , beta , theta , xr , xi ) );
}

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Info: integer division implicitly rounds to integer. Found int division: N / n_shards
 Positive values rounded down, negative values rounded up or down in platform-dependent way.
  error in 'modelab3798a622_stan_ab6535c317' at line 34, column 17
  -------------------------------------------------
    32:     int end = pos + n_per_shard - 1;
    33:     xr[i,1:end] = rating[pos:end];
    34:     xr[i,((end+1):(end+modulo))] = rep_array(0.0, modulo); 
                        ^
    35:     
  -------------------------------------------------

PARSER EXPECTED: ")"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-ab6535c317' due to the above error.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-ab6535c317' due to the above error.

How can I solve the syntax errors?
Is this implementation correct or am I missing something?

Thanks!

ignacio · April 22, 2019, 7:29pm

Answer to my own question:

functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = size(xr);
    int y[n] = xi[1:n];
    int m[n] = xi[(n+1):(2*n)];
    real lp = binomial_logit_lpmf( y | m , beta[1] + to_vector(xr) * beta[2] );
    return [lp]';
  }
} 

data {
  int N;
  int n_redcards[N];
  int n_games[N];
  real rating[N];
  int n_shards;
}

transformed data {
  int modulo = N % n_shards;
  int n_per_shard = N/n_shards;
  int n_padded = n_per_shard + modulo;
  int s_pad = n_per_shard + 1;
  int s_games = n_padded + 1;
  int e_games = s_games + n_per_shard - 1;
  int s_pad_games = e_games+1;
  int e_pad_games = 2*n_padded;

  
  int xi[n_shards, 2*n_padded];  // 2*n_padded because two variables, and they get stacked in array
  real xr[n_shards, n_padded];
  // an empty set of per-shard parameters
  vector[0] theta[n_shards];
  
  // split into shards
  int pos = 1;
  // shards 1 to n_shards - 1 (these are padded with zeros)
  for ( i in 1:(n_shards-1) ) {
    int end = pos + n_per_shard - 1;

    xr[i,1:n_per_shard] = rating[pos:end];
    xr[i,s_pad:n_padded] = rep_array(0.0, modulo);

    xi[i,1:n_per_shard] = n_redcards[pos:end];
    xi[i,s_pad:n_padded] = rep_array(0, modulo);

    xi[i,s_games:e_games] = n_games[pos:end];
    xi[i,s_pad_games:e_pad_games] = rep_array(0, modulo);
    pos = end + 1;
  }

  // last shard (this one has no padding)
  xr[n_shards,1:n_padded] = rating[pos:N];

  xi[n_shards,1:n_padded] = n_redcards[pos:N];
  xi[n_shards,s_games:e_pad_games] = n_games[pos:N];
}

parameters {
  vector[2] beta;
}

model {
  beta ~ normal(0,1);
  target += sum( map_rect( lp_reduce , beta , theta , xr , xi ) );
}
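A note on why the zero padding in this version should not change the result even though lp_reduce still uses every padded entry: a binomial observation with 0 successes out of 0 trials has probability 1, so it adds 0 to the log density whatever the linear predictor is. The Python sketch below checks this numerically. It re-implements the binomial-logit log-pmf by hand for illustration and is not Stan's implementation; the example data values are made up.

```python
import math


def binomial_logit_lpmf(y, m, alpha):
    """Log-probability of y successes in m trials with success log-odds alpha."""
    log_choose = (math.lgamma(m + 1) - math.lgamma(y + 1)
                  - math.lgamma(m - y + 1))
    log_p = -math.log1p(math.exp(-alpha))    # log(inv_logit(alpha))
    log_1mp = -math.log1p(math.exp(alpha))   # log(1 - inv_logit(alpha))
    return log_choose + y * log_p + (m - y) * log_1mp


# Made-up shard: (red cards, games, linear predictor) per observation.
real = [(1, 3, 0.2), (0, 5, -0.4)]
# Same shard with one padded entry: y = 0, m = 0, rating = 0.
padded = real + [(0, 0, 0.0)]

lp_real = sum(binomial_logit_lpmf(y, m, a) for y, m, a in real)
lp_padded = sum(binomial_logit_lpmf(y, m, a) for y, m, a in padded)
print(lp_real, lp_padded)  # the two sums agree
```

This only holds because both the counts and the number of trials are padded with 0. With a different likelihood (or non-zero padding), the padded entries would have to be sliced off inside the mapped function instead.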

stemangiola · April 24, 2019, 2:28am

Not that it would make a huge difference, but I believe this distribution of observations

10 10 9 9 9 9 9 9

is more efficient in principle than

10 10 10 10 10 10 10 4
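To make the comparison concrete, here is a small Python sketch that computes shard sizes under two schemes: the balanced split suggested here, and the "remainder goes to the last shard" split used in the Stan code earlier in the thread. The function names are hypothetical, and N = 74 with 8 shards is an assumption chosen because it is what the sizes listed above add up to.

```python
def balanced_sizes(n, n_shards):
    """Shard sizes that differ by at most one observation."""
    base, extra = divmod(n, n_shards)
    return [base + 1] * extra + [base] * (n_shards - extra)


def remainder_last_sizes(n, n_shards):
    """Equal shards, with the whole remainder added to the last one."""
    base, extra = divmod(n, n_shards)
    return [base] * (n_shards - 1) + [base + extra]


def padding_cells(sizes):
    """Number of padded entries needed to make the shards rectangular."""
    return max(sizes) * len(sizes) - sum(sizes)


for scheme in (balanced_sizes, remainder_last_sizes):
    sizes = scheme(74, 8)
    print(scheme.__name__, sizes, "largest:", max(sizes),
          "padding:", padding_cells(sizes))
```

With these inputs the balanced split has a largest shard of 10 and needs 6 padded entries per variable, while the remainder-last split has a largest shard of 11 and needs 14. Whether that translates into a measurable speed difference depends on the model and hardware.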
