I am using Stan to solve the following estimation problem. I have a vector S that represents the superposition of a finite but unknown number of templates (Ti):

S = (sum over i) bi * Ti + n

bi are non-negative ([0…Max]) unknown parameters. bi is initialized to a vector of random normals with mean that is close to a typical expected answer and a STD that of +/- mean. n is assumed to be Gaussian noise.

The typical size of the vector S is 32K, and the typical number of templates (range of i) is 100. (each template is of size 32K).

If I reduce the size of the vector to 1024 on simulated data, Stan generates the estimate quickly (less than 10 minutes). With my larger vector, Stan does not finish even in days.

It is not a resource problem because my machine with 32 cores and 256GB of memory is never close to fully utilized.

Here is an abbreviated version of how I set this up.

```
functions
{
real dotproduct_final_lpdf(real answer, int LENGTH , int final_size, vector scale, matrix templates, vector signal )
{
vector [2*final_size] scales;
vector [2*LENGTH] scaled_temps;
vector [2*LENGTH] M_ST;
vector [LENGTH] final_m_ct_1;
vector [LENGTH] final_m_ct_2;
vector [LENGTH] scaled_temps_1;
vector [LENGTH] scaled_temps_2;
vector [final_size] scale2;
real Dot_1part = 0 ;
real Dot_2part = 0 ;
real Dot_product = 0 ;
real result =0;
real M_cT_norm =0;
scale2 = rep_vector(0.0, final_size);
scales = append_row(scale,scale2);
scaled_temps = templates*scales;
M_ST = signal - scaled_temps ;
final_m_ct_1 = head(M_ST,LENGTH);
final_m_ct_2 = tail(M_ST,LENGTH);
scaled_temps_1 = head(scaled_temps,LENGTH);
scaled_temps_2 = tail(scaled_temps,LENGTH);
M_cT_norm = sqrt ( dot_product(final_m_ct_2, final_m_ct_2) +dot_product(final_m_ct_1, final_m_ct_1) ) ;
final_m_ct_2 = (1/M_cT_norm) * final_m_ct_2 ;
final_m_ct_1 = (1/M_cT_norm) * final_m_ct_1 ;
Dot_1part = dot_product(scaled_temps_2, final_m_ct_2) + dot_product(scaled_temps_1 , final_m_ct_1) ;
Dot_2part = dot_product (scaled_temps_1, final_m_ct_2) -dot_product (final_m_ct_1 , scaled_temps_2 );
Dot_product = sqrt (Dot_1part*Dot_1part +Dot_2part*Dot_2part );
result = 1-fabs(Dot_product);
return result;
}
}
data
{
int is_it_final;
real scale_ratio2 ;
int LENGTH;
int final_size;
matrix [2*LENGTH,2*final_size] templates ;
vector [2*LENGTH] signal ;
vector [final_size] scales_mat_est ;
vector [final_size] scale2_mat_est ;
}
parameters
{
vector <lower = 0.000000001 > [final_size] scale;
real scale_final;
}
model
{
scale ~ normal(scales_mat_est ,scales_mat_est* 2.0 );
scale_final ~ dotproduct_final(LENGTH,final_size,scale,templates,signal);
}
```