Persistent Windows Issues with RStudio

I continue to encounter issues in my courses where RStudio crashes on some Windows computers when running models like the one included below. The problems do not arise on Mac or Linux, or with PyStan on Windows. They seem to be limited to Windows 10, but having Windows 10 is not sufficient. Machines with 16 GB of RAM (and no other processes starving the system of memory) still crash. One possible commonality is Windows 10 Enterprise?

I have two system specs for machines that were causing problems if that might help.

Any thoughts?

data {
  int<lower=1> N; // Number of observations
  
  int<lower=1> N_location; // Number of location groups
  int<lower=1, upper=N_location> location[N]; // Location group assignments

  int<lower=1> N_method; // Number of method groups
  int<lower=1, upper=N_method> method[N]; // Method group assignments
  
  vector[N] x; // Covariates
  int y[N];    // Binary variates
}

parameters {
  real mu_alpha;             // Location intercepts population mean
  real<lower=0> sigma_alpha; // Location intercepts population standard deviation
  vector[N_location] alpha_location_tilde; // Non-centered location intercepts

  real mu_beta; // Location slopes population mean
  real<lower=0> sigma_beta; // Location slopes population standard deviation
  vector[N_location] beta_location_tilde; // Non-centered location slopes
}

transformed parameters {
  // Recentered intercepts for each location group
  vector[N_location] alpha_location = mu_alpha + sigma_alpha * alpha_location_tilde;
  // Recentered slopes for each location group
  vector[N_location] beta_location = mu_beta + sigma_beta * beta_location_tilde;
}

model {
  mu_alpha ~ normal(0, 0.5);           // Prior model
  sigma_alpha ~ normal(0, 2);       // Prior model
  alpha_location_tilde ~ normal(0, 1); // Non-centered hierarchical model

  mu_beta ~ normal(0, 0.5);           // Prior model
  sigma_beta ~ normal(0, 2);       // Prior model
  beta_location_tilde ~ normal(0, 1); // Non-centered hierarchical model

  // Observational model
  y ~ bernoulli_logit(beta_location[location] .* x + alpha_location[location]);
}

Could you share a minimal example that fails, including the data? Or does it crash on compile? I have access to a few Windows 10 machines (Enterprise/Education and Pro). Maybe we can replicate it.

It compiles but crashes sometime during runtime.

Try the below using the Stan program I pasted above with the attached data file.

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

data <- read_rdump('multilevel_logistic_regression.data.R')
fit <- stan(file='one_level_ncp.stan', data=data, seed=4938483,
            control=list(adapt_delta=0.85), chains=1)

multilevel_logistic_regression.data.R (4.2 KB)

I get similar behaviour on some machines when I have -march=native in my makevars.
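A quick way to check for the flag from R, as a rough sketch. It assumes R 3.5.0 or later, where `tools::makevars_user()` returns the path of the user-level Makevars file (or an empty vector if there is none). The helper name `find_arch_flags` is made up for this example. It only inspects the user-level file, so flags set elsewhere (for example through an environment variable such as LOCAL_CPPFLAGS) will not show up.

```r
# Hypothetical helper: return the lines that set -march or -mtune.
find_arch_flags <- function(lines) {
  grep("-m(arch|tune)=", lines, value = TRUE)
}

# Path of the user-level Makevars file, or character(0) if none exists.
path <- tools::makevars_user()

if (length(path) == 1 && file.exists(path)) {
  print(find_arch_flags(readLines(path)))
} else {
  message("No user Makevars file found")
}
```

An empty result means the user Makevars file sets no architecture-specific flags.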

Good call – the heterogeneity may very well be controlled by the processor type and hence tickle a -march=native issue.

I tried it on a Windows 10 Enterprise machine with an i5 CPU. I am running R 3.6.1 and Rtools 3.5. It worked with and without the -march=native flag. I will investigate on other machines. Do post the configurations; they might help track this down.
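For collecting a configuration to post, something like the following might help. It is only a sketch: the helper name `pkg_version` is made up here, and `PROCESSOR_IDENTIFIER` is an environment variable that Windows normally sets, so the CPU entry will be NA on other systems. The Rtools and RStudio versions are not covered and would need to be added by hand.

```r
# Hypothetical helper: version of an installed package, or NA if missing.
pkg_version <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

info <- Sys.info()

report <- c(
  R           = R.version.string,
  OS          = paste(info[["sysname"]], info[["release"]], info[["version"]]),
  CPU         = Sys.getenv("PROCESSOR_IDENTIFIER", unset = NA),  # Windows only
  rstan       = pkg_version("rstan"),
  StanHeaders = pkg_version("StanHeaders")
)

print(report)
```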

I also had RStudio crashing when using -march=native and calling multi_normal().

I had the crash with this model too, on Windows 10 Pro (Version 1903) with R 3.6.1, rstan 2.19.2, StanHeaders 2.18.1-10, RStudio 1.2.1555, and Rtools 3.5.0.4. My Makevars:

CXX14=$(BINPREF)g++ -O2 -march=native -mtune=native
CXX14FLAGS=-O3 -march=native -mtune=native
CXX11FLAGS=-O3 -march=native -mtune=native

But not the particular function I included above?

I think this issue comes down to the Rtools package using a version of the mingw compiler that was built before some of the latest-generation Intel processors were released. On my machine, which uses Skylake processors, I had to use -march=broadwell to get models to compile. Has anyone tried the experimental Rtools 4.0?

I was finally able to reproduce this on a third machine with an i7 CPU (I also got nothing on an AWS instance with Windows Server). It's a fresh install of R 3.6.1, RStudio and Rtools, with -march=native set.

Sometimes (almost always after an RStudio restart) it displayed 1/2000 and crashed the R session. Upon restarting the R session (but not RStudio) it ran fine. Is this like what you were seeing, @betanalpha?

One thing I am seeing is that -march=native causes the model to run for twice as long when it doesn't crash (1.4 seconds compared to 0.7 seconds without the flag). So that might already be a sign of trouble.

EDIT: no issue without the flag

Tested your function now and it did not crash.

But I was not able to recreate the crash I had a few weeks ago either. Since then I’ve updated both R (3.5.x to 3.6.1) and Stan (2.18.x to 2.19.2). I think my Windows version (17763), RStudio (1.2.1335) and Rtools (3.5.0.4) are still the same.

The “run for 1/2000 then crash” was definitely observed, but those people didn’t get anything to work by restarting.

Looks like the -march=native flag is definitely the culprit. Thanks everyone!
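For reference, a minimal sketch of the Makevars lines posted earlier in the thread with the architecture-specific flags removed. It assumes the rest of the file stays unchanged and that the file is the user-level one (typically Makevars or Makevars.win in the .R folder of the home directory).

```makefile
# Same lines as posted above, minus -march=native and -mtune=native
CXX14=$(BINPREF)g++ -O2
CXX14FLAGS=-O3
CXX11FLAGS=-O3
```

Models that were compiled and cached with the old flags (for example through auto_write) will probably need to be recompiled before the change takes effect.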

Do you have ‘-march=native’ in your makevars?

On Windows (PyStan), models built with the mingw-w64 compiler would sometimes crash (on a divide-by-zero, and maybe in some other functions where the output should have been NaN).

Have you tried the latest dev version of Rtools?

I did try it both with ‘-march=native’ and without. It did not crash when I ran it yesterday. But I tried it again today and now it crashes every time when using ‘-march=native’!

I am attaching my program below, which also usually crashes when ‘-march=native’ is set. When I rewrite it to use multi_normal_cholesky instead of multi_normal, it does work with ‘-march=native’.

library(rstan)

Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
#Sys.setenv(LOCAL_CPPFLAGS = '-O3 -march=native -mtune=native')

code = '
data {
  int<lower=0> N; 
  matrix[N, 4] xy;
  real r1; 
  real r2; 
  real r3; 
  real r4; 
}

transformed data {
  matrix[N*2, N*2] Omega = diag_matrix(rep_vector(1., N*2));
  vector[N*2] xy_mu = append_row(xy[, 1], xy[, 2]);
  vector[N*2] xy_sigma = append_row(xy[, 3], xy[, 4]);
  
  Omega = [
    [1,  r1, r2, r2,    r3, r4, r4, r4],
    [r1, 1,  r1, r2,    r4, r3, r4, r4],
    [r2, r1, 1,  r1,    r4, r4, r3, r4],
    [r2, r2, r1, 1,     r4, r4, r4, r3],
    
    [r3, r4, r4, r4,    1,  r1, r2, r2],
    [r4, r3, r4, r4,    r1, 1,  r1, r2],
    [r4, r4, r3, r4,    r2, r1, 1,  r1],
    [r4, r4, r4, r3,    r2, r2, r1, 1]];
    
  print("Omega "); for (i in 1:N*2) print(Omega[i, ]);

}

parameters {
  vector[N*2] xy_par;
}

model {
  xy_par ~ multi_normal(xy_mu, quad_form_diag(Omega, xy_sigma));
}
'

xy = matrix(c(
  c(-20, -10, 0, 17),  # x_mu
  c(60, 52, 43,  18),  # y_mu
  c(1, 1, 1, 1),       # x_sigma
  c(1, 1, 1, 1)        # y_sigma
  ), ncol=4)

model <- stan_model(model_code = code)
fit_xy <- sampling(model, chains = 1, seed=65445, 
                   data=list(N=4, xy=xy, r1=0.0, r2=0.0, r3=-0.0, r4=-0.0))
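The multi_normal_cholesky rewrite mentioned above was not posted, so the following is only a sketch of what it might look like, not the version that was actually run. The name `L_Sigma` is made up for this example. Because the covariance here is built entirely from data, its Cholesky factor can be computed once in transformed data. As in the original program, the 8 x 8 matrix literal means this only works with N = 4.

```stan
data {
  int<lower=0> N;
  matrix[N, 4] xy;
  real r1;
  real r2;
  real r3;
  real r4;
}

transformed data {
  vector[N*2] xy_mu = append_row(xy[, 1], xy[, 2]);
  vector[N*2] xy_sigma = append_row(xy[, 3], xy[, 4]);
  matrix[N*2, N*2] Omega = [
    [1,  r1, r2, r2,    r3, r4, r4, r4],
    [r1, 1,  r1, r2,    r4, r3, r4, r4],
    [r2, r1, 1,  r1,    r4, r4, r3, r4],
    [r2, r2, r1, 1,     r4, r4, r4, r3],

    [r3, r4, r4, r4,    1,  r1, r2, r2],
    [r4, r3, r4, r4,    r1, 1,  r1, r2],
    [r4, r4, r3, r4,    r2, r1, 1,  r1],
    [r4, r4, r4, r3,    r2, r2, r1, 1]];
  // Hypothetical name: Cholesky factor of the covariance matrix
  matrix[N*2, N*2] L_Sigma
    = cholesky_decompose(quad_form_diag(Omega, xy_sigma));
}

parameters {
  vector[N*2] xy_par;
}

model {
  xy_par ~ multi_normal_cholesky(xy_mu, L_Sigma);
}
```

The R code for compiling and sampling stays the same as above, with this string passed as model_code.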

Will it crash if you set init to something reasonable and no warm-up?

Yes, it does. Here is a simpler program that often reproduces the crash. But sometimes it does not crash. The only factor I’m sure about is ‘-march=native’.

library(rstan)

Sys.setenv(LOCAL_CPPFLAGS = '-march=native')

code = '
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] y;
}

parameters {
  vector[K] mu;
  vector<lower=0.0>[K] sigma;
  corr_matrix[K] Omega;
}

model {
  Omega ~ lkj_corr(2);
  for (i in 1:N) {
    y[i, ] ~ multi_normal(mu, quad_form_diag(Omega, sigma));
  }
}
'

N = 100
K = 4
mu = 1
sigma = 1

y = MASS::mvrnorm(n = N, mu=rep(mu, K), Sigma=diag(rep(sigma, K)))

model <- stan_model(model_code = code)
fit <- sampling(model, chains = 1, seed=65445, data=list(N=N, K=K, y=y))