Syntactical variants on cmdstanr, mostly for learners

Hello cmdstanr fans!

As you may know, I love using cmdstanr for teaching newcomers to Stan and Bayes in general. But I find that a couple of syntactical things throw some of them. It’s really a question of what one is used to. Some find the OOP-style methods strange, and it’s a little bit of unnecessary cognitive load for them. I mean mymodel$sample(), for example, rather than sample(mymodel…). Another is that those who like to pipe R code left-to-right can’t do so.

My personal preference is to get as close to functional programming as I can, though I think it’s an ideal and a state of mind rather than a product one can use. I therefore don’t like methods and I don’t like pipes. However, I just want to make it easier for beginners.

SO… I played very briefly with defining new functions (see below) that would be alternatives to the methods $sample() and $draws() and it works ok. But is there any reason why I should be cautious in doing this? Is there something deep in cmdstanr that will trip up learners using this? I’m especially hoping for some insights from @mitzimorris here.

library(cmdstanr)
library(bayesplot)
library(magrittr)
set.seed(1234)

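# "pipe-assign": store x under `name` in the global environment (pos=1),
# then return x so the pipe can keep going
passign <- function(x, name) {
	assign(name, x, pos=1)
	return(x)
}
# function-style wrappers around the R6 methods;
# any extra arguments are passed straight through to the method
cmdstan_sample <- function(model, ...) {
	model$sample(...)
}
cmdstan_draws <- function(fitted, ...) {
	fitted$draws(...)
}
cmdstan_summary <- function(fitted, ...) {
	fitted$summary(...)
}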

standata <- list(n=100, x=rnorm(100, 10, 2))

stancode <- '
data {
	int n;
	array[n] real x;
}
parameters {
	real mu;
	real<lower=0> sigma;
}
model {
	mu ~ normal(0, 10);
	sigma ~ normal(0,2);
	x ~ normal(mu, sigma);
}
'


######### pipe version #########

stancode %>% write_stan_file() %>%
             cmdstan_model(compile=TRUE) %>%
             cmdstan_sample(data=standata,
                            seed=123,
                            chains=2,
                            parallel_chains=2,
                            iter_warmup=1000,
                            iter_sampling=2000) %>%
             passign("stansamples") %T>%
             cmdstan_summary() %>%
             cmdstan_draws() %>%
             passign("standraws") %>%
             mcmc_trace() 
             

######### non-pipe version ########

stanmod <- cmdstan_model(write_stan_file(stancode), compile=TRUE)

stansamples <- cmdstan_sample(stanmod,
                              data=standata,
                              seed=123,
                              chains=2,
                              parallel_chains=2,
                              iter_warmup=1000,
                              iter_sampling=2000)

cmdstan_summary(stansamples)

standraws <- cmdstan_draws(stansamples)

mcmc_trace(standraws)

Of course, you don’t have to pipe this, and you can use the more recent “native R pipe”. One thing that’s probably worth saying is that beginners need to learn each step individually, and in particular need to recognise what the errors and warnings look like from each, before they chain them together. So piping is not for day 1.
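For anyone who wants to try the native pipe without loading magrittr, here is a minimal sketch. It needs R 4.1 or later for |>. To keep it runnable without CmdStan installed, fake_model is a made-up stand-in (a plain list of functions, not a real CmdStanModel), so the numbers mean nothing; the wrappers and passign() are the same as the ones above. With the real thing you would start the pipe from cmdstan_model(write_stan_file(stancode)) instead.

```r
# Same wrappers as in the post, repeated so this runs on its own
cmdstan_sample <- function(model, ...) model$sample(...)
cmdstan_draws  <- function(fitted, ...) fitted$draws(...)

passign <- function(x, name) {
  assign(name, x, pos = 1)  # pos = 1 is the global environment
  x
}

# HYPOTHETICAL stand-in for a CmdStanModel: a list holding a sample() function
# that returns another list holding a draws() function. Not cmdstanr's API.
fake_model <- list(
  sample = function(data, seed = 1) {
    set.seed(seed)
    d <- rnorm(20, mean = mean(data$x), sd = 1)  # pretend posterior draws
    list(draws = function() d)
  }
)

standata <- list(n = 5, x = c(9, 10, 11, 10, 10))

# Native pipe: each step receives the previous result as its first argument
post_mean <- fake_model |>
  cmdstan_sample(data = standata, seed = 123) |>
  passign("stansamples") |>
  cmdstan_draws() |>
  passign("standraws") |>
  mean()
```

After this runs, stansamples and standraws both exist in the workspace, just as in the magrittr version.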

Thanks for any ideas and tips,
Robert

I’m for piping and having those methods exposed as functions, but I don’t think you should put Stan code in a string, because you lose all the syntax highlighting. @WardBrian’s VS Code plugin (I think this works for the Posit IDE too!) is super for coding in Stan.

Point taken. I’m just hacking about there in the basic mac R.app. I use Sublime Text locally but mostly do serious Stan work in cloud VMs, and I haven’t touched RStudio since about 2019. I do want to take a look at Positron sometime. But I recognise that many learners will value highlighting more than I do.

Yeah I agree that this approach can make it easier for people who aren’t used to this programming style and prefer to focus on learning Stan and Bayes rather than learning new R syntax.

At first glance, I think your functions should work as intended. I don’t think there’s anything deep in cmdstanr that should discourage you from using this approach. If you do discover something that causes unexpected behavior please let us know!

The following is definitely not for beginners, but I was curious about piping, and these are the currently supported options:

magrittr:

fit <- cmdstan_model(stan_file = write_stan_file(stancode)) %>%
    {.$sample(data = standata)}

base pipe:

fit <- cmdstan_model(stan_file = write_stan_file(stancode)) |>
  (\(x) x$sample(data = standata))()

And the complete example (I would not use this myself)

stancode %>% write_stan_file() %>%
             cmdstan_model(compile=TRUE) %>%
             {.$sample(data=standata,
                       seed=123,
                       chains=2,
                       parallel_chains=2,
                       iter_warmup=1000,
                       iter_sampling=2000)} %>%
             assign("stanfit", ., 1) %T>%
             {.$summary()} %>%
             {.$draws()} %>%
             assign("standraws", ., 1) %>%
             mcmc_pairs()

Agreed, I wouldn’t use this! But it is interesting, and it’s helpful to know all the ways that people use one language. Some people just love those pipes and are willing to take a bit of a syntactic load in order to keep going left to right.

More commonly, people don’t have to pipe, but find the $method() one extra bit of baggage they have to deal with. Few R users, it seems to me, deal with R6 classes.

There’s also a magrittr pipe %$%, which extracts elements of the object on the LHS. This will take out a function (method, if you like) and evaluate it, but does so only in the environment of the R6 class, which gets us back to this messy business of wrapping the function call in braces.

I did not get this to work. Can you show how you would use %$% here?

I wouldn’t! Nor would I recommend it to anyone. Because the data argument of $sample() does not exist in the R6 / CmdStanModel object, they would need to find a way of getting it in, and that means defining a tweaked R6 class, and there I start to feel a sense of danger that I’m going to cause a problem to other parts of cmdstanr.

I think that, if they must pipe, then it’s best to define variations on functions (in the working environment) that are pipe-friendly and feed everything through. I defined passign (“pipe-assign”, I was thinking) for this, to store objects while keeping a pipe of arbitrary length going. I suspect that a true Wickhamist would know of a Tidier approach than mine.

I’m a general fan of pipe syntax, too, in data manipulation workflows.

It does seem that maybe the intention here is to place a Stan fit within a pipe syntax after the model/data has been proven to work, as this seems to shortcut a typical workflow of prior and PPC, etc. Or, in the example, passing to mcmc_pairs(), it seems that wouldn’t be a final product, so the fit may occur multiple times?

I suppose that even in the pipe, the fit would, as a side effect, throw warnings or messages typical at the end of the fit (max tree depth, divergences, energy) even if more was needed?

In case you aren’t aware… R7 is coming… or rather S7 since they renamed it 😅

Exactly, I have to emphasise to pipers that they should proceed slowly…

I believe all the usual console output and warnings appear as expected, although a new question comes to mind about how well they stay in order when running in Jupyter.

The %T>% pipe evaluates the RHS function but then returns the LHS, allowing a one-step dead-end into graphics calls. The trouble is that most pipers never use anything but %>% and, nowadays, |>. And, paradoxically it seems to me, they never conclude with ->
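The native pipe has no tee operator, so for those who stick to |>, the same one-step dead-end can be sketched with a small helper. Note that tee() is a name I made up for illustration; it is not part of magrittr or base R. The example also ends with right-assignment, which keeps the whole thing reading left to right. It needs R 4.1 or later.

```r
# HYPOTHETICAL helper: call f on x for its side effect (printing, plotting),
# throw away whatever f returns, and hand x on unchanged
tee <- function(x, f, ...) {
  f(x, ...)
  x
}

x <- c(2, 4, 6)

# The cat() call is the dead-end; sum() still receives the original x,
# and -> stores the final result at the end of the pipe
x |>
  tee(\(v) cat("n =", length(v), "\n")) |>
  sum() -> total

# tee() on its own returns its input untouched
kept <- x |> tee(print)
```

One thing to watch in either version: the value of the tee'd call is discarded rather than auto-printed, so a step that only returns a table needs an explicit print() to show anything.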

There’s nothing “deep” - CmdStanR has to contend with the following fundamental challenges of working against (and I mean against, not with) CmdStan:

  • model compilation is one operation
  • running the resulting compiled executable is another
  • CmdStan is file-based w/r/t both inputs and outputs.
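For readers following along in R, the three points above can be made visible with a rough sketch like the one below. It assumes cmdstanr and CmdStan are installed and skips the CmdStan part otherwise; the tiny model is a made-up placeholder, and the Stan file is written with base R here only to show that the program ends up on disk.

```r
# Placeholder model, only here to have something to compile
stancode <- "
parameters { real mu; }
model { mu ~ normal(0, 1); }
"

# File-based input: the Stan program has to exist as a file on disk
stan_path <- tempfile(fileext = ".stan")
writeLines(stancode, stan_path)

# Only continue if both cmdstanr and a CmdStan installation can be found
have_cmdstan <- requireNamespace("cmdstanr", quietly = TRUE) &&
  !inherits(try(cmdstanr::cmdstan_path(), silent = TRUE), "try-error")

if (have_cmdstan) {
  mod <- cmdstanr::cmdstan_model(stan_path, compile = FALSE)

  mod$compile()            # operation 1: build the executable
  print(mod$exe_file())    # path of the compiled executable

  # operation 2: run that executable
  fit <- mod$sample(chains = 1, seed = 123, refresh = 0)

  # File-based output: the draws live in CSV files written by CmdStan
  print(fit$output_files())
}
```

Seen this way, a wrapper such as cmdstan_sample() only changes how the second step is spelled; the compile step and the files are still there underneath.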

I’m not much of an R user - lately I’ve used ChatGPT and Claude to translate Python to R or write R code - any pipes in my code were put there by a bot.

Thanks, that’s reassuring. I tend to think of these as strengths of CmdStan. Translation to C++ gives one set of really helpful errors. Compilation should then not give errors, and in my experience if it does, they are to do with version compatibility in the toolchain, not my model. Running will give other warnings about divergence and what have you, all really useful too. Sending and receiving via UTF-8 text files is a superpower to me because so little can go wrong. I like cmdstanr/py just the way they are.
