Practice makes perfect [Beginners How to...]

Hi there, I am relatively new to Bayesian Statistics and decided to use Stan for a current project. The first step is to learn how to model with Stan, for this I intend to translate a working OpenBUGS script (hierarchical model) to Stan and gain comparable results.

Intro
I am an experienced R user but modeling in Stan is quite different. I learned R partly by just trying; writing code and compare the outcome of the code to what I expected the outcome to be.

Problem
When writing a Stan script it seems I can only run a full model and not really observe what my code is doing at intermediate steps.

( Example
I have variable Y in a long format data.frame and I want to use it as a matrix (Y_ij).
R code: matrix(df$Y, nrow = i, ncol = j, byrow=T) # Note in this case: byrow = T (default is F)
Stan: matrix Y[i,j] // does this fill the matrix by row or by column? )

Question
Is there a way to see if the Stan code is actually doing what I think that it is doing? Iā€™d rather learn by trying than endless searching for the mechanics on google / manual / forum.

I understand Stan is (much) like C++, is it a good idea to use some C++ program to see what the (simple) code is doing in Stan? If so; any advice for what to use, maybe something comparable to Rstudio? (I run Ubuntu 16.04)

4 Likes

Basically, no. Because Stan code is translated into C++ and then compiled, there is no opportunity to proceed one line at a time. There are some things you can do. One is to put print statements into your code, such as

print("x is ", x);

in which case it will send that line to the screen. However, it does so every time the log-posterior kernel function is evaluated (if you put the print statement in the transformed parameters or model block), in which case you are going to get thousands of such lines each with a different value of x.

A better option is to define a function in the functions block of a Stan program that does any nontrivial but reasonably atomic operation. You can call the expose_stan_functions function in the rstan package to compile those functions and make them available in your R session. Although you cannot step through them one line at a time, you can call them passing R objects to the function arguments to verify that the outcome is as you expect. You can then use those Stan functions in the subsequent blocks of your Stan program.

2 Likes

When I am having trouble with a model, I usually just put in print statements anywhere I think there could be a problem, compile with -O0 (for compilation speed), and then evaluate the likelihood once using some crude manipulations of the rstan objects:

1 Like

Thanks! So far the print is quite useful to get a feel for the simple structures in RStan even tough it is a bit clumsy. It also gives some feedback/feeling for how (in)complete a model.stan file can be for it to run. I understand that it is the closest we can come for now.

As a side note for linux users who might read;
If you want to change your compilation flag (-O3, -O2 or -O0) start terminal and type:

sudo gedit ~/.R/Makevars

gedit = any text editor software, root is required because the Makevars file is hidden.

Small edit for documentation sake; In Makevars I also changed the ā€˜nativeā€™ setting of cores to ā€˜core2ā€™ (I have 4 with 8 threads in 1 socket).

The Stan parser will accept a program consisting of at least one named program block, so you can write and test your data structures using the admittedly clumsy ā€œdebug by printfā€ method, (which is pretty much the only debug method I use when dealing with the C++ code).

To run a Stan program which doesnā€™t have parameters or a model block in RStan, use the sampling function with argument algorithm="Fixed_param".

Note that the fact that a Stan program can be any set of program blocks allows you to write programs to generate fake data for the model youā€™re developing as well.

2 Likes

Not Stan specific, but you donā€™t need sudo to edit files in the user directory. (Though if you have used sudo, you may have created the file with root permissions, in which case you can delete it and create it as a user, or use chown to put it back under a user control).

Also, if you are using a graphical command, it is better to use gksudo rather than sudo

Ah you are right, donā€™t need sudoā€¦ I am also new to Linux :)

What do you want to see? Thereā€™s a lot going on in running a Stan programā€”loading data, computing transformed data, initialization, adaptation, then sampling. Inside of all of that is the log density and gradient calcs with autodiff and how all the special functions are implemented.

I am not interested in the bayesian-statistics-specifics or what the C++ looks like, rather in the grammar of the language.
Like how array declaration is done. These things are described in the manual but when you are stuck it can be difficult to know where to look. Sometimes it helps to just try a few things to see what is happening, seeing is believing.

Or what is exactly the meaningful difference between:
real variable_name[N];
vector[N] variable_name;
And when do you need to use a dimension indication and when can you skip it?

These small questions can make a difference in your understanding of the works but are not really worth it to ask online, collectively they can be time-consuming when you have to look it up. So that is why some method of observing what the differences are between these things may help.

The second you can do linear algebra operations with and the first you cannot. So, just use vectors rather than one-dimensional real arrays.

You always have to declare the size of the vector.

3 Likes

Excepting user-defined function declarations - the return type and function argument declarations are unsized and only indicate number of array dimensions. See Stan Reference Manual for details.

As always, @bgoodri is correct: all block and local variables must be declared with a size. The generated c++ code validates the variable dimensions in order to prevent segfaults.

You can either try it, as you said you wanted to do, or look the answer up in the manual. The first is described in the chapter on arrays vs. vectors and the second in the language spec section (hint: there are three types of type declarationā€”for block variables, local variables, and function argument variables).