Need advice on platform selection

Hello everyone,

For my master's thesis, I have to re-evaluate a published study. It is a prediction problem: I have a fairly complex distributional model that needs MCMC to fit and posterior predictive checks to evaluate.

I'm an electrical and electronics engineering graduate doing a master's in telecommunications, so I don't have much expertise with statistical software. After some research, I believe Stan meets my problem's requirements. But how convenient would it be to start the project in MATLAB? Or should I start with Python, R, etc.? I look forward to your advice! Thanks in advance!

Well, what languages are you familiar with? If you know Matlab really well, then go with Matlab. If you know R, then go with R. If you know Python, go with Python.

Now, if you don't know any of those languages well, then I would go with Python or R. Both are free, unlike Matlab. In my opinion, Python is somewhat easier to work with than R. (For example, it doesn't have as many annoying quirks, like the subtle differences between the [ and [[ operators, or the need to put "else" on the same line as the closing brace of its corresponding "if".) However, R seems to be somewhat more entrenched among Stan users, so you'll see a lot of Stan-related books (like Statistical Rethinking, Doing Bayesian Data Analysis, and Bayesian Data Analysis) that use R for code examples, third-party R packages (like brms) that use Stan, and so on.

Also, the Anaconda Python distribution should make getting started with either RStan or PyStan fairly easy. There are packages in Anaconda for both.


I'd rate my Matlab at 9.5/10. However, I've never done a data-related project before; I've mostly worked on signal-processing projects in Matlab.

I also took a Data Mining course last year and learned a little Python with Anaconda. I wouldn't give myself more than 4/10. I know scikit-learn, NumPy, and pandas, but definitely not at the level I know Matlab. I can also program in Java, assembly, and VHDL, so I'd say I'm somewhat above average at programming.

Also, the project is about football betting predictions. The model consists of a bunch of normal, truncated normal, and Poisson distributions, plus normal distributions whose variance has a half-Cauchy prior.
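To show how pieces like these typically fit together in Stan, here is a minimal sketch, assuming a common Poisson "attack/defense" structure for goal counts; the published model may well differ. All variable names and prior values are hypothetical. It is held in a Python string (the form PyStan's `stan.build` accepts), but the Stan code itself could just as well live in a .stan file. Note that the half-Cauchy prior here is placed on the standard deviation, a common Stan convention, whereas the post mentions the variance.

```python
# Hypothetical Stan program for a Poisson goals model. Names and prior values
# are illustrative assumptions, not the published model.
FOOTBALL_MODEL = """
data {
  int<lower=1> n_matches;
  int<lower=2> n_teams;
  array[n_matches] int<lower=1, upper=n_teams> home;   // home team index
  array[n_matches] int<lower=1, upper=n_teams> away;   // away team index
  array[n_matches] int<lower=0> home_goals;
  array[n_matches] int<lower=0> away_goals;
}
parameters {
  real<lower=0> home_adv;     // lower bound 0 makes its normal prior a truncated normal
  real<lower=0> sigma_att;    // lower bound 0 makes the Cauchy prior half-Cauchy
  real<lower=0> sigma_def;
  vector[n_teams] attack;
  vector[n_teams] defense;
}
model {
  home_adv ~ normal(0.3, 0.2);
  sigma_att ~ cauchy(0, 2.5);
  sigma_def ~ cauchy(0, 2.5);
  attack ~ normal(0, sigma_att);      // normal with half-Cauchy scale
  defense ~ normal(0, sigma_def);
  // Log-rate parameterization: goals ~ Poisson(exp(linear predictor)).
  home_goals ~ poisson_log(home_adv + attack[home] - defense[away]);
  away_goals ~ poisson_log(attack[away] - defense[home]);
}
"""
```

Here `attack` and `defense` are only softly identified by their zero-mean priors; a sum-to-zero constraint is a common alternative. Match-result probabilities (Home, Draw, Away) could then be derived from simulated scores in a `generated quantities` block.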

The data is nothing complicated: home team score, away team score, match result (Home, Draw, Away), and several bookmakers' odds. That's roughly the problem. What advice would you give for this kind of problem?

I know it's a lot of questions; I hope you have time to answer. Thanks!

Then I would recommend starting off with MatlabStan and, if that doesn't work out, falling back to PyStan. To make it easier to switch between Stan interfaces, put your Stan code in a separate file that you can read into both MatlabStan and PyStan.
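A minimal sketch of the separate-file approach on the Python side, assuming PyStan 3 (which is installed as `pystan` but imported as `stan`). The helper names and the file name `football.stan` are hypothetical; on the Matlab side, the same file path would be handed to MatlabStan instead.

```python
from __future__ import annotations

from pathlib import Path

try:
    import stan  # PyStan 3 installs a module named `stan`
except ImportError:  # lets the file-handling part run without PyStan
    stan = None


def load_stan_program(path: str | Path) -> str:
    """Read a Stan program kept in its own .stan file (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


def sample_with_pystan(path: str | Path, data: dict, num_chains: int = 4,
                       num_samples: int = 1000, seed: int = 1):
    """Compile the shared .stan file with PyStan 3 and draw posterior samples."""
    if stan is None:
        raise RuntimeError("PyStan 3 is not installed (pip install pystan)")
    posterior = stan.build(load_stan_program(path), data=data, random_seed=seed)
    return posterior.sample(num_chains=num_chains, num_samples=num_samples)


# Usage (hypothetical file name and data dictionary):
# fit = sample_with_pystan("football.stan", data)
# draws = fit.to_frame()  # pandas DataFrame with one row per draw
```

Because the model lives only in the .stan file, switching interfaces means changing the wrapper code, not the model.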

I think you’re going to be on your own for that one. It is your master’s thesis after all.

Then I will start with Matlab for sure. Thank you for your kind interest!

I just wanted to make sure that Stan is suitable for this problem. Anyhow, thanks again for your interest!

Matlab can be fine if you are fluent at making plots with it. R and Python have the benefit of more Stan- and MCMC-related diagnostic and plotting packages. I used to do everything in Matlab, but moved to R a few years ago.

Thank you for your answer. I'm curious: what do you mean by R and Python benefiting more from Stan? What advantages do they have compared to Matlab?

Matlab has only the MatlabStan interface, which is not actively developed. R has rstan, rstanarm, brms, shinystan, bayesplot, loo, projpred, and tidybayes (and Python has arviz). Depending on your project, learning R (or Python) may take longer than implementing what you need in Matlab yourself, but if you expect to do many projects, then after the first or second one it's probably much faster to use the great packages made by others.
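As a rough illustration of what "implementing what you need yourself" involves, here is a sketch of the classic split-R-hat convergence diagnostic (as described in Bayesian Data Analysis, 3rd edition), written in plain Python but equally easy to port to Matlab. It is a simplified stand-in: packages such as arviz (`arviz.rhat`) implement the newer rank-normalized version, which is preferable in practice. The function name is hypothetical.

```python
from __future__ import annotations

import math
import random
from statistics import mean, variance


def split_rhat(chains: list[list[float]]) -> float:
    """Classic split-R-hat for one scalar parameter.

    `chains` holds one list of post-warmup draws per chain, all the same
    length. Each chain is split in half, so a trend inside a chain also
    inflates R-hat. Values near 1 suggest the chains agree.
    """
    n_draws = len(chains[0])
    if any(len(c) != n_draws for c in chains):
        raise ValueError("all chains must have the same length")
    n = n_draws // 2
    if n < 2:
        raise ValueError("need at least 4 draws per chain")
    # Split every chain into two halves (the middle draw is dropped if odd).
    halves = []
    for c in chains:
        halves.append(c[:n])
        halves.append(c[n_draws - n:])
    w = mean(variance(h) for h in halves)          # mean within-half variance
    b = n * variance([mean(h) for h in halves])    # between-half variance
    var_plus = (n - 1) / n * w + b / n             # pooled variance estimate
    return math.sqrt(var_plus / w)


rng = random.Random(42)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 0, 5)]
print(split_rhat(mixed))  # chains agree, so close to 1
print(split_rhat(stuck))  # one chain sits elsewhere, so well above 1
```

The packages listed above bundle this and many other diagnostics (effective sample size, LOO, posterior predictive plots) with tested implementations, which is where the time savings come from.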

Oh, I see. Thank you for your help! I'd better start coding in Matlab for now. :)