For each posterior iteration i, draw a zero with probability \lambda_{t_i}; otherwise draw a sample from baseline(\theta_{t_i}).

Note that this is quite different from the expression in your post, which will never yield an exact zero (it simplifies to (1-\lambda)baseline(\theta)).

*If there are no covariates, \lambda and \theta will be parameters in the model, and the posteriors for \lambda_t and \theta_t will just be the posteriors for those parameters.
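As a minimal sketch of the per-iteration rule above (in Python/NumPy rather than Stan, with a normal baseline chosen purely as a placeholder for whatever baseline distribution you are using):

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_draws(lam, theta, rng):
    """One draw per posterior iteration i: a structural zero with
    probability lam[i], otherwise a draw from the baseline at theta[i].
    The normal baseline here is only an illustrative stand-in."""
    lam = np.asarray(lam, dtype=float)
    theta = np.asarray(theta, dtype=float)
    is_zero = rng.random(lam.shape) < lam   # zero with probability lam[i]
    baseline = rng.normal(loc=theta)        # baseline(theta[i])
    return np.where(is_zero, 0.0, baseline)

# 1000 posterior iterations with lambda_i = 0.3, theta_i = 5
y = predictive_draws(np.full(1000, 0.3), np.full(1000, 5.0), rng)
```

About 30% of the draws come out exactly zero, unlike the mixture-of-means expression, which never produces an exact zero.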

I think if you look at the Stan files in lmmelsm, you may be able to tweak them to allow random-slope variance modeling as well.

rlange:

Would anybody be willing to talk through some of these Qs in a short call?

Yup. DM me.

`stan::services` helped get me out of the rabbit hole I was in before, but into another.

> assuming you are able to build off of the existing model class used by the other Stan algorithms.

There appears to be pretty tight integration between HMC and the model parameters, so getting HMC to run in a different space will require some hacking. The most natural place to bootstrap our algorithm into the existing code, then, seems to be subclassing `stan::model::model_base`, tricking HMC into treating the variational parameters as if they were the model parameters, and providing a custom set of `log_prob` functions… unless this would break things like autodiff? There are a *lot* of functions in `model_base` that need overriding, and some of the comments are pretty cryptic to me (e.g., would my mock parameter space be ‘constrained’ or ‘unconstrained’?).

Would anybody be willing to talk through some of these Qs in a short call?

Code:

```
simulate_mrp_data <- function(n) {
  J <- c(2, 3, 7, 3, 50) # male or not, eth, age, income level, state
  poststrat <- as.data.frame(array(NA, c(prod(J), length(J) + 1))) # columns of post-strat matrix, plus one for size
  colnames(poststrat) <- c("male", "eth", "age", "income", "state", "N")
  count <- 0
  for (i1 in 1:J[1]) {
    for (i2 in 1:J[2]) {
      for (i3 in 1:J[3]) {
        for (i4 in 1:J[4]) {
          for (i5 in 1:J[5]) {
            count <- count + 1
            # Fill them in so we know what category we are referring to
            poststrat[count, 1:5] <- c(i1 - 1, i2, i3, i4, i5)
          }
        }
      }
    }
  }
  # Proportion of each category in the population
  p_male <- c(0.52, 0.48)
  p_eth <- c(0.5, 0.2, 0.3)
  p_age <- c(0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1)
  p_income <- c(0.5, 0.35, 0.15)
  p_state_tmp <- runif(50, 10, 20)
  p_state <- p_state_tmp / sum(p_state_tmp)
  poststrat$N <- 0
  for (j in 1:prod(J)) {
    # Set N to the population count in each cell
    poststrat$N[j] <- round(250e6 * p_male[poststrat[j, 1] + 1] * p_eth[poststrat[j, 2]] *
                              p_age[poststrat[j, 3]] * p_income[poststrat[j, 4]] * p_state[poststrat[j, 5]])
  }
  # Now let's adjust for the probability of response
  p_response_baseline <- 0.01
  p_response_male <- c(2, 0.8) / 2.8
  p_response_eth <- c(1, 1.2, 2.5) / 4.7
  p_response_age <- c(1, 0.4, 1, 1.5, 3, 5, 7) / 18.9
  p_response_inc <- c(1, 0.9, 0.8) / 2.7
  p_response_state <- rbeta(50, 1, 1)
  p_response_state <- p_response_state / sum(p_response_state)
  p_response <- rep(NA, prod(J))
  for (j in 1:prod(J)) {
    p_response[j] <-
      p_response_baseline * p_response_male[poststrat[j, 1] + 1] *
      p_response_eth[poststrat[j, 2]] * p_response_age[poststrat[j, 3]] *
      p_response_inc[poststrat[j, 4]] * p_response_state[poststrat[j, 5]]
  }
  people <- sample(prod(J), n, replace = TRUE, prob = poststrat$N * p_response)
  ## For respondent i, people[i] is that person's poststrat cell,
  ## some number between 1 and prod(J) = 6300
  n_cell <- rep(NA, prod(J))
  for (j in 1:prod(J)) {
    n_cell[j] <- sum(people == j)
  }
  coef_male <- c(0, -0.3)
  coef_eth <- c(0, 0.6, 0.9)
  coef_age <- c(0, -0.2, -0.3, 0.4, 0.5, 0.7, 0.8, 0.9)
  coef_income <- c(0, -0.2, 0.6)
  coef_state <- c(0, round(rnorm(49, 0, 1), 1))
  coef_age_male <- t(cbind(c(0, 0.1, 0.23, 0.3, 0.43, 0.5, 0.6),
                           c(0, -0.1, -0.23, -0.5, -0.43, -0.5, -0.6)))
  true_popn <- data.frame(poststrat[, 1:5], cat_pref = rep(NA, prod(J)))
  for (j in 1:prod(J)) {
    true_popn$cat_pref[j] <- plogis(
      coef_male[poststrat[j, 1] + 1] +
        coef_eth[poststrat[j, 2]] + coef_age[poststrat[j, 3]] +
        coef_income[poststrat[j, 4]] + coef_state[poststrat[j, 5]] +
        coef_age_male[poststrat[j, 1] + 1, poststrat[j, 3]]
    )
  }
  # male or not, eth, age, income level, state
  y <- rbinom(n, 1, true_popn$cat_pref[people])
  male <- poststrat[people, 1]
  eth <- poststrat[people, 2]
  age <- poststrat[people, 3]
  income <- poststrat[people, 4]
  state <- poststrat[people, 5]
  sample <- data.frame(cat_pref = y,
                       male, age, eth, income, state,
                       id = seq_along(people))
  # Make all columns numeric:
  for (i in 1:ncol(poststrat)) {
    poststrat[, i] <- as.numeric(poststrat[, i])
  }
  for (i in 1:ncol(true_popn)) {
    true_popn[, i] <- as.numeric(true_popn[, i])
  }
  for (i in 1:ncol(sample)) {
    sample[, i] <- as.numeric(sample[, i])
  }
  list(
    sample = sample,
    poststrat = poststrat,
    true_popn = true_popn
  )
}

set.seed(1)
library(rstanarm)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default())
options(mc.cores = 4)
library(dplyr)
library(tidyr)

mrp_sim <- simulate_mrp_data(n = 1200)
str(mrp_sim)
sample <- mrp_sim[["sample"]]
rbind(head(sample), tail(sample))
poststrat <- mrp_sim[["poststrat"]]
rbind(head(poststrat), tail(poststrat))
true_popn <- mrp_sim[["true_popn"]]
rbind(head(true_popn), tail(true_popn))

fit <- stan_glmer(
  cat_pref ~ factor(male) + factor(male) * factor(age) +
    (1 | state) + (1 | age) + (1 | eth) + (1 | income),
  family = binomial(link = "logit"),
  data = sample
)
fit
```

The version of rstanarm you are running: 2.21.1

The version of R you are running (from `getRversion()`): 4.1.2 (2021-11-01, “Bird Hippie”), platform x86_64-w64-mingw32

Your operating system: Windows 10 Pro

You could add a new algorithm in `stan::services` (on a branch), link it into an interface, and you’ll have something working.

For testing, search these forums for the work on PosteriorDB. It’ll point you to an effort to build a library of models and reference inferences to compare results against.

I believe “all” you need to do is implement the algorithm in `stan::services`, assuming you are able to build off of the existing model class used by the other Stan algorithms. To actually call it anywhere, you’ll need a version of CmdStan which calls that service. This recent blog series by @jtimonen may be helpful for understanding more of that: Understanding the Stan codebase - Part 1: Finding an entry point | Juho Timonen

If you’re just interested in implementing your algorithm for more testing, you can fork those repositories and get started. If you’d like to add the algorithm to Stan in a more “official” or permanent way, it is good to start by submitting a design doc (think of these like RFCs).

My collaborators and I have developed and written up some theory for a new family of approximate-inference algorithms. All of our testing so far has been in Python on toy problems, and now we’re looking for the best way to benchmark it on more realistic problems and (hopefully) share it with the community. Adding our algorithm to Stan seems like a great way to do both, but after a few days of looking at the Stan source, I have to admit that I’m pretty lost. Part of the problem is that none of us has recent or extensive C++ experience, and an initial test suggests that PyStan’s latency makes a Python wrapper impracticable.

I see from this thread that some big refactoring of the core inference code might be on its way. I’m looking for some guidance from those of you more in the know: how hard would it be to add our algorithm (by someone with more Stan experience)? Should we wait until after the refactor, and what is its timeline?

Some important details: our proposed algorithm is essentially a combination of MCMC and ADVI (and thus doesn’t neatly fit inside any of the existing class hierarchies, as far as I can tell). We apply mostly out-of-the-box HMC or NUTS, but we apply them *to the variational parameters* and treat the resulting samples as a nonparametric mixture of variational components. We have a preprint here describing the idea, if you’re curious. For now, we just want to start a conversation, picking your collective brains about how best to proceed.

Looking forward to any insights you all are able to share! I would also be happy to chat privately with anyone who is interested and able to help out.

We seek exceptional postdoctoral candidates to be hosted at the Faculty of Dental Medicine and Oral Health Science, McGill University, Montreal, QC. These positions have an initial term of one year with the possibility of extension. The starting date is flexible, but no later than Fall 2022.

**Project description:**

Uncertainty in machine-learning-based prediction algorithms is a growing concern, especially in health science applications. This project will explore the possibility of using Bayesian deep learning models to quantify uncertainty levels and compare them with human uncertainty levels in predicting oral-health-related outcomes. Uncertainty quantification in deep learning is a rapidly growing sub-field, and this position offers the opportunity to work with our diverse team of researchers, including international collaborators, applying the latest developments in the field. This project offers great potential for you to publish both methodological papers within the statistics/machine learning community and applied papers in the health sciences.

Dentistry further offers a wide array of opportunities for applied machine learning projects, with decades of robust digital data available.

**Minimum Qualifications**

- PhD in statistics, mathematics, data science, or a related field, with a focus on deep learning, completed within the past five years, or all PhD requirements completed by commencement of the appointment
- Knowledge and experience in Bayesian methods and computing, ideally in health science applications (variational inference, MCMC methods, etc.)
- Familiarity with image processing and classification using deep learning or active learning techniques
- Demonstrated ability to work collaboratively in an applied health research team

**Required Skills:**

- Statistical programming in R or Python
- Probabilistic programming in Stan or TensorFlow probability or Pytorch (Pyro)
- A track record of relevant publications at top machine learning or computer vision conferences (NIPS, ICML, UAI, JMLR, CVPR, ICCV, PAMI, IJCV, IEEE IT) and/or top-ranked image processing or health research journals is essential.

**Desired skills:**

- Uncertainty quantification in deep learning for computer vision in health research
- Familiarity working with HPC environments, ideally Compute Canada resources
- A firm grasp of approximate Bayesian machine learning and/or advanced (medical) image processing is a plus.

**Name of immediate supervisor:** Sreenath Madathil, Assistant Professor

**Work schedule:** Full time, Monday to Friday

**Working hours:** 9:00 am to 5:00 pm (35 hrs/week)

**Duration:** 1-year initial appointment with opportunity to renew.

**Location:** 2001 McGill College Ave, Montreal, QC, H3A 1G1

**Salary:** $34,611 to $45,000

**Planned start date:** As soon as possible

**Application package must include the following:**

- Curriculum vitae (including publications)
- Cover letter stating the motivation, interests, and qualifications for the position.
- Names and contact information of 3 references.

**Use the following links for application:**

For McGill internal candidates: https://wd3.myworkday.com/mcgill/d/inst/15$392530/9925$32925.htmld

For external candidates: Workday

```
fit$summary(NULL, ~quantile(.x, probs = c(0.4, 0.6)))
```

`fit$summary()`

is just a wrapper to

```
posterior::summarize_draws(fit$draws(), posterior::default_summary_measures())
```

See the docs for `summarize_draws`: https://mc-stan.org/posterior/reference/draws_summary.html

Here is some example data I generated. The x_{min} parameter for each group is fixed at 1, and the \alpha for each group is drawn from a distribution centered around \alpha_{mean} = 2.

```
library(Pareto)
library(ggplot2)
library(dplyr)
set.seed(222)
alpha_mean <- 2
alpha_group <- rnorm(3, mean = alpha_mean)
dat <- data.frame(group = 1:3, x_min = 1, alpha = alpha_group) %>%
  group_by(group, x_min, alpha) %>%
  summarize(x = rPareto(10000, t = x_min, alpha = alpha))
# What does it look like?
ggplot(dat, aes(x = x, group = factor(group), fill = factor(group))) +
  geom_histogram(position = 'dodge') +
  scale_x_log10() + scale_y_log10(expand = c(0, 0))
```

Histograms overlaid by group:

Below is the Stan code I have so far, which does not account for group-level variation in \alpha. Any help would be greatly appreciated.

```
data {
  int<lower=0> N;
  vector<lower=0>[N] x;
  real<lower=0> x_min;
}
parameters {
  // Pareto density
  real<lower=0, upper=5> alpha;
}
model {
  // Prior: Pareto density
  alpha ~ lognormal(1, 1) T[0, 5];
  // Likelihood: Pareto density
  x ~ pareto(x_min, alpha);
}
generated quantities {
  vector[N] log_lik; // Log-likelihood for getting info criteria later
  for (i in 1:N) {
    log_lik[i] = pareto_lpdf(x[i] | x_min, alpha);
  }
}
```
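One common way to add the group-level variation in \alpha (a sketch of a possible extension, not the only parameterization) is a hierarchical prior on the group-specific shapes, e.g. on the log scale:

```latex
x_i \sim \mathrm{Pareto}(x_{\min}, \alpha_{g[i]}), \qquad
\log \alpha_g \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha), \qquad
\mu_\alpha \sim \mathcal{N}(\log 2, 1), \qquad
\sigma_\alpha \sim \mathcal{N}^{+}(0, 1),
```

where g[i] is the group of observation i. In Stan terms, this amounts to replacing the scalar `alpha` with a group-indexed vector, passing a group ID for each observation in `data`, and putting the hierarchical prior in the `model` block; the hyperpriors above are just illustrative choices.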

Thank you in advance.

Franzi:

Which program would you recommend to work with instead of Visual Studio?

Visual Studio *Code* works well with WSL2; just be sure to install the “Remote - WSL” extension.

So I thought that the best match for this problem is fitting a survival model with time-varying covariates and a random/group-level-effect term that accounts for repeated subjects in the data. I did not really find any matching frequentist implementations that I was able to understand, so I reverted to Bayesian models for now.

I first tried out `rstanarm::survival` to fit a survival model like this:

```
fit.rstanarm <- stan_surv(formula = Surv(tStart, tStop, event) ~ a + b + (1|subject), ...)
```

Then I came across an implementation of a discrete-time survival model via Bayesian logistic regression by @Solomon using `brms`: 12 Extending the Discrete-Time Hazard Model | Applied longitudinal data analysis in brms and the tidyverse

I tried that model out like this:

```
fit.brms <- brm(formula = event | trials(1) ~ 0 + Intercept + a + b + (1|subject), ...)
```

What I care most about in the end is being able to predict the survival probability, or actually the probability of a subject to have the event, i.e., 1 - survival probability, given the covariate values at that time.

The two implementations actually yield similar distributions for the parameters `a` and `b`. They also predict similar survival probabilities; see below a plot of an example experiment where I plotted `a` on the x-axis:

The survival probabilities look very similar, however, the hazard rates seem to be on different scales.
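One way to reconcile the scales (these are standard survival-analysis identities, not specific to either package): the discrete-time logistic model estimates a per-interval hazard *probability* h_j, while the continuous-time model estimates a hazard *rate* \lambda(t). They are linked by

```latex
h_j = \Pr\left(T \le t_j \mid T > t_{j-1}\right)
    = 1 - \exp\left(-\int_{t_{j-1}}^{t_j} \lambda(u)\,\mathrm{d}u\right)
    \approx \lambda(t_j)\,\Delta t \quad \text{for small } \Delta t,
\qquad
S(t_j) = \prod_{k \le j} \left(1 - h_k\right).
```

So the two hazards live on different scales (a probability per interval versus a rate per unit time) even when the implied survival curves nearly coincide, which matches what you observed.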

My question(s):

Are these types of models related, and if so, how? Do they differ only in their assumptions about the baseline hazard? Is either of them preferable for my problem setting? Does it make sense to fit both of them and then pick the better one, for instance in terms of time-dependent AUC and Brier score?

- Operating System: Ubuntu 20.04
- rstanarm Version: 2.21.2
- brms Version: 2.16.3

maedoc:

I use WSL2

Thanks for your suggestion. I’m still struggling with Windows :sweat_smile: Now I’ll try the WSL2 route.

Which program would you recommend to work with instead of Visual Studio?

Yes, I read that post with interest, because it seemed to have the answer. But from what I could tell, it talks about modelling the variance of the random intercept only. I don’t think (but would love to be wrong) that it extends this to also model the random-slope variance.

I’ll have another read now.

`cv_varsel(model, ndraws=10, ndraws_pred=10)`

or other equally small numbers?

The problem that I am dealing with is iterative (incremental) updating, or online learning, of a multinomial HMM (i.e., states and outcomes are all discrete) with each new observation. Specifically, I want to **keep track of the changes in the model’s estimates of the state transition probabilities and emission probabilities for every new observation, when the true parameters are fixed**.

There are several online estimation algorithms using the EM approach, such as Online Learning with Hidden Markov Models, but I believe none exists in the Bayesian paradigm.

One possible solution that I am thinking of is **fitting HMM iteratively for every new observation**.

For example, if I have 100 observations from a multinomial HMM sequence with fixed parameters, I may fit the model 100 times: first using y_1, then [y_1, y_2], then [y_1, y_2, y_3], …, and finally [y_1, ..., y_{100}]. It would then be possible to obtain the estimated state transition probabilities at time t, A^t, and the estimated emission probabilities at time t, B^t.
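The growing-prefix scheme can be sketched as follows (Python for illustration; the `fit_prefix` function below is a hypothetical stand-in that just returns a Dirichlet–multinomial posterior mean of the marginal outcome probabilities, where a real run would refit the full HMM in Stan on y_{1:t}):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 4, size=100)  # toy sequence: 100 draws from 4 outcomes

def fit_prefix(y_prefix, n_outcomes=4):
    """Hypothetical stand-in for refitting the HMM on y[1:t].
    Here: posterior mean of the marginal outcome probabilities under a
    flat Dirichlet prior, just to show the shape of the bookkeeping."""
    counts = np.bincount(y_prefix, minlength=n_outcomes)
    return (counts + 1) / (counts.sum() + n_outcomes)

# Refit on every growing prefix y[1:t], t = 1..T, storing the trajectory
estimates = np.array([fit_prefix(y[:t]) for t in range(1, len(y) + 1)])
```

With a real Stan fit in place of `fit_prefix`, row t-1 of `estimates` would hold A^t and B^t. Note the total cost grows roughly quadratically in T, since each refit processes the whole prefix.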

What are your thoughts about this?

**Do you think it is doable, or should other approaches be adopted?**

P.S. I also have a question related to the non-identifiability of HMMs in this setting; it is covered in my other question: Manual ordering of two simplexes for emission probabilities in multinomial HMM.

Thank you in advance,

Minho

I am trying to estimate a multinomial hidden Markov model (i.e., states and observations are all discrete) using this code by Luis Damiano.

The HMM problem that I am dealing with has 2 hidden states and 4 possible outcomes (observations).

When fitting the model using Stan, the posterior is multimodal as expected, likely because of the possible rotation (label switching) of the states.

One solution that I am thinking about is **setting a constraint that the emission probability of observation 1 for State 1 is always higher than State 2.**

In mathematical form: if we define B_{ij} as the probability of observing j, j = 1,...,4, in state i, i = 1,2, set the constraint B_{11} > B_{21}.

However, I cannot use simple solutions like `ordered`, since B is implemented as rows of simplexes like below:

```
parameters {
  // ...
  // Discrete observation model
  simplex[L] phi_k[K]; // event probabilities
}
```

Instead, **my alternative plan is to manually exchange the two simplexes phi_k[1] and phi_k[2] when phi_k[1,1] < phi_k[2,1]**.

One problem with this approach is that I cannot directly access `phi_k`. As an alternative, I created `ordered_phi_k` in `transformed parameters` and used it in place of `phi_k`.

So, the original code was

```
transformed parameters {
  vector[K] unalpha_tk[T];
  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];
    for (j in 1:K) {
      unalpha_tk[1][j] = log(p_1k[j]) + log(phi_k[j, x[1]]);
    }
    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

and I have modified it into

```
transformed parameters {
  vector[K] unalpha_tk[T];
  vector[L] ordered_phi_k[K];
  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];
    // use ordered phi
    if (phi_k[2, 1] > phi_k[1, 1]) {
      ordered_phi_k[1] = phi_k[2];
      ordered_phi_k[2] = phi_k[1];
    } else {
      ordered_phi_k[1] = phi_k[1];
      ordered_phi_k[2] = phi_k[2];
    }
    for (j in 1:K) {
      unalpha_tk[1][j] = log(p_1k[j]) + log(ordered_phi_k[j, x[1]]);
    }
    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(ordered_phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

What was the result?

I simulated an HMM sequence with 1000 observations, where A = [0.75, 0.25; 0.35, 0.65] and B = [0.5, 0.3, 0.1, 0.1; 0.1, 0.15, 0.4, 0.35].

The model reasonably recovers the true parameters:

Altogether, my question is: **is this approach justifiable for this specific constraint?**

Thank you in advance,

Minho

a quadratic model: i.e.

y ~ x + I(x^2)

and a cubic model: i.e.

y ~ x + I(x^2) + I(x^3)

So far I’ve been scaling x to have a mean of 0 and an SD of 0.5 and using a Student-t prior, `student_t(7, 0, 1)`. The aim is to have some form of regularisation and to calculate BFs. However, the problem is that the beta values for the quadratic/cubic terms are naturally more extreme than the linear one. Should I therefore scale the quadratic term separately? Or should I just use different priors? If the latter, what would be a `student_t(7, 0, 1)` equivalent for quadratic and cubic betas? Does it even make sense to have variants of x in the model that have different scales?

If any one has an accessible text how to construct priors with this in mind, please let me know because currently I’ve hit a wall.
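If you do scale each term separately, here is a sketch of the idea (Python/NumPy for illustration; in R, `scale()` applied to each column, or orthogonal polynomials via `poly(x, 3)`, would accomplish much the same thing): standardize x, x², and x³ column by column, so that one common `student_t(7, 0, 1)` prior means the same thing for every coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)

def scale_col(v, target_sd=0.5):
    """Center a column and rescale it to the target SD (0.5, as in the post)."""
    return (v - v.mean()) / v.std() * target_sd

# Standardize each polynomial term separately, not just x itself,
# so linear, quadratic, and cubic predictors are all on the same scale.
X = np.column_stack([scale_col(x**p) for p in (1, 2, 3)])
```

The trade-off is that the coefficients are then on the scale of the standardized terms rather than the raw powers of x, so you have to back-transform for interpretation; orthogonal polynomials additionally remove the correlation between the columns.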

projpred started without an error message. However, no projpred result could be obtained after various attempts.

My GLMM has about 15 covariates with 30 predictors. The sample size is 72,610, but the number of events is only 226. I used a t prior. All the model estimation diagnostics are OK.

Running `cv_varsel(model)` crashes R on the Linux server (134 GB RAM). I assume that this is a memory problem.

I was able to obtain a reference model using `get_refmodel`; the object size is 13.8 GB. Running `cv_varsel` on the reference model also crashed the R session.

I must assume that my data set with a sparse outcome is not suitable for projective prediction. Perhaps this is an example of “don’t fit models unless there are 10 or more events per predictor”.

wds15:

And hence it does not matter what value I write into the variable. Once the variable is defined, we are going to turn on the feature. The same is true for `STAN_THREADS`, `STAN_MPI`, and maybe others.

(emphasis mine)

I hit this with `STAN_THREADS`. Given the way makefile variables work and the way our makefiles are written, this is probably true everywhere.

CmdStanPy and CmdStanR users can set the makefile variables from the interfaces, so you should expect that they would use any of these. Cf. CmdStanPy Workflow — CmdStanPy 1.0.0 documentation

> to parallelize running the NUTS-HMC sampler across chains, the Stan model must be compiled with C++ compiler flag STAN_THREADS. While any value can be used, we recommend the value `True`

I would also explicitly call out `FALSE`, `False`, and `F`, since R and Python users without a C++ background occasionally find their way into the makefiles. Especially since these same users might have been using `TRUE`, `True`, or `T` to turn things on, and seen behavior that “confirmed their expectations”.

But I totally understand if you prefer to assume that anybody tinkering in makefiles, except with copy-paste, should have some minimal clue of what they’re doing.
