Fitting Multilevel Wave Models

Hi all.

I’m trying to fit multilevel wave models for my research in brms. I’ve beens struggling to get the models to converge on my actual data, so I’ve moved to working with simple simulated examples to try and figure out what might be causing the problem. It seems like the frequency parameter in particular might be causing some issues. I’ve found that even in a simple simulated setting where the only parameter that varies is frequency, the model doesn’t seem to converge. I struggle to get convergence even in situations where I set fairly strong priors on the parameter values that I used to generate the data.

Here’s the sample code:

library("brms")
library("tidyr")
library("tibble")
library("dplyr")

start = Sys.time()

# Generate Wave Params
df_params = tibble(
  ID = seq(1,100),
  amp = 0.2,
  freq = rnorm(100, 1/7, 0.05),
  phase = 0, 
)

# Generate Data
n = 2000
sd = 0.1

set.seed(1)

df_sim_data = expand_grid(ID = seq(1,100),
                          time = seq(1,20)) %>% 
  left_join(df_params,
            by = c("ID")) %>% 
  mutate(epsilon = rnorm(2000,sd=sd),
         affect = amp*cos(2*pi*freq*time + phase) + epsilon)


# Fit Model
df_to_fit = df_sim_data %>% 
  select(ID, time, affect, amp, phase) %>% 
  mutate(pi = pi)


model_spec = brmsformula(affect ~ amp*cos(2*pi*freq*time + phase),
                         freq ~ 1 + (1|ID),
                         nl = TRUE)

model = brm(model_spec,
            df_to_fit,
            prior = c(prior(normal(0.142857, 0.01), nlpar="freq"),
                      prior(normal(0.05, 0.01), nlpar="freq", group="ID", class="sd"),
                      prior(normal(0.1, 0.01), class="sigma")),
            iter = 20000,
            cores = 4,
            file = "regression/freq_fit")

print(Sys.time() - start)

I’ve tried running this up to 80000 iterations, and I’ve also tried a few different priors. The ones included here are the strongest ones i’ve given, but I’ve also tried a uniform prior on the interval [0.05, 0.5] for the frequency intercept, thinking that’s the range of unique frequencies that are possible

Here’s the model summary output from the run above:
Screen Shot 2023-10-25 at 4.36.02 PM

And here’s the trace plot:

I’d love guidance what might be the problem and how to address it. Happy to provide more information too if it would be helpful!
Thanks for your help.
Ari

Hi! Fitting sinoidal functions to data can be quite a headache. Although I don’t fully grasp the model you are describing (the forum interface has interpreted some of the code in your post as formatting instructions, garbling the shown code), your trace plots show what’s going on. With the priors that you defined, the posterior is multi-modal. This probably has to do with the priors for the periods or frequencies of waves being too wide. Or if there are multiple stacked wave signals in your data (can’t quite tell from your example), you may get label switching of the sets of parameter that belong to each wave.

80,000 samples is usually overkill. I suggest that if a relatively simple model based on simulated data doesn’t converge within a few thousand iterations, something is wrong.

2 Likes

In my experience, periodic models have massive multimodality that is inherent to the likelihood (n.b. Not the prior @LucC). Lots of ideas here but nothing I found generally robust.

1 Like

I fully agree @mike-lawrence! I meant to say that without very restrictive priors on parameters that determine phase-translations and frequencies, multi-modality is massive.

1 Like

Given that the multimodality relates to the location of the true frequency, I think the priors need not merely be restrictive but truly highly informed by theory/prior data. Possibly this is what you meant though.

1 Like

Thank you both for your timely responses! @mike-lawrence I will check out those suggestions you link to.

I’ve also edited the post now so that the code should be properly formated. Sorry about that, I’m new to these forums.

I was wondering about the multi-modality issue. As is hopefully now clear in the code from the section near the top commented “Generate Wave Params”, the generative distribution for the frequencies of these waves is uni-modal. And the prior that I set is tight around that mode.

However I recognize that this is still potentially a problem because the cycles of the periodic function mean that the actual generative mode + 2*pi will give the same result.

But the trace plots don’t look like sampler is finding the cyclic modes. It kind of just looks like it found a couple places that hover around the true value(1/7) at least for the frequency intercept.

Is the multi-modality issue related to the cycles of wave function? Or are there other modes that you are referring to when you speak of multi-modality? If so what gives rise to those?

Also if it is helpful here is an image of a sample of the simulated data:

Thanks again for your help! Really appreciate your insights.

yes, the data is generated from a single frequency, but when you then compute the likelihood across a range of candidate true frequencies, that likelihood function is multimodal.

And it’s not simply peaks at harmonics, it’s a much more complicated multimodality affected by the sample-rate & duration as well as the true frequency. I’ve been intermittently wrestling with this for years now and still have yet to find any clear & robust solutions.

Note that just recently I received the advice from @avehtari to try SMC sampling for periodic models, and so far using the SMC sampler in PyMC, chains seem to indeed mix better.

I add the explanation why SMC. If the period is unknown, there are multiple modes in the posterior for the period length or frequency. Many of the modes are such that they are no close to true frequency, but still some of the periods fit the data. If there are many observations, these modes can be difficult to escape. SMC adds (usually) data sequentially, and thus the early posteriors are easier to explore and a better frequency is found without problems. Alternatively tempering approaches could be used like the adaptive path sampling which is quite easy to implement with Stan.

1 Like

Thanks so much for these suggestions @mike-lawrence and @avehtari! I will give them a try.

Oh, neat. I’d actually found that for very simple models, running SMC but giving it all the data at once still yielded better rhats than HMC, but then started getting bad rhats as I bumped up the model complexity. I’ll now see if feeding in data sequentially helps as you suggest.

I guess I don’t know what SMC PyMC is using, Usually S in SMC refers to sequential which could mean adding data sequentially, but it could mean some other sequentially changing approximation, too. Anyway, SMC starts with something wide, and sequentially approaches the true target.

Was curious what the issue is here. The example you provided appears to be sampled too slowly to identify the model well. e.g. this fit looks ok:

With quicker sampling (from the generative model) it works fine when specified in terms of starting points and derivatives, ie as an ODE / SDE / continuous-time state space model, and I guess it also should with the brms approach you have.

# library("brms")
library("tidyr")
library("tibble")
library("dplyr")

start = Sys.time()

# Generate Wave Params
df_params = tibble(
  ID = seq(1,100),
  amp = 0.2,
  freq = rnorm(100, 1/7, 0.05),
  phase = 0
)

# Generate Data
n = 20100
sd = 0.1

set.seed(1)

df_sim_data = expand_grid(ID = seq(1,100),
  time = seq(0,20,.1)) %>% 
  left_join(df_params,
    by = c("ID")) %>% 
  mutate(epsilon = rnorm(20100,sd=sd),
    affect = amp*cos(2*pi*freq*time + phase) + epsilon)



# ctsem model spec --------------------------------------------------------

library(ctsem)

#specify model
m <- ctModel(
  latentNames = c('affect','daffect'),#level and derivative processes
  manifestNames = 'affect', #observations
  type='stanct', #continuous time state space model formulated via stan
  id = 'ID',time = 'time',
  LAMBDA=matrix(c(1,0),nrow=1,ncol=2),#relation between processes and observations
  DRIFT=matrix(c(0,'freqpar|-log1p(exp(param*10))|TRUE',1,0),2,2), #process dynamics
  DIFFUSION=matrix(c(0,0,0,'diffusionSD'),2,2), #process noise
  MANIFESTVAR='residualSD', #observation noise
  MANIFESTMEANS='intercept') #intercept term / observation offset

ctModelLatex(m) #view model as pdf
  
f <- ctStanFit(datalong = df_sim_data,ctstanmodel = m,cores=8,
  # priors=T, optimize=F,inits='optimize',intoverpop=T,chains=6, #for HMC enable this line
  plot=T)

s=summary(f)

#population frequency estimate: frequency related to 2nd derivative via sqrt(-x)/(2*pi)
sqrt(-s$popmeans[grepl('freqpar',rownames(s$popmeans)), 
  c('mean','2.5%','50%','97.5%')])/(2*pi)

#subject specific frequencies:
subjpars <- data.frame(ctStanSubjectPars(f,cores=1)[1,,])
plot(df_params$freq,sqrt(-subjpars$freqpar)/(2*pi),ylab='Estimated',xlab='True',main='Subject Frequencies')
abline(0,1)


ctKalman(f,removeObs = T,plot=T,subjects = 1:4) #expectations before seeing data
ctKalman(f,plot=T,subjects = 1:4) #forward predictions
ctStanDiscretePars(f,plot=T)

Thanks for this Charles! This is interesting I will give a try.

One constraint we are dealing with here is that the sampling rate I have specified in the simulated example is the sampling rate from our actual data. For sure higher sampling rate would be ideal, but the data is collected at this point and if possible we’d like to fit these models with our current constraints. If there are methods that get us around these sampling difficulties that would be helpful for us.

Sure, you know the sampling rate but you don’t know the frequency etc right? They interact to determine how easily you’ll fit your oscillating model. The main point of what I posted is just to show that it doesn’t look well identified with the simulated data, so it’s not surprising you’ve a lot of trouble fitting.