I have data that looks like this:
d %>% select(mode, Density, bus_stops_per_1000, has_tram)
# A tibble: 101 x 4
mode[,"walking"] [,"cycling"] [,"pt"] [,"car"] [,"other"] Density bus_stops_per_1000 has_tram
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 0.24 0.00195 0.17 0.588 0.000312 5694. 1.52 FALSE
2 0.19 0.08 0.12 0.55 0.06 2539. 4.89 FALSE
3 0.03 0.24 0.08 0.649 0.00108 2823. 2.02 FALSE
4 0.07 0.03 0.23 0.670 0.000385 673. 15.8 TRUE
5 0.09 0.02 0.2 0.687 0.00277 2943. 3.36 TRUE
6 0.12 0.00285 0.11 0.764 0.00353 7433. 1.84 FALSE
7 0.18 0.000414 0.13 0.686 0.00310 2589. 3.30 TRUE
8 0.03 0.34 0.24 0.387 0.00346 2844. 2.71 TRUE
9 0.08 0.04 0.21 0.669 0.00101 2189. 5.18 FALSE
10 0.13 0.01 0.14 0.719 0.000635 2677. 1.33 FALSE
# … with 91 more rows
The response is compositional with 5 columns which sum to 1 per row:
summary(rowSums(d$mode))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 1 1 1 1 1
I would like to estimate how the compositions change in response to some predictors. I have demonstrated a proof of concept using brms
shown here (graphical output shown below).
In recent versions of brms
the following model does not finish, even after 2+ hours, despite the small size of the input dataset that represents 101 cities around the world, with the following command:
m = brm(mode ~ Density + bus_stops_per_1000 + has_tram, data = d, family = dirichlet(), inits = 0, iter = 10e4)
I’m quite new to this brave new (to me at least) world of Bayesian inference so it’s likely that I’m making some conceptual errors. However, based on reading the brms
documentation and my evolving understanding of modelling, I think the above command should at least return a value.
Any ideas what I’m doing wrong and how I can make this work very much appreciated. (A related question is that if we predict mode shares for a particular mode using this method, is it possible to calculate the confidence intervals of change for a particular mode, e.g. if a city gained a tram system the central estimate of mode share by walking increases by 0.008 but what is the distribution? See here for an explanation)
You should be able to reproduce the issue and generate results for the 101 cities using the reproducible example below:
library(brms)
library(dplyr)
library(sf)
d = readRDS(url("https://github.com/ATFutures/who3/blob/master/global-data/cities-101-osm-bus.Rds?raw=true"))
names(d)
# no zeros allowed: how to allow zero values?
d %>% sf::st_drop_geometry() %>% select(walking:cycling) %>% summary()
d = d %>%
st_drop_geometry() %>%
# filter(cycling != 0) %>%
# filter(other != 0) %>%
mutate_at(vars(other, cycling), ~case_when(. == 0 ~ runif(n = 1, min = 0, max = 0.5), TRUE ~ .)) %>%
ungroup() %>%
select(-bb_poly) %>%
mutate(Density = Population / Area)
totals = d %>% select(walking:other, -City) %>% rowSums() - 100
d$car = d$car - totals
modes_matrix = d %>% select(walking:other) %>% as.matrix()
summary(rowSums(modes_matrix))
d = d %>% select(-(walking:other))
modes_matrix[modes_matrix == 0] = 0.01
d$mode = modes_matrix / 100
names(d)
head(d)
m = brm(mode ~ Density + bus_stops_per_1000 + has_tram, data = d, family = dirichlet(), inits = 0, iter = 10e4)
- Operating System: Ubuntu 18.04
- brms Version:
packageVersion("brms")
#> [1] '2.13.3'
Created on 2020-07-01 by the reprex package (v0.3.0)