I’m trying to model adoption of Electric Vehicles and I’m looking for suggestions on a model structure that describes the data generating process accurately. For a specific region, I have New and Used Car Sales overall and sales of Electric Vehicles as a subset of each of those sets of sales. One of the dynamics that is present here is that car purchases don’t happen all that frequently, but that the probability of the next purchase increases the farther in time you get away from the previous purchase. Also, when it comes time to purchase, there is some probability that you choose an EV and that probability increases as a function of time (technology gets better) and adoption (peer pressure network effect). Also, the number of vehicles in a market has increased over time as the overall population grows (maybe changes in future due to ride sharing/car sharing, but don’t need to worry about that for now).
Initially, I have been using compartmental models (SI, SIRS) to describe this process, but I’m not sure that those models capture the purchase lifecycle as described above. Perhaps it does and I don’t understand it correctly.
I think of it as modeling the fleet in its entirety: in each period some portion of the fleet makes a binomial choice, and the probabilities that describe that binomial distribution are functions of time and of the composition of the fleet in the previous period.
Looking for ideas and suggestions for how to think about this problem.
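To make the process described above concrete, here is a minimal simulation sketch. It is not a fitted model. Every parameter name and value (`base_hazard`, `hazard_per_year`, `b0`, `b_time`, `b_share`) is a made-up assumption, the replacement probability is assumed to rise linearly with vehicle age, the EV choice probability is assumed to be logistic in time and in the previous period's EV share, and fleet size is held fixed for simplicity.

```python
import math
import random


def simulate_fleet(n_vehicles=2000, n_periods=40, seed=0,
                   base_hazard=0.02, hazard_per_year=0.03,
                   b0=-4.0, b_time=0.05, b_share=3.0):
    """Simulate repurchases in a fixed-size fleet (all parameters hypothetical)."""
    rng = random.Random(seed)
    ages = [rng.randint(0, 10) for _ in range(n_vehicles)]  # years since purchase
    is_ev = [False] * n_vehicles
    history = []
    for t in range(n_periods):
        # EV share at the start of the period drives the network effect.
        ev_share = sum(is_ev) / n_vehicles
        # Probability that a purchase is an EV: logistic in time and EV share.
        p_ev = 1.0 / (1.0 + math.exp(-(b0 + b_time * t + b_share * ev_share)))
        sales = 0
        ev_sales = 0
        for i in range(n_vehicles):
            # Purchase probability increases with time since the last purchase.
            p_buy = min(1.0, base_hazard + hazard_per_year * ages[i])
            if rng.random() < p_buy:
                sales += 1
                is_ev[i] = rng.random() < p_ev
                ev_sales += int(is_ev[i])
                ages[i] = 0
            else:
                ages[i] += 1
        history.append((t, sales, ev_sales, ev_share))
    return history


history = simulate_fleet()
```

Each tuple in `history` holds the period, total sales, EV sales, and the EV share of the fleet at the start of that period. Simulating from a structure like this and comparing the output to the observed sales series is one way to check whether a candidate model can reproduce the purchase lifecycle at all.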
Is your inferential goal primarily about understanding the proportion of vehicles sold that are electric (equivalently, the probability that a given vehicle purchase is electric), or about understanding the total volume of electric vehicle sales?
Ultimately total vehicles sold.
I think the first question I’d ask is whether it might make sense to model the EV sales directly, without worrying about a model for the total vehicle sales. Right now, you need a model of total vehicle sales (all types) and a model of the proportion of sales that are EVs. Potentially you could replace these two with a single model of total EV sales.
This approach would potentially be a bad idea particularly if
- EV sales are rare, and so data is relatively sparse, AND
- the proportion of sales that are EVs varies in a simpler way than total sales volume does.
In this case, the EV data might be too sparse to estimate a realistic model for the total EV volume directly, but might not be too sparse to estimate a realistic model for the proportion. And then the total sales might be sufficient to estimate a realistic model for the total volume.
However, if the variation in proportion is as complex as the variation in the total, and/or if you have enough EV data to directly estimate a satisfying model for the total volume, then it’s unclear what inferential benefit you are getting by leveraging the total sales of non-EVs as a component of your model.
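One way to write down the two options, as a sketch only: the symbols and the likelihood choices below (Poisson and binomial) are assumptions for illustration, and an overdispersed count distribution could replace the Poisson in either case. Here $N_t$ is total vehicle sales in period $t$, $y_t$ is EV sales, and $\pi_t$ is the proportion of sales that are EVs.

$$
\begin{aligned}
\text{Two-part model:}\quad & N_t \sim \text{Poisson}(\lambda_t), \qquad y_t \mid N_t \sim \text{Binomial}(N_t, \pi_t) \\
\text{Direct model:}\quad & y_t \sim \text{Poisson}(\mu_t)
\end{aligned}
$$

In the two-part model the structure goes into $\lambda_t$ and $\pi_t$ separately; in the direct model everything has to be expressed through $\mu_t$, which plays the role of $\lambda_t \pi_t$.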
Wow, you’re using a differential equation model for this? That’s very creative.
I don’t know what your background is, but in econ/social science we would approach this as a general panel data problem. I’d probably suggest starting simple with a Gaussian outcome and a multilevel/fixed effects model. For example, fitting a simple linear regression model with varying or fixed intercepts for each region is a good start. You can then relax or add in assumptions (depending on your mindset) about the time process, such as by allowing for linear/quadratic trends and even moving up to something more technical like a Gaussian process with pooling across regions.
Starting with simple models and moving to more complex is really helpful for understanding the data. In general, a relatively simple multilevel/fixed effects model will teach you a lot about the data, and additional complexities can reveal additional nuance at the cost of increased computation.
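As a rough illustration of the simplest version of that starting point, here is a sketch of a linear time trend with a fixed intercept per region, estimated with the within (demeaning) estimator. The function name `within_estimator`, the region labels, and the toy data are all made up; a real analysis would more likely use a regression or multilevel modeling package.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fit y = a[region] + slope * t with one fixed intercept per region.

    rows: iterable of (region, t, y) tuples.
    Returns (slope, {region: intercept}).
    """
    by_region = defaultdict(list)
    for region, t, y in rows:
        by_region[region].append((t, y))

    num = 0.0
    den = 0.0
    means = {}
    for region, obs in by_region.items():
        t_bar = sum(t for t, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        means[region] = (t_bar, y_bar)
        # Demeaning within each region removes the region-specific intercept.
        for t, y in obs:
            num += (t - t_bar) * (y - y_bar)
            den += (t - t_bar) ** 2

    slope = num / den
    intercepts = {r: y_bar - slope * t_bar for r, (t_bar, y_bar) in means.items()}
    return slope, intercepts


# Toy, noise-free data: two hypothetical regions with a shared trend of 2 per period.
rows = [(region, t, a + 2.0 * t)
        for region, a in [("north", 10.0), ("south", 25.0)]
        for t in range(5)]
slope, intercepts = within_estimator(rows)
```

On this toy data the shared slope and the two region intercepts are recovered exactly, since there is no noise. Quadratic trends or partially pooled (varying) intercepts would be the next steps up in complexity.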
Thanks. Both of these suggestions were helpful. I ended up finding that the Prophet logistic growth model (Prophet has a Stan backend) with a dynamic cap was the most sensible from an explainability perspective. The model mapped nicely to how the fundamental analysts understood the problem. Additionally, the cap feature was critical because, as I researched the EV sales dynamics, one of the biggest factors turned out to be the supply of batteries. So I can reason about the availability of batteries, which can act as a supply constraint on my model. I haven’t yet figured out how to model batteries with uncertainty and then have that influence the fitting of the EV sales model, but I’m working on it. Thanks for the help. BC
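For readers who want to see what a logistic growth curve with a time-varying cap looks like, here is a minimal sketch of the basic trend shape, cap(t) / (1 + exp(-k (t - m))), without changepoints or seasonality. It does not call Prophet. The `battery_cap` function and all numbers are hypothetical stand-ins for a battery supply forecast. In Prophet itself, the cap is supplied as a `cap` column in the input data frame when `growth='logistic'` is used.

```python
import math


def logistic_trend(t, cap, k, m):
    """Logistic growth toward a carrying capacity `cap`.

    k is the growth rate and m is the midpoint (the time at which
    the trend reaches half of the cap).
    """
    return cap / (1.0 + math.exp(-k * (t - m)))


def battery_cap(t, start=100.0, growth=20.0):
    """Hypothetical supply ceiling on EV sales that rises linearly over time."""
    return start + growth * t


# Evaluate the trend with a cap that changes every period.
trend = [logistic_trend(t, battery_cap(t), k=0.5, m=10.0) for t in range(25)]
```

Because the cap is an input rather than a fitted quantity, different battery supply scenarios can be explored by swapping in different cap series. Propagating uncertainty in the cap into the fit itself would need more than this.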