I’m planning to put a Stan-based model into production (run on a schedule, serve some transformation of outputs to a website) on AWS, as close to “serverless” as possible. I was wondering what the community thinks is the best way to do this.
I’ve been customizing an Amazon Linux 2 AMI to run cmdstanpy. The production process is going to look like:
1. Start EC2 instance
2. Trigger cmdstanpy code
3. A cron job on the instance copies the Stan output to S3 using the AWS CLI
4. The instance shuts down on successful completion of (2), or on timeout
5. Downstream processes reference the Stan output on S3
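The per-instance job above (steps 2–3) might look roughly like this in Python. This is a sketch only: the bucket name, model file, data layout, and script name are all placeholders I made up, and I'm using boto3 rather than shelling out to the AWS CLI for the copy:

```python
# run_model.py -- hypothetical sketch of the on-instance batch job.
# Assumes the AMI already has CmdStan installed and the model file in place.
from pathlib import Path

BUCKET = "my-stan-outputs"  # placeholder bucket name


def s3_key(dataset: str, filename: str) -> str:
    """Build the S3 key for one output file of one dataset's run."""
    return f"irt-runs/{dataset}/{filename}"


def main(dataset: str) -> None:
    from cmdstanpy import CmdStanModel
    import boto3

    model = CmdStanModel(stan_file="irt_model.stan")     # placeholder model file
    fit = model.sample(data=f"data/{dataset}.json")      # placeholder data layout
    s3 = boto3.client("s3")
    for csv in fit.runset.csv_files:                     # upload each chain's CSV
        s3.upload_file(csv, BUCKET, s3_key(dataset, Path(csv).name))

# Usage on the instance (e.g. from cron or user data):
#   python3 -c "import run_model; run_model.main('course_algebra')"
```

The shutdown-on-completion / timeout logic (step 4) stays outside this script, so a crashed sampler still gets cleaned up.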
I’ll be running the same code against multiple independent datasets (these are IRT models on a variety of courses), so I’m hoping to scale this up by running the same image in multiple EC2 instances.
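One way to fan the same image out across datasets is a small launcher that starts one instance per dataset and passes the dataset name through EC2 user data. Another hedged sketch: the AMI ID, instance type, and on-instance script path are placeholders, not working values:

```python
# launch_runs.py -- hypothetical fan-out launcher: one EC2 instance per dataset.


def user_data(dataset: str) -> str:
    """Boot script for one instance; self-halts when the run finishes (placeholder paths)."""
    return (
        "#!/bin/bash\n"
        f"python3 /opt/stan/run_model.py {dataset}\n"
        "shutdown -h now\n"
    )


def launch_all(datasets, ami_id="ami-0123456789abcdef0", instance_type="c6g.2xlarge"):
    import boto3

    ec2 = boto3.client("ec2")
    for ds in datasets:
        ec2.run_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            UserData=user_data(ds),
            # so "shutdown -h now" terminates rather than just stops the instance
            InstanceInitiatedShutdownBehavior="terminate",
        )

# Usage from a scheduler (e.g. a Lambda on an EventBridge cron):
#   launch_all(["course_algebra", "course_stats"])
```

Setting `InstanceInitiatedShutdownBehavior="terminate"` means the instances clean themselves up, which keeps the whole thing close to "serverless" without any orchestration layer.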
Things I’ve learned so far:
- httpstan is fiddly, and the zipped output it writes to its .cache directory needs custom processing (it's not valid JSON)
- pystan is tightly integrated with a local httpstan server and would need adjustment to run against a remote one. You can't just forward ports to localhost and be on your way.
- “Amazon Linux 2” on ARM means compiling your own recent Python
- I need a fairly beefy instance type to get performance comparable to my local (M1 Pro) machine. I’ve been using an a1.xlarge with ARM processors for testing and it’s not cutting it.
Things I believe to be true:
- AMIs can't be converted to Docker images very easily, but the Amazon Linux AMI specifically has a pathway
Questions:
- Is this, roughly, the pattern others have been using? Did you start from Docker images instead?
- SageMaker and its associated bells and whistles (Model Registry, etc.) look like a poor fit for an entirely batch workflow that needs a custom image. Has anybody tried running Stan inside SageMaker recently?
- Which instance types and architectures work best for running Stan code?