I’ve been running my models on my institute’s HPC cluster, which uses PBS for job queueing. Essentially, I submit a PBS script that runs an R script, which creates the model with cmdstan_model(), runs $sample(), and saves the fit. Some of the models were poorly specified and were taking too long to run, so I killed those jobs with qdel.
I was later told that some processes associated with those jobs were still running on the compute nodes and using CPU, even though the jobs themselves had been killed.
Can anyone explain exactly what is happening here? Specifically: which processes might be left behind, and what is an easy way to make sure that all of the processes associated with a job are killed if the job fails or is forcibly stopped? Is there something I can put in my R script or PBS script to prevent this if I have to force-kill jobs again?
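To make the last question more concrete: my guess is that the leftover processes are the CmdStan chain executables, which cmdstanr runs as separate processes started from R. I was wondering whether something like the following PBS script would work. It is only a sketch under some assumptions: the job name, walltime and fit_model.R are placeholders, it assumes pgrep is available on the compute nodes, and it assumes that on qdel PBS first sends SIGTERM to the job script before killing it outright. I also suspect it would not catch chains that have already been re-parented, for example if R itself dies before the trap runs.

```shell
#!/bin/bash
#PBS -N stan_fit
#PBS -l walltime=24:00:00
# Sketch only: the job name, walltime and fit_model.R are placeholders.

# Print all descendant PIDs of process $1 (children, grandchildren, ...),
# deepest first, by looking up children with pgrep -P.
descendants() {
  local child
  for child in $(pgrep -P "$1"); do
    descendants "$child"
    echo "$child"
  done
}

# Send SIGTERM to process $1 and every process below it in the tree
# (here: Rscript plus any CmdStan executables it started).
# Errors for processes that have already exited are ignored.
kill_tree() {
  local pids
  pids="$(descendants "$1") $1"
  kill -TERM $pids 2>/dev/null
  return 0
}

# Only launch the model inside a PBS job (PBS sets PBS_JOBID), so the
# functions above can also be tried outside a job.
if [ -n "${PBS_JOBID:-}" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  Rscript fit_model.R &          # run R in the background ...
  r_pid=$!
  # On qdel (SIGTERM) or Ctrl-C (SIGINT), kill R's whole process tree.
  trap 'kill_tree "$r_pid"; exit 143' TERM INT
  wait "$r_pid"                  # ... so the trap can run while we wait
fi
```

The idea is to walk the process tree from the Rscript PID rather than relying on the job's own process group, in case the chains do not stay in it.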
Any help would be greatly appreciated.