Workflow Orchestration

Purpose

Here, we describe how to use various workflow tools. These are particularly valuable when orchestrating large numbers of calculations, especially when the workflow is composed of multiple jobs and may even be dynamic. There are hundreds of workflow tools, each with its own benefits and drawbacks. We describe here only the few most relevant to our group.
Are you running expensive DFT calculations or other simulations on Slurm? Or perhaps multi-step, complex simulation workflows that you would like orchestrated in the background? If so, we currently recommend 👷Jobflow, a workflow orchestration tool developed as part of the Materials Project software stack. Jobflow is also the required workflow orchestration tool of atomate2, one of the workflow libraries we use.
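To give a flavor of what Jobflow looks like, here is a minimal sketch in the style of the jobflow tutorials. The `add` and `multiply` jobs are illustrative placeholders, and `run_locally` is only appropriate for local testing; in practice you would dispatch the flow through a manager such as jobflow-remote.

```python
from jobflow import Flow, job
from jobflow.managers.local import run_locally

@job
def add(a, b):
    return a + b

@job
def multiply(a, b):
    return a * b

# Chain two jobs: `multiply` consumes the not-yet-computed output of `add`.
add_job = add(1, 2)
mult_job = multiply(add_job.output, 3)
flow = Flow([add_job, mult_job])

# Execute the flow in the current Python process (fine for testing).
responses = run_locally(flow)
```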
Are you looking to run a large number of Slurm jobs that would otherwise overload the queue? If so, try the batch submission option with jobflow-remote. Alternatively, you can use 🌀Parsl, which natively supports the "pilot job" model: you request a large number of nodes in a single Slurm allocation and distribute work across that one resource. For instance, instead of submitting 20 one-node VASP jobs to the Slurm queue, you could request a single 20-node Slurm allocation and run all 20 calculations concurrently within it. This is known as a "wide" job and is useful on Tiger and at the large national computing facilities. Note that, unlike Jobflow, Parsl does not run an orchestration server in the background, so if the main Parsl process dies, no further orchestration will take place.
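As a rough sketch of that pilot-job pattern, the Parsl configuration below requests a single 20-node Slurm block and fans tasks out across it. The partition name and walltime are placeholders for your cluster, and `simulate` stands in for a real calculation; consult the Parsl documentation for launcher settings appropriate to your site.

```python
import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Request one 20-node allocation (a "block") and spread tasks across it.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="wide_job",
            provider=SlurmProvider(
                partition="normal",   # placeholder partition name
                nodes_per_block=20,
                init_blocks=1,
                max_blocks=1,
                walltime="04:00:00",  # placeholder walltime
                launcher=SrunLauncher(),  # place workers on every node
            ),
        )
    ]
)
parsl.load(config)

@python_app
def simulate(i):
    # Placeholder for one expensive calculation.
    return i ** 2

futures = [simulate(i) for i in range(200)]
results = [f.result() for f in futures]  # blocks until all tasks complete
```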
If you are looking to orchestrate pure Python calculations (especially if on your local machine) or are just feeling adventurous, you could try 🪄Prefect.
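For completeness, here is a minimal sketch of a pure-Python Prefect flow using the `@flow` and `@task` decorators from Prefect 2+; the function names and logic are illustrative placeholders.

```python
from prefect import flow, task

@task
def double(x: int) -> int:
    return 2 * x

@task
def total(values: list[int]) -> int:
    return sum(values)

@flow
def pipeline(n: int = 5) -> int:
    # Each task call is tracked and retried/observed by Prefect.
    doubled = [double(i) for i in range(n)]
    return total(doubled)

if __name__ == "__main__":
    print(pipeline())
```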