Jobflow and its companion Jobflow-Remote are useful tools for orchestrating large numbers of jobs, especially those in complex workflows. Jobflow-Remote is currently the recommended workflow tool in the group for most tasks due to its close integration with Slurm.
Using Jobflow
Before configuring Jobflow-Remote, it is strongly recommended that you check out the Jobflow tutorials that you can run on your local machine. Then you can start using Jobflow-Remote to run workflows on the Slurm cluster.
The simplest example of using Jobflow locally is shown below:
```python
from jobflow import job, Flow, run_locally

@job
def add(a, b):
    return a + b

job1 = add(1, 2)
job2 = add(job1.output, 2)
flow = Flow([job1, job2])
response = run_locally(flow)
```
Configuring and Testing Jobflow Remote
The following is a representative setup for Jobflow Remote, specifically for use on our tiger-arrk login node.
Run `pip install jobflow-remote[gui]` in your Conda environment on tiger-arrk.
Run `jf project generate cms` in the terminal.
Replace the file `~/.jfremote/cms.yaml` with the following, editing any necessary fields (i.e., those with <NetID>). If you have not completed the instructions in 💽Databases, do that first. This creates a project named cms with two workers, basic_python and basic_vasp, but you can always add other workers to this file for different kinds of calculations. The Slurm resource requests go in the resources section of each worker. All available keys for resources can be found in the qtoolkit documentation (see the ${} entries).
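As a rough sketch of the shape of this file (all hostnames, paths, credentials, and resource values below are placeholders, not the group's actual settings; consult the jobflow-remote and qtoolkit documentation for the full schema):

```yaml
name: cms
workers:
  basic_python:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/<NetID>/jobflow_runs
    pre_run: conda activate <your_env>
    resources:
      nodes: 1
      ntasks: 1
      time: "00:30:00"
  basic_vasp:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/<NetID>/jobflow_runs
    pre_run: conda activate <your_env>
    resources:
      nodes: 1
      ntasks: 16
      time: "04:00:00"
queue:
  store:
    type: MongoStore
    host: <host>
    database: <database>
    username: <NetID>
    password: <password>
    collection_name: jobs
jobstore:
  docs_store:
    type: MongoStore
    host: <host>
    database: <database>
    username: <NetID>
    password: <password>
    collection_name: outputs
```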
Run `jf project check --errors` to confirm that everything is set up correctly.
Run `jf admin upgrade`.
Launch the runner via `jf runner start`.
You can always check which processes you have running via `ps -u <NetID>`.
Confirm that everything truly works by running the following minimal example:

```python
from jobflow import Flow
from jobflow_remote import submit_flow
from jobflow_remote.utils.examples import add

job1 = add(1, 2)
job2 = add(job1.output, 2)
flow = Flow([job1, job2])
ids = submit_flow(flow, worker="basic_python")
print(ids)
```
If everything is working, you should see two new job IDs via `jf job list` and one new flow ID via `jf flow list`. They should reach a state of COMPLETED once they get through the Slurm queue; keep refreshing `jf job list` until they do.
Note that we have imported a pre-defined add function rather than defining it inline. This is because Jobflow-Remote needs all job functions to be importable on the worker.
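The underlying constraint is that job functions are serialized by reference (module path plus name), so only functions defined at module level can be shipped to a worker. A minimal illustration of why a locally defined function fails, using plain pickle (which applies the same rule):

```python
import pickle

def add(a, b):
    # Module-level functions are serialized by reference (module + name),
    # so a remote worker can re-import them.
    return a + b

def make_local_add():
    # A function defined inside another function has no importable path,
    # so it cannot be sent to a worker.
    def local_add(a, b):
        return a + b
    return local_add

try:
    pickle.dumps(make_local_add())
    picklable = True
except Exception:
    picklable = False
# picklable is False: the nested function cannot be serialized
```

This is why the tutorial imports add from jobflow_remote.utils.examples instead of defining it in the submission script.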
Importantly, check out the results of your run in your MongoDB database. See 💽Databases for details on how to access your MongoDB database with Studio 3T. You will need to become familiar with using MongoDB to query and access your data. Studio 3T's Visual Query Builder is excellent for this.
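As a starting point, a query in Studio 3T's IntelliShell (or mongosh) is just a MongoDB filter document. The sketch below assumes your job outputs live in a collection named outputs; match the database and collection names to your own jobstore configuration:

```javascript
// Hypothetical database/collection names -- use the ones from your jobstore config
use <database>
db.outputs.find(
  { "name": "add" },          // filter: documents for jobs named "add"
  { "uuid": 1, "output": 1 }  // projection: return only these fields
).limit(5)
```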
If you'd like, check out the GUI via `jf gui`, which can be viewed in Firefox via the "Desktop on Tiger3 Vis Nodes" feature on mytiger.princeton.edu or via X forwarding.
When you're done running workflows, you can terminate the daemon via jf runner stop so it's not just running idle.
If you ever need to update the worker configuration, you will need to restart the active runner via `jf runner restart` for the changes to be seen by the runner.
Running Jobs on Della
You can use Jobflow-Remote to submit jobs on Della too. In this case, you have to create a new worker with its type set to remote instead of local. The steps are outlined below:
Make sure that you can ssh between tiger-arrk and Della without a password or 2FA. To do this, you need to set up SSH keys between tiger-arrk and Della. If you followed the 😴Removing Tedium guide, then it's the same process except now it's between tiger-arrk and Della instead of between your local machine and the clusters.
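The key setup can be sketched as follows (hostnames and prompts are assumptions; adjust for your accounts, and see 😴Removing Tedium for the full walkthrough):

```shell
# On tiger-arrk: generate a key pair if you don't already have one
ssh-keygen -t ed25519

# Copy the public key to Della (you'll authenticate one last time)
ssh-copy-id <NetID>@della.princeton.edu

# Verify: this should run without a password or 2FA prompt
ssh <NetID>@della.princeton.edu hostname
```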
Modify your Jobflow-Remote YAML config file to have the appropriate details for submitting a job on Della. For instance, add the following worker to your list of workers:
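As a sketch of what such a worker entry might look like (the worker name, host, paths, environment name, and resource values here are placeholders; see the jobflow-remote documentation for all remote-worker fields):

```yaml
workers:
  # ... your existing tiger-arrk workers ...
  della_python:
    type: remote
    host: della.princeton.edu
    user: <NetID>
    scheduler_type: slurm
    work_dir: /home/<NetID>/jobflow_runs
    pre_run: conda activate <your_env>
    resources:
      nodes: 1
      ntasks: 1
      time: "01:00:00"
```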
Run `jf runner restart` to ensure the changes take effect. You can then submit jobs and flows to Della from tiger-arrk using your newly created worker.
Jobflow Remote with Quacc
Pure Python Example
Run the following example, which submits a simple two-step flow of quick effective medium theory (EMT) calculations for each of two structures. If everything is working, you should see four new job IDs via `jf job list` and two new flow IDs via `jf flow list`. Eventually, they should reach a state of COMPLETED once they get through the Slurm queue.
```python
import os

os.environ["QUACC_WORKFLOW_ENGINE"] = "jobflow"

from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.emt.core import relax_job, static_job

atoms_list = [bulk("Al"), bulk("Cu")]
for atoms in atoms_list:
    job1 = relax_job(atoms, relax_cell=True)
    job2 = static_job(job1.output["atoms"])
    flow = Flow([job1, job2])
    submit_flow(flow, worker="basic_python")
```
VASP Example
The following example will use the basic_vasp worker and run a representative VASP workflow consisting of a relaxation and a subsequent static calculation on bulk Si.
```python
import os

os.environ["QUACC_WORKFLOW_ENGINE"] = "jobflow"

from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.vasp.core import relax_job, static_job
```