Jobflow

Jobflow and its companion Jobflow-Remote are useful tools for orchestrating large numbers of jobs, especially those arranged in complex workflows. Jobflow is currently the group's recommended workflow tool for most tasks because of Jobflow-Remote's close integration with Slurm.
Before configuring Jobflow-Remote, it is strongly recommended that you first work through the Jobflow tutorials, which can be run on your local machine. Once you are comfortable, you can use Jobflow-Remote to run workflows on the Slurm cluster.

Configuring Jobflow-Remote

The following is a representative setup for Jobflow-Remote, specifically for use on our tiger-arrk login node.
    Run pip install "jobflow-remote[gui]" in your Conda environment on tiger-arrk (the quotes keep your shell from interpreting the square brackets)
    Run jf project generate cms in the terminal
    Replace the file ~/.jfremote/cms.yaml with the following, editing any fields in angle brackets (< >). If you have not completed the instructions in 💽Databases, do that first. This creates a project named cms with a worker named basic, but you can always add other workers to this file for different kinds of calculations.
name: cms
workers:
  basic:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow
    pre_run: source ~/.bashrc && module load anaconda3/2024.10 && conda activate cms
    timeout_execute: 60
    resources:
      account: rosengroup
      time: 04:00:00
queue:
  store:
    type: MongoStore
    host: localhost
    database: <MongoDB Database Name>
    username: <MongoDB UserName>
    password: <MongoDB PW>
    collection_name: jf_jobs
    flows_collection: jf_flows
    auxiliary_collection: jf_aux
exec_config: {}
jobstore:
  docs_store:
    type: MongoStore
    database: <MongoDB Database Name>
    host: localhost
    username: <MongoDB UserName>
    password: <MongoDB PW>
    collection_name: jf_outputs
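If you later need a worker with different resources (e.g., a longer walltime), a second entry can be added under workers: alongside basic. The worker name and walltime below are hypothetical placeholders; the fields simply mirror the basic worker above:

```yaml
  long:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow
    pre_run: source ~/.bashrc && module load anaconda3/2024.10 && conda activate cms
    timeout_execute: 60
    resources:
      account: rosengroup
      time: 24:00:00
```

You would then target it at submission time, e.g. submit_flow(flow, worker="long").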
    Run jf project check --errors to confirm that everything is set up correctly.
    Run jf admin upgrade
    Launch the runner via jf runner start
    Confirm that everything truly works by running the following minimal example. If functional, you should see two job IDs via jf job list and one flow ID via jf flow list. Eventually, they should reach a state of COMPLETED once they get through the Slurm queue.
from jobflow_remote.utils.examples import add
from jobflow_remote import submit_flow
from jobflow import Flow

job1 = add(1, 2)
job2 = add(job1.output, 2)
flow = Flow([job1, job2])

print(submit_flow(flow, worker="basic"))
    If you'd like, check out the GUI via jf gui, which can be viewed in Firefox via the "Desktop on Tiger3 Vis Nodes" feature on mytiger.princeton.edu.
    When you're done, make sure to terminate the daemon via jf runner stop

Jobflow-Remote with Quacc

    Run quacc set WORKFLOW_ENGINE jobflow
    If you need to revert this, run quacc set WORKFLOW_ENGINE None
    Confirm that everything truly works by running the following minimal example. If functional, you should see 6 new job IDs via jf job list and 3 new flow IDs via jf flow list. Eventually, they should reach a state of COMPLETED once they get through the Slurm queue.
from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.emt.core import relax_job, static_job

atoms_list = [bulk("Al"), bulk("Cu"), bulk("Ag")]
for atoms in atoms_list:
    job1 = relax_job(atoms, relax_cell=True)
    job2 = static_job(job1.output["atoms"])
    flow = Flow([job1, job2])
    print(submit_flow(flow, worker="basic"))
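For reference, quacc stores its settings in a YAML file (by default ~/.quacc.yaml, assuming you haven't pointed quacc at a different settings file), so the quacc set command above is equivalent to adding this line there by hand:

```yaml
WORKFLOW_ENGINE: jobflow
```

Deleting the line (or setting it back to None) restores quacc's default behavior.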