JFR Setup on Princeton Machines

Tiger

Regular Submission

The following is a representative setup for Jobflow Remote, specifically for use on our tiger-arrk login node.
    Run pip install jobflow-remote in your Conda environment on tiger-arrk
    Run jf project generate cms in the terminal
    Replace the file ~/.jfremote/cms.yaml with the following. Edit any necessary fields (i.e., those in angle brackets, such as <NetID>). If you have not completed the instructions in  💽Databases , do that first. This YAML will create a project named cms with two workers named basic_python and basic_vasp, but you can always add other workers to this file for different kinds of calculations.
    The Slurm resource information can be found in the resources section and can be adjusted as needed (such as the time). All available keys for resources can be found in the  qtoolkit documentation  (see the ${} entries).
name: cms
workers:
  basic_vasp:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp
    pre_run: |
      source ~/.bashrc
      module load anaconda3/2025.12
      conda activate cms
      module load vasp/6.5.1
      export QUACC_VASP_PARALLEL_CMD="srun --nodes 1 --ntasks-per-node 112"
      export QUACC_WORKFLOW_ENGINE=jobflow
    timeout_execute: 60
    max_jobs: 50
    resources:
      nodes: 1
      ntasks_per_node: 112
      cpus_per_task: 1
      mem: 900G
      time: 04:00:00
      account: rosengroup
  basic_python:
    type: local
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/python
    pre_run: |
      source ~/.bashrc
      module load anaconda3/2025.12
      conda activate cms
      export QUACC_WORKFLOW_ENGINE=jobflow
    timeout_execute: 60
    resources:
      nodes: 1
      ntasks_per_node: 1
      cpus_per_task: 1
      mem: 8G
      time: 04:00:00
      account: rosengroup
queue:
  store:
    type: MongoStore
    host: localhost
    database: <MongoDB Database Name>
    username: <MongoDB UserName>
    password: <MongoDB PW>
    collection_name: jf_jobs
  flows_collection: jf_flows
  auxiliary_collection: jf_aux
  batches_collection: jf_batches
exec_config: {}
jobstore:
  docs_store:
    type: MongoStore
    database: <MongoDB Database Name>
    host: localhost
    username: <MongoDB UserName>
    password: <MongoDB PW>
    collection_name: jf_outputs
    Run jf project check --errors to confirm that everything is set up correctly.
    Launch the runner via jf runner start
    Confirm that everything truly works by running the following minimal example.
from jobflow_remote.utils.examples import add
from jobflow_remote import submit_flow
from jobflow import Flow

job1 = add(1, 2)
job2 = add(job1.output, 2)
flow = Flow([job1, job2])

ids = submit_flow(flow, worker="basic_python")
print(ids)
    If everything is working, you should see two job IDs via jf job list and one flow ID via jf flow list. Once the jobs make it through the Slurm queue, they should eventually reach a state of COMPLETED; keep re-running jf job list until they do.
    Note that we imported a pre-defined add function because Jobflow-Remote requires all job functions to be importable. See the sketch after these steps for how to use your own functions.
    Importantly, check out the results of your run in your MongoDB database. See  💽Databases  for details on how to access your MongoDB database.
    When you're done running workflows, you can terminate the daemon via jf runner stop so it's not just running idle.
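If you want to run your own functions rather than the pre-packaged add, put them in a module that is importable on both tiger-arrk and the worker (for example, an installed package or a file on your PYTHONPATH) and decorate them with jobflow's @job. The following is a minimal sketch; mymodule.py is a hypothetical file that you would need to make importable on both sides.
# mymodule.py (must be importable on tiger-arrk and on the worker)
from jobflow import job

@job
def multiply(a, b):
    """Multiply two numbers as a standalone jobflow job."""
    return a * b

# Submission script, run on tiger-arrk
from jobflow import Flow
from jobflow_remote import submit_flow
from mymodule import multiply

flow = Flow([multiply(2, 3)])
print(submit_flow(flow, worker="basic_python"))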
If you ever update the worker configuration, you will need to restart the active runner via jf runner stop followed by jf runner start for the changes to be picked up.
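Once the minimal example works, you can submit real calculations to the basic_vasp worker in the same way. Below is a rough sketch using quacc's VASP relaxation recipe; the relax_job import and its default settings are assumptions on our part, so adapt it to whatever recipes you actually use.
from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.vasp.core import relax_job  # assumes quacc is installed in the cms environment

# QUACC_WORKFLOW_ENGINE=jobflow must also be set in the environment where this
# script runs so that relax_job returns a jobflow Job rather than running directly.
atoms = bulk("Cu")
flow = Flow([relax_job(atoms)])

# Route the flow to the VASP worker defined in ~/.jfremote/cms.yaml
ids = submit_flow(flow, worker="basic_vasp")
print(ids)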

Batch Submission

The "Regular Submission" approach submits one @job per Slurm job. For instance, if max_jobs: 50, then you would have at most 50 Slurm jobs in the queue and at most 50 @job-decorated functions running at a time.
Jobflow-Remote offers other queuing options as well. One is  batch mode . In batch mode, each Slurm job will continually pull in new work until the walltime is hit. This is convenient if your @jobs are quite short and you don't want to have to submit a new Slurm job for every @job.
To use batch mode, modify the worker in the YAML as follows. The presence of the batch: field is what tells Jobflow-Remote to use batch mode for that worker. In this setup, Jobflow-Remote will launch at most 50 Slurm jobs, and each one will keep running new @jobs until the walltime is hit.
Batch mode runs the risk of some @jobs timing out once the walltime is hit, which you will then have to rerun. Optionally, you can set the max_time: <int> field under the batch: field (alongside jobs_handle_dir and work_dir) to define the number of seconds after which no new @job will be started. As such, max_time should be a value less than the walltime.
name: cms
workers:
  <Worker Name>:
    max_jobs: 50
    resources:
      nodes: 1
      ntasks_per_node: 112
      cpus_per_task: 1
      mem: 900G
      time: 04:00:00
      account: rosengroup
    batch:
      jobs_handle_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp/jfr_handle_dir
      work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp/jfr_batch_jobs
  basic_python:
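Submitting work to a batch worker looks the same as before; you simply point submit_flow at the batch-enabled worker. As a sketch, reusing the add example from earlier (with <Worker Name> standing in for whatever you named the worker):
from jobflow import Flow
from jobflow_remote import submit_flow
from jobflow_remote.utils.examples import add

# Many short jobs: in batch mode, a handful of Slurm jobs will work through
# all of these instead of one Slurm job being submitted per @job.
jobs = [add(i, i + 1) for i in range(100)]
flow = Flow(jobs)

ids = submit_flow(flow, worker="<Worker Name>")
print(ids)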

Parallel Batch Mode

Finally, there is  parallel batch mode . Parallel batch mode is like batch mode but allows you to run multiple @jobs concurrently per Slurm job. This is particularly useful if you want to request a large number of nodes per Slurm job while keeping only a small number of Slurm jobs in the queue. This is generally not needed on Tiger but is useful on other machines. We will demonstrate it here anyway.
The YAML for a parallel batch job will look like the following. Note the addition of the parallel_jobs field under batch:, which tells Jobflow-Remote how many @jobs to run in parallel within a given Slurm job. Here, we have chosen parallel_jobs: 4 to indicate that there are 4 VASP jobs per Slurm job. Accordingly, we have set nodes: 4 so that each Slurm job requests 4 nodes and has enough resources to run 4 one-node VASP jobs. We also reduced max_jobs to 5 so that we do not have an enormous number of Slurm jobs in the queue.
name: cms
workers:
  <Worker Name>:
    max_jobs: 5
    resources:
      nodes: 4
      ntasks_per_node: 112
      cpus_per_task: 1
      mem: 900G
      time: 04:00:00
      account: rosengroup
    batch:
      jobs_handle_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp/jfr_handle_dir
      work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp/jfr_batch_jobs
      parallel_jobs: 4
  basic_python:
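The submission side is again unchanged. For example, if you queued up 20 one-node VASP jobs with this worker, Jobflow-Remote would keep at most 5 four-node Slurm jobs in the queue, each running 4 of the @jobs at a time. A minimal sketch (the quacc relax_job recipe and the choice of structures are illustrative assumptions):
from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.vasp.core import relax_job  # assumes quacc is installed in the cms environment

# 20 one-node VASP relaxations; the parallel batch worker runs them 4 at a
# time inside each 4-node Slurm job, with at most 5 Slurm jobs queued.
structures = [bulk(metal) for metal in ("Cu", "Al", "Ni", "Pd", "Pt")] * 4
flow = Flow([relax_job(atoms) for atoms in structures])

print(submit_flow(flow, worker="<Worker Name>"))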

Della

You can use Jobflow-Remote to submit jobs on Della too, but everything must still be orchestrated from the tiger-arrk login node. In this case, you have to make a new worker and set the type to remote instead of local. The steps are outlined below:
    Make sure that you can ssh between tiger-arrk and Della without a password or 2FA. To do this, you need to set up SSH keys between tiger-arrk and Della. If you followed the  😴Removing Tedium  guide, then it's the same process except now it's between tiger-arrk and Della instead of between your local machine and the clusters.
    Install Jobflow-Remote on Della, and make sure that the versions of both Jobflow and Jobflow-Remote are the same on both machines.
    Modify your Jobflow-Remote YAML config file to add another worker that has the appropriate details for submitting a job on Della. For instance, add the following worker to your list of workers:
name: cms
workers:
  basic_della_ml:
    type: remote
    host: della.princeton.edu
    user: <NetID>
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/ml
    pre_run: |
      source ~/.bashrc
      module load anaconda3/2025.12
      conda activate cms
      export QUACC_WORKFLOW_ENGINE=jobflow
      export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
    timeout_execute: 60
    resources:
      gres: gpu:1
      nodes: 1
      ntasks_per_node: 20
      cpus_per_task: 1
      mem: 36G
      time: 04:00:00
      account: rosengroup
    Run jf runner restart to ensure the changes take effect.
    Run jf project check --errors to ensure there are no configuration errors.
    You can then submit jobs and flows to Della from tiger-arrk by using your newly created worker.
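As a quick check that the remote worker is wired up correctly, you can rerun the earlier minimal example but route it to the Della worker:
from jobflow import Flow
from jobflow_remote import submit_flow
from jobflow_remote.utils.examples import add

job1 = add(1, 2)
job2 = add(job1.output, 2)
flow = Flow([job1, job2])

# Same submission call as before; only the worker name changes. The runner on
# tiger-arrk handles the SSH connection to Della for you.
ids = submit_flow(flow, worker="basic_della_ml")
print(ids)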

Stellar

You can use Jobflow-Remote to submit jobs on Stellar too, but everything must still be orchestrated from the tiger-arrk login node. In this case, you have to make a new worker and set the type to remote instead of local. The steps are outlined below:
    Make sure that you can ssh between tiger-arrk and Stellar without a password or 2FA. To do this, you need to set up SSH keys between tiger-arrk and Stellar. If you followed the  😴Removing Tedium  guide, then it's the same process except now it's between tiger-arrk and Stellar instead of between your local machine and the clusters.
    Install Jobflow-Remote on Stellar, and make sure that the versions of both Jobflow and Jobflow-Remote are the same on both machines.
    Modify your Jobflow-Remote YAML config file to add another worker that has the appropriate details for submitting a job on Stellar. For instance, add the following worker to your list of workers:
name: cms
workers:
  basic_stellar_vasp:
    type: remote
    host: stellar.princeton.edu
    user: <NetID>
    scheduler_type: slurm
    work_dir: /scratch/gpfs/ROSENGROUP/<NetID>/jobflow/vasp
    max_jobs: 1
    pre_run: |
      source ~/.bashrc
      module load anaconda3/2025.12
      conda activate cms
      module load vasp/6.5.1
      export QUACC_VASP_PARALLEL_CMD="srun -N 1 --ntasks-per-node 96"
      export QUACC_WORKFLOW_ENGINE=jobflow
    timeout_execute: 60
    resources:
      account: cbe
      time: 04:00:00
      nodes: 1
      ntasks_per_node: 96
      cpus_per_task: 1
      mem: 700G
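Submission again happens from tiger-arrk, just with the Stellar worker name. A minimal sketch (the quacc relax_job recipe is an assumption; substitute whatever recipes you actually run on Stellar):
from ase.build import bulk
from jobflow import Flow
from jobflow_remote import submit_flow
from quacc.recipes.vasp.core import relax_job  # assumes quacc is installed in the cms environment

# Built and submitted on tiger-arrk; the runner hands the work off to Stellar.
flow = Flow([relax_job(bulk("Cu"))])
print(submit_flow(flow, worker="basic_stellar_vasp"))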