JFR Setup on NCSA DeltaAI

This page covers setting up Jobflow-Remote on NCSA DeltaAI.
Run `mkdir ~/.jfremote` and then create the file `~/.jfremote/cms.yaml`. In the example below, replace the `<USERNAME>`, `<DB_USERNAME>`, and `<DB_PASSWORD>` placeholders, and update `work_dir` and the job `time` as needed. For testing purposes, 30 minutes is recommended.
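The setup commands above can be run as follows (`touch` just creates an empty file; fill it with the YAML config afterward):

```shell
mkdir -p ~/.jfremote        # -p avoids an error if the directory already exists
touch ~/.jfremote/cms.yaml  # create the (empty) project config, then edit it
```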
The YAML below uses a MongoDB Atlas cluster as the MongoStore. The free tier of MongoDB Atlas is fine, but you likely won't be able to store the output of the jobs in the database due to its file size limitations (the free tier has a 512 MB limit).
If you are making your own MongoDB Atlas database, you will need to replace the host shown below. To find the host, go to Database > Clusters > Connect > Drivers on MongoDB Atlas and copy the analogous part of the connection string.
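Note that Atlas typically shows a full connection string ending in a path and query parameters (e.g. `/?retryWrites=true&w=majority`); only the part before that first slash belongs in the `host` field. A small sketch of trimming it (the credentials and cluster address here are made up for illustration):

```python
# Hypothetical Atlas connection string; yours will have a different
# cluster address and credentials.
conn = "mongodb+srv://user:secret@cluster0.abc123.mongodb.net/?retryWrites=true&w=majority"

# Keep the scheme, then drop everything from the first "/" after "://".
scheme, rest = conn.split("://", 1)
host = scheme + "://" + rest.split("/", 1)[0]
print(host)  # prints mongodb+srv://user:secret@cluster0.abc123.mongodb.net
```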
```yaml
name: cms
workers:
  basic_vasp:
    type: local
    scheduler_type: slurm
    work_dir: /work/nvme/bems/<USERNAME>/jfr_runs
    max_jobs: 1
    pre_run: |
      source ~/.bashrc
      conda activate cms
      module load vasp/6.5.1_gpu
      export MSGSIZE=16777216
      export ITERS=400
    timeout_execute: 60
    resources:
      nodes: 1
      gres: gpu:1
      mem: 90G
      ntasks_per_node: 1
      cpus_per_task: 1
      partition: ghx4
      time: 00:30:00
      account: bems-dtai-gh
    batch:
      jobs_handle_dir: /work/nvme/bems/<USERNAME>/jfr_handle_dir
      work_dir: /work/nvme/bems/<USERNAME>/jfr_batch_jobs
queue:
  store:
    type: MongoStore
    host: mongodb+srv://<DB_USERNAME>:<DB_PASSWORD>@cluster0.clj61kx.mongodb.net
    database: mofs
    port: 27017
    collection_name: jf_jobs
  flows_collection: jf_flows
  auxiliary_collection: jf_aux
exec_config: {}
jobstore:
  docs_store:
    type: MemoryStore
  additional_stores:
    data:
      type: MemoryStore
```
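Before running any `jf` commands, it can help to confirm the file is valid YAML. A quick sketch using PyYAML (which Jobflow-Remote already depends on), shown here with a trimmed-down excerpt embedded as a string; in practice, read `~/.jfremote/cms.yaml` instead:

```python
import yaml

# Minimal excerpt of the config above, inlined only to demonstrate the check.
snippet = """
name: cms
workers:
  basic_vasp:
    type: local
    scheduler_type: slurm
    max_jobs: 1
"""
cfg = yaml.safe_load(snippet)
print(cfg["workers"]["basic_vasp"]["scheduler_type"])  # prints slurm
```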
Run `jf project check --errors` to make sure everything is configured correctly.
To have your Jobflow-Remote daemon run on the login node without being automatically killed, you will need to open a ticket with NCSA staff to request an exemption for the `jf` and `supervisord` processes.