Running Atomate2 on DeltaAI

Setup

Installation

Follow the setup instructions in 🤖NCSA DeltaAI, specifically with regard to logging in and setting up your ~/.bashrc.
You will have to install your own conda distribution locally. Download and install miniconda or similar. Once installed, create an environment named cms and install the newest versions of the Atomate2 stack:
conda create -n cms python
conda activate cms
pip install uv
uv pip install atomate2 jobflow-remote

Setting Up Atomate2

Run
pmg config --add PMG_DEFAULT_FUNCTIONAL PBE_64
Create a ~/.atomate2.yaml file with the following contents:
VASP_CMD: srun vasp_std
VASP_GAMMA_CMD: srun vasp_gam
CUSTODIAN_SCRATCH_DIR: /tmp
Now test an Atomate2 job (job.py) with the following Slurm submission script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --mem=90g
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=ghx4
#SBATCH --time=00:30:00
#SBATCH --job-name=test
#SBATCH --account=bems-dtai-gh

source ~/.bashrc
conda activate cms
module load vasp/6.5.1_gpu

export MSGSIZE=16777216
export ITERS=400

python job.py > job.out
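The job.py script itself is not shown above; a minimal sketch, assuming a POSCAR file in the submission directory and default StaticMaker settings, could look like this:
# minimal job.py sketch: one static VASP calculation executed in place
from jobflow import run_locally
from pymatgen.core import Structure
from atomate2.vasp.jobs.core import StaticMaker

structure = Structure.from_file("POSCAR")   # any structure file readable by pymatgen
flow = StaticMaker().make(structure)        # a single static VASP job
run_locally(flow, create_folders=True)      # run immediately inside the Slurm allocation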
Note that because of CUSTODIAN_SCRATCH_DIR: /tmp in ~/.atomate2.yaml, the job will run in the /tmp directory, which is only visible on the compute node. If you want to monitor the job, run squeue -u UserName, note the node ID (ghXXX) in the NODELIST column, and then ssh ghXXX. This lets you follow the scratch_link symbolic link while the job runs.
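For example, a sketch of the monitoring steps (the node name and submission directory are placeholders, and OSZICAR is just one output file you might follow):
squeue -u $USER                # find the node, e.g. ghXXX, under NODELIST
ssh ghXXX                      # log in to the compute node running the job
cd /path/to/submission/dir     # directory the Slurm job was submitted from
ls -l scratch_link             # symlink created by Custodian, pointing into /tmp
tail -f scratch_link/OSZICAR   # follow VASP progress in the scratch directory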
In your Atomate2 Maker classes, you can pass run_vasp_kwargs = {"custodian_kwargs": {"gzipped_output": True}} to speed up gzipping by having it done in the CUSTODIAN_SCRATCH_DIR instead of the job submission directory.
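For instance, a sketch with StaticMaker (other VASP Makers should accept the same keyword argument):
from atomate2.vasp.jobs.core import StaticMaker

# gzip the outputs inside the Custodian scratch directory rather than the submission directory
static_maker = StaticMaker(
    run_vasp_kwargs={"custodian_kwargs": {"gzipped_output": True}}
)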

Jobflow-Remote Setup

Follow the instructions at 🤖JFR Setup on NCSA DeltaAI.

Trying Out Atomate2: Now With Jobflow-Remote

Now we will run the Atomate2 script again but with Jobflow-Remote.
First, make sure the Jobflow-Remote runner daemon is running in the background by executing jf runner start.
In your Atomate2 code, replace the run_locally call with submit_flow. It will look like your normal script with run_locally swapped out for submit_flow plus the name of the worker:
from jobflow_remote import submit_flow

...

submit_flow(flow, worker="basic_vasp")
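Put together, the earlier test job.py adapted for Jobflow-Remote could look like the following sketch (the worker name basic_vasp is assumed to match a worker defined in your Jobflow-Remote project configuration):
from jobflow_remote import submit_flow
from pymatgen.core import Structure
from atomate2.vasp.jobs.core import StaticMaker

structure = Structure.from_file("POSCAR")  # structure to calculate
flow = StaticMaker().make(structure)       # same flow as before
submit_flow(flow, worker="basic_vasp")     # queue in the Jobflow-Remote database instead of running locally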
Then submit the flow by running the Python script (e.g. python job.py) from the login node.
Let Jobflow-Remote do the rest and monitor things with jf job list and squeue as needed.

Running

To run calculations for real, you will want to increase the walltime set in the ~/.jfremote/cms.yaml file. The worker is currently configured to run in batch mode, so each Slurm job will keep pulling in new calculations as others finish until the walltime is reached. If you prefer to run only one calculation per Slurm job, remove the batch section from the YAML.
There are a few things that you should regularly check over the course of the campaign:
    The remaining GPU hours, which you can check with the accounts command
    The remaining storage space, which you can check with the quota command
    If using Jobflow-Remote with MongoDB Atlas, the storage space left on the MongoDB Atlas cluster and the number of operations per second, which you can view on the MongoDB Atlas website under Database > Clusters (R/W and Data Size)