Databases

Overview
The primary database we use to store results in our group is MongoDB. When running high-throughput calculations, a database like MongoDB makes storing, querying, and analyzing data much easier and more reproducible.
Resources
I strongly recommend using Studio3T to interact with MongoDB, as it has a free pro license for academics and is easy to use. The Studio3T Getting Started guide covers most of what you need to learn the program. The "Visual Query Builder" in Studio3T is especially helpful and is strongly recommended! When in doubt, ChatGPT is usually a reasonable source of suggestions for MongoDB queries.
Studio3T
If you're using Studio3T to connect to the MongoDB collection, you must be on the GlobalProtect VPN to avoid the 2FA prompt.
Our MongoDB instance runs on tiger-arrk. To connect to it with Studio3T, use the following configuration:
Authentication
Authentication Mode: Legacy (SCRAM-SHA-1)
User name: Your MongoDB username (typically your NetID)
Password: Your MongoDB password (not your NetID password!). If you don't know it, you probably need to ask for an account to be created.
Authentication DB: Your database name (typically your NetID)
SSH
Enable "Use SSH tunnel to connect"
SSH Address:  tiger-arrk.princeton.edu 
SSH Username: Your NetID
SSH Auth Mode: Password (or your private key, if set up)
SSH Password (if not using a private key): Your NetID password
Using Studio3T with NERSC? Check out 🌌 Perlmutter for more details.
Maggma
To query the MongoDB collection from Python, you can use maggma. For instance, from the login node, compute node, or visualization node:
from maggma.stores import MongoStore

store = MongoStore(
    host="localhost",  # or 10.36.48.21 from compute node
    database="MyDBName",
    username="MyDBUserName",
    password="MyDBPassword",
    collection_name="MyDBCollectionName",
)
with store:
    print(store.count())
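Hard-coding the password in a script makes it easy to share or commit by accident. One possible alternative, sketched here as an assumption rather than a group convention, is to read the connection settings from environment variables. The helper name mongo_store_settings and the MONGO_* variable names are hypothetical and are not part of maggma.

```python
import os
from typing import Mapping


def mongo_store_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build MongoStore keyword arguments from environment variables.

    The MONGO_* names are hypothetical; pick whatever suits your setup.
    A missing required variable raises KeyError, so a typo fails loudly.
    """
    return {
        # Fall back to localhost, which is what the login node uses.
        "host": env.get("MONGO_HOST", "localhost"),
        "database": env["MONGO_DATABASE"],
        "username": env["MONGO_USERNAME"],
        "password": env["MONGO_PASSWORD"],
        "collection_name": env["MONGO_COLLECTION"],
    }
```

With those variables exported in your shell, the store could then be created with store = MongoStore(**mongo_store_settings()).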
To make queries with maggma, see the documentation.
To query a given collection, use store.query({"key_name": "value"}), where store is the MongoStore object defined above. This returns a generator over all entries in the collection that match your query. Iterate over the generator (with a for loop, or by wrapping it in list()) to extract specific values from each entry.
with store:
    # store.query returns a generator; list() collects every entry
    # in the collection that has name = relax_jobpbe
    results = list(store.query({"name": "relax_jobpbe"}))

all_energies = []
for result in results:
    # The nesting in result matches the nesting shown in Studio3T,
    # so entry_id -> output -> output -> energy is the path
    # being extracted here
    nrg = result["output"]["output"]["energy"]
    all_energies.append(nrg)
print(all_energies)
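Indexing with result["output"]["output"]["energy"] raises an error as soon as one entry lacks part of that path, for example a calculation that failed before writing its output. A minimal sketch of a more forgiving extraction follows. The helpers get_nested and collect_values are hypothetical names introduced for illustration, and the example documents are made up to stand in for query results.

```python
from typing import Any, Iterable


def get_nested(doc: dict, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path such as "output.output.energy" through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        # Stop early if the path runs into a non-dict value or a missing key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def collect_values(docs: Iterable[dict], path: str) -> list:
    """Gather the value at `path` from each document, skipping documents that lack it."""
    values = []
    for doc in docs:
        value = get_nested(doc, path)
        if value is not None:
            values.append(value)
    return values


# Made-up documents standing in for the results of store.query(...).
example_docs = [
    {"name": "relax_jobpbe", "output": {"output": {"energy": -1.5}}},
    {"name": "relax_jobpbe", "output": {"output": {}}},  # energy missing
    {"name": "relax_jobpbe", "output": None},  # no output written
]
print(collect_values(example_docs, "output.output.energy"))  # [-1.5]
```

Because collect_values accepts any iterable of dictionaries, the generator returned by store.query could be passed to it directly inside the with store: block.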
Backup
Princeton Research Computing has enabled automatic backups of the MongoDB instance on the tiger-arrk login node. However, you may choose to occasionally make backups yourself. This can be done using the Studio3T Export Wizard.
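If you would rather script an occasional snapshot from Python than use the Export Wizard, one rough option is to write the query results to a JSON Lines file. This is a sketch under stated assumptions, not a replacement for the automatic backups: dump_documents and load_documents are hypothetical helper names, and values that the json module cannot encode (such as MongoDB ObjectId or datetime fields) are converted to strings, so the file is a readable snapshot rather than a type-preserving dump.

```python
import json
from pathlib import Path
from typing import Iterable


def dump_documents(docs: Iterable[dict], path: str) -> int:
    """Write each document as one JSON line and return how many were written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for doc in docs:
            # default=str stringifies anything json cannot encode natively.
            handle.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count


def load_documents(path: str) -> list:
    """Read a JSON Lines snapshot back into a list of dictionaries."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Assuming store is the MongoStore from the Maggma section, a snapshot of the whole collection could be taken with dump_documents(store.query(), "backup.jsonl") inside a with store: block.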