Databases

Overview

Our group primarily stores results in MongoDB. When running high-throughput calculations, a database like MongoDB makes storing, querying, and analyzing data much easier and more reproducible.

Resources

I strongly recommend using Studio3T to interact with MongoDB: it has a free pro license for academics and is easy to use. The Studio3T Getting Started guide covers most of what you need to learn the program. The "Visual Query Builder" in Studio3T is especially helpful. When in doubt, ChatGPT is usually good at suggesting MongoDB queries.

Studio3T

If you're using Studio3T to connect to the MongoDB collection, you must be on the Global Protect VPN to avoid the 2FA.
We have a MongoDB instance that runs on tiger-arrk. To connect to the MongoDB instance on tiger-arrk with Studio3T, use the following configuration:
  • Authentication
      • Authentication Mode: Legacy (SCRAM-SHA-1)
      • User name: Your MongoDB username (typically your NetID)
      • Password: Your MongoDB password (not your NetID password!). If you don't know this, you probably need to ask for an account to be made.
      • Authentication DB: Your database name (typically your NetID)
  • SSH
      • Enable "Use SSH tunnel to connect"
      • SSH Address: tiger-arrk.princeton.edu
      • SSH Username: Your NetID
      • SSH Auth Mode: Password (or your private key, if set up)
      • SSH Password (if not using a private key): Your NetID password
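
The authentication settings above can also be written as a standard MongoDB connection string, which is what drivers such as pymongo accept. The following is a minimal sketch, not part of the group's documented setup: the helper name `build_mongo_uri` is hypothetical, port 27017 is only MongoDB's default (the actual port is not stated here), and the SSH tunnel still has to be set up separately.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, auth_db, host="localhost", port=27017):
    """Build a MongoDB connection string from the settings listed above.

    Hypothetical helper for illustration. The port is MongoDB's default and
    is an assumption; the SSH tunnel is not part of the URI.
    """
    # Usernames and passwords must be percent-encoded so that characters
    # such as "@" or "/" do not break the URI.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/"
        f"?authSource={quote_plus(auth_db)}&authMechanism=SCRAM-SHA-1"
    )


# Placeholder credentials, not real ones.
print(build_mongo_uri("mynetid", "p@ss/word", "mynetid"))
```

Here `authSource` corresponds to the "Authentication DB" field and `authMechanism` to the "Authentication Mode" field.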

Maggma

To query the MongoDB collection from Python, you can use maggma. For instance, from the login node, compute node, or visualization node:
from maggma.stores import MongoStore

store = MongoStore(
    host="localhost",  # or 10.36.48.21 from compute node
    database="MyDBName",
    username="MyDBUserName",
    password="MyDBPassword",
    collection_name="MyDBCollectionName",
)

# The context manager opens the connection and closes it on exit
with store:
    print(store.count())  # number of documents in the collection
To make queries with maggma, see the documentation.
To query a collection, use store.query({"key_name": "value"}), where store is the MongoStore object created above. This returns a generator that yields every entry matching your query. Iterate over the generator (for example with a for loop, or by wrapping it in list()) to extract specific values from each entry.
with store:
    # store.query() returns a generator; list() collects every entry
    # that has name = relax_jobpbe while the connection is still open
    results = list(store.query({"name": "relax_jobpbe"}))

all_energies = []
for result in results:
    # the nesting in result matches the nesting shown in Studio3T,
    # so entry -> output -> output -> energy is the path
    # being extracted here
    nrg = result["output"]["output"]["energy"]
    all_energies.append(nrg)
print(all_energies)
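
The loop above raises a KeyError if any matching entry lacks the nested key (for example, a job that did not finish). One possible variation is sketched below. It assumes the store exposes maggma's `query(criteria=..., properties=...)` method; the helper names `get_nested` and `collect_values` are hypothetical and not part of maggma.

```python
from typing import Any


def get_nested(doc: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dot-separated path such as "output.output.energy"."""
    current: Any = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def collect_values(store, criteria: dict, dotted_path: str) -> list:
    """Return the value at dotted_path for every matching entry that has one.

    `store` is assumed to be an already-connected maggma store, or any
    object with the same query(criteria=..., properties=...) method.
    """
    values = []
    # Requesting only the needed field keeps the returned documents small.
    for doc in store.query(criteria=criteria, properties=[dotted_path]):
        value = get_nested(doc, dotted_path)
        if value is not None:  # skip entries without the field
            values.append(value)
    return values
```

With the store from the earlier example, this could be called inside `with store:` as `collect_values(store, {"name": "relax_jobpbe"}, "output.output.energy")`. The same dotted path can also be used in the criteria itself, for instance `{"output.output.energy": {"$exists": True}}`, to filter on the database side.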