Databases

Overview

The primary type of database we use to store results in our group is MongoDB. When running high-throughput calculations, using a database like MongoDB makes storing, querying, and analyzing data much easier and more reproducible.

Setup

Ask Andrew to make you a MongoDB database. Then, store the credentials he gives you. Instructions below are for Andrew.
  • Log into Studio 3T as superman and connect to the admin DB.
  • Select Add Database and use the NetID of the desired user as the DB name.
  • Click on the newly created DB and select Users. Add a user with the same NetID and a custom password, and grant them the DB Owner role.
  • Connect via the newly created user and DB.
  • Add a New Collection so the DB is persistent.

Resources

GUI

For a graphical interface to your MongoDB database, we recommend 🍃 VisuaLeaf. A backup option is 🧭 MongoDB Compass.

Maggma

To query the MongoDB collection from Python, you can use maggma. For instance, from the login node, compute node, or visualization node:
from maggma.stores import MongoStore

store = MongoStore(
    host="localhost",  # or 10.36.48.21 from compute node
    database="MyDBName",
    username="MyDBUserName",
    password="MyDBPassword",
    collection_name="MyDBCollectionName",
)
with store:
    print(store.count())
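Since the credentials are typed directly into the script above, one option is to read them from environment variables so they never end up in version control. The following is a minimal sketch, not a group convention: the helper name load_mongo_kwargs and the variable names MONGO_DATABASE, MONGO_USERNAME, and MONGO_PASSWORD are hypothetical and can be replaced with whatever fits your setup.

```python
import os

# Hypothetical environment variable names (assumptions, not a group standard).
ENV_KEYS = {
    "database": "MONGO_DATABASE",
    "username": "MONGO_USERNAME",
    "password": "MONGO_PASSWORD",
}


def load_mongo_kwargs(environ=None, host="localhost"):
    """Build keyword arguments for MongoStore from environment variables."""
    environ = os.environ if environ is None else environ
    # Fail early with a clear message if any credential is missing.
    missing = [var for var in ENV_KEYS.values() if var not in environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    kwargs = {key: environ[var] for key, var in ENV_KEYS.items()}
    kwargs["host"] = host
    return kwargs
```

With the variables exported in your shell, the store could then be created as MongoStore(collection_name="MyDBCollectionName", **load_mongo_kwargs()).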
To make queries with maggma, see the documentation.
To query a collection, call store.query({"key_name": "value"}), where store is the MongoStore object created above. This returns a generator that yields every document in the collection matching your query. Iterate over the generator with a for loop, or wrap it in list() to collect the results, and then extract the specific values you need from each document.
with store:
    # store.query() returns a generator; list() collects every document
    # in the collection that has name = relax_jobpbe
    results = list(store.query({"name": "relax_jobpbe"}))

all_energies = []
for result in results:
    # the nesting in result matches the nesting shown in your MongoDB GUI
    # (e.g., Studio 3T), so entry_id -> output -> output -> energy is the
    # path being extracted here
    nrg = result["output"]["output"]["energy"]
    all_energies.append(nrg)
print(all_energies)
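The direct indexing above raises a KeyError if a document lacks one of the nested keys, for example when a calculation failed before writing an energy. A minimal sketch of a more forgiving lookup is shown here. The helper get_nested and the sample documents are hypothetical and only stand in for what store.query() would yield.

```python
def get_nested(doc, dotted_path, default=None):
    """Follow a dot-separated path through nested dicts, or return default."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical documents standing in for the output of store.query().
results = [
    {"name": "relax_jobpbe", "output": {"output": {"energy": -10.5}}},
    {"name": "relax_jobpbe", "output": {"error": "did not converge"}},
]

all_energies = []
for result in results:
    nrg = get_nested(result, "output.output.energy")
    if nrg is not None:  # skip documents without an energy
        all_energies.append(nrg)
print(all_energies)  # prints [-10.5]
```

If only a few fields are needed, store.query() also accepts a properties argument (for example properties=["output.output.energy"]) so that only those fields are returned for each document.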

Backup

Princeton Research Computing has enabled automatic backups of the MongoDB instance on the tiger-arrk login node. However, you may choose to occasionally make backups yourself. This can be done using most available GUIs.
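For a quick manual copy from Python rather than a GUI, one possible approach is to write the queried documents to a JSON Lines file. This is only a sketch under stated assumptions: dump_documents is a hypothetical helper, and values that JSON cannot represent (such as datetimes) are converted to strings, so the file is a readable snapshot rather than an exact, restorable dump.

```python
import json


def dump_documents(docs, path):
    """Write an iterable of documents to a JSON Lines file; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for doc in docs:
            # default=str stringifies values json cannot serialize natively.
            handle.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count
```

Inside a with store: block, this could be called as dump_documents(store.query(), "backup.jsonl") to write out every document in the collection.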