Databases

Overview

The primary type of database we use to store results in our group is MongoDB. When running high-throughput calculations, using a database like MongoDB makes storing, querying, and analyzing data much easier and more reproducible.

Resources

I strongly recommend using Studio3T to interact with MongoDB, as it has a free pro license for academics and is easy to use. The Studio3T Getting Started guide covers most of what you need to learn to use the program. The "Visual Query Builder" in Studio3T is especially helpful and is strongly recommended! When in doubt, ChatGPT is usually good at suggesting how to write MongoDB queries.

Studio3T

If you're using Studio3T to connect to the MongoDB collection, you must be on the GlobalProtect VPN to avoid the 2FA prompt.
We have a MongoDB instance that runs on tiger-arrk. To connect to it with Studio3T, use the following configuration:
  • Authentication
      • Authentication Mode: Legacy (SCRAM-SHA-1)
      • User name: Your MongoDB username (typically your NetID)
      • Password: Your MongoDB password (not your NetID password!). If you don't know this, it probably means you need to ask for an account to be made.
      • Authentication DB: Your database name (typically your NetID)
  • SSH
      • Enable "Use SSH tunnel to connect"
      • SSH Address: tiger-arrk.princeton.edu
      • SSH Username: Your NetID
      • SSH Auth Mode: Password (or your private key, if set up)
      • SSH Password (if not using a private key): Your NetID password
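The authentication settings above can also be written as a standard MongoDB connection URI, which is handy if you want to reuse them in a script. The following is a minimal sketch, not an official group utility: `build_mongo_uri` is a hypothetical helper, the credentials are placeholders, and port 27017 is MongoDB's default, assumed here because this page does not state the port. The `localhost` host assumes you are on tiger-arrk itself or behind an SSH tunnel; the URI does not set up the tunnel for you.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, auth_db, host="localhost", port=27017):
    """Build a MongoDB connection URI using SCRAM-SHA-1 authentication.

    Hypothetical helper; the default port 27017 is an assumption.
    """
    # Percent-encode credentials so characters like "@" or "/" do not break the URI.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/"
        f"?authSource={auth_db}&authMechanism=SCRAM-SHA-1"
    )


# Placeholder values; substitute your own MongoDB (not NetID) credentials.
uri = build_mongo_uri("MyDBUserName", "MyDBPassword", "MyDBName")
print(uri)
```

The `authSource` and `authMechanism` options correspond to the "Authentication DB" and "Authentication Mode" fields in Studio3T.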

Maggma

To query the MongoDB collection from Python, you can use maggma. For instance, from a login node or a compute node:
from maggma.stores import MongoStore

store = MongoStore(
    host="localhost",  # or 10.36.48.21 from a compute node
    database="MyDBName",
    username="MyDBUserName",
    password="MyDBPassword",
    collection_name="MyDBCollectionName",
)

# The context manager connects on entry and closes the connection on exit.
with store:
    print(store.count())
To make queries with maggma, see the documentation.
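As a starting point, here is a minimal sketch of a filtered query using the store's `query` method, which accepts standard MongoDB filter syntax. The helper name `fetch_lowest` and the field names `energy` and `formula` are hypothetical; replace them with fields that actually exist in your collection.

```python
def fetch_lowest(store, field, max_value, properties, limit=5):
    """Return up to `limit` documents with `field` below `max_value`, lowest first."""
    criteria = {field: {"$lt": max_value}}  # standard MongoDB filter syntax
    with store:  # connects on entry and closes on exit
        cursor = store.query(
            criteria=criteria,
            properties=properties,  # only return these fields
            sort={field: 1},  # 1 = ascending
            limit=limit,
        )
        # Materialize the results while the connection is still open.
        return list(cursor)


# Usage with the MongoStore defined above (field names are hypothetical):
# docs = fetch_lowest(store, "energy", 0.0, ["formula", "energy"])
```

Converting the results to a list inside the `with` block matters, since `query` returns its documents lazily and the connection is closed on exit.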