Databases

Overview

The primary type of database we use to store results in our group is MongoDB. When running high-throughput calculations, using a database like MongoDB makes storing, querying, and analyzing data much easier and more reproducible.

Resources

I recommend using Studio3T to interact with your MongoDB database, as it has a free pro license for academics and is easy to use. The Studio3T Getting Started guide covers most of what you need to learn the program.

In addition to Studio3T, which is a GUI, we use maggma as a Pythonic way to store and query results in our databases as needed.
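As a rough illustration of querying with maggma, the sketch below builds a MongoDB-style filter and pulls matching documents from a store. The database name `my_database`, the collection name `results`, and the field names `formula` and `energy` are hypothetical placeholders, not our actual schema; adjust them, along with the host and credentials, to your own setup.

```python
from typing import Any


def build_criteria(formula: str, max_energy: float) -> dict[str, Any]:
    """Build a MongoDB-style filter: exact formula match, energy at or below a cutoff.

    The field names "formula" and "energy" are hypothetical.
    """
    return {"formula": formula, "energy": {"$lte": max_energy}}


def fetch(store, criteria: dict[str, Any], properties: list[str]) -> list[dict[str, Any]]:
    """Query any maggma Store and return the matching documents as a list of dicts."""
    store.connect()
    try:
        # Store.query yields one dict per matching document.
        return list(store.query(criteria=criteria, properties=properties))
    finally:
        store.close()


def open_results_store():
    """Create a MongoStore for a local MongoDB server (all names are placeholders)."""
    # Imported here so the helpers above can be used without maggma installed.
    from maggma.stores import MongoStore

    return MongoStore(
        database="my_database",     # hypothetical database name
        collection_name="results",  # hypothetical collection name
        host="localhost",
        port=27017,
    )
```

With a MongoDB server running, `fetch(open_results_store(), build_criteria("Cu", -3.0), ["formula", "energy"])` would return only the requested fields for the matching documents. Keeping the filter in a small function makes the query easy to reuse between notebooks.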

Data Transformations

If you want to carry out complex data transformations on your database, there are two routes:
1. Use maggma to fetch the data, transform it into a pandas dataframe, and then carry out your desired transformations.
2. Use a dataflow program, such as Mage, to construct and orchestrate the data pipeline.
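The first route can be sketched as follows, assuming the documents have already been fetched (for example with a maggma `query` call) into a list of dicts. The document layout, with a top-level `formula` field and a nested `output.energy` field, is a hypothetical example rather than a fixed schema.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single level with dotted keys.

    For example, {"output": {"energy": -1.0}} becomes {"output.energy": -1.0}.
    Lists and other non-dict values are kept as they are.
    """
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def lowest_energy_per_formula(docs: list[dict[str, Any]]):
    """Load documents into a pandas dataframe and take the minimum energy per formula.

    Assumes every document has "formula" and "output.energy" (hypothetical fields).
    """
    # Imported here so flatten() can be used without pandas installed.
    import pandas as pd

    df = pd.DataFrame([flatten(doc) for doc in docs])
    return df.groupby("formula")["output.energy"].min()


# Hypothetical documents, shaped like the results of a database query.
example_docs = [
    {"formula": "Cu", "output": {"energy": -3.72}},
    {"formula": "Cu", "output": {"energy": -3.70}},
    {"formula": "Ni", "output": {"energy": -5.46}},
]
```

Calling `lowest_energy_per_formula(example_docs)` returns a pandas Series indexed by formula. Once the data is in a dataframe, filtering, grouping, and joining can all be done in a Jupyter Notebook without further database calls.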

For relatively simple tasks, maggma and a Jupyter Notebook will typically get the job done just fine. For more complex tasks, you may wish to consider a data pipeline program like Mage.