Software

Check out the Best-of-Atomistic-Machine-Learning and Awesome-Materials-Informatics lists for larger curated collections of resources too! The ones in this document are just the most relevant to us.

Features and Descriptors

The following are useful toolkits for featurizing materials and/or coming up with useful descriptors:
  • Matminer: A collection of easy-to-use featurization methods for materials.
  • mofdscribe: Featurization methods for MOFs, built around matminer.
  • dscribe: Various machine learning descriptors for atomistic systems.
  • molfeat: A library of featurizers for molecules.
  • librascal: Representations for atomic-scale learning.
  • maml: A Python package for materials machine learning.
  • Chemiscope: Graphical tool for the interactive exploration of materials and molecular databases.
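
For readers new to the idea, "featurizing" a material means turning it into a fixed-length numeric vector that a model can consume. The sketch below is a deliberately tiny, standard-library-only illustration that maps a chemical formula to atomic fractions. The helper names `composition_fractions` and `featurize` are hypothetical and are not the API of matminer or any other package listed above; those toolkits provide far richer descriptors.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not part of any listed package.
def composition_fractions(formula: str) -> dict:
    """Return atomic fractions for a simple formula such as 'Fe2O3'.

    Assumes a flat formula: element symbols with optional numeric amounts,
    no parentheses, charges, or hydrates.
    """
    counts = Counter()
    # Each match is (element symbol, amount); amount is '' when omitted.
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def featurize(formula: str, elements: list) -> list:
    """Build a fixed-length vector of atomic fractions in the given element order."""
    fractions = composition_fractions(formula)
    return [fractions.get(element, 0.0) for element in elements]


element_order = ["Fe", "O", "Al"]  # assumed feature ordering for this example
vector = featurize("Fe2O3", element_order)
print(vector)
```

Fixing the element order up front is what makes the vectors comparable across materials; an element absent from a formula simply contributes a zero.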

Frameworks

General-Purpose

The following are useful frameworks for training machine learning models:
  • ★ sklearn: The go-to standard for "conventional" (i.e. not graph-related) classification, regression, and clustering machine learning algorithms. This is the package you should start with when getting familiar with machine learning.
  • ★ PyTorch: A Python library for training and running neural networks. This is the deep learning framework we use in the group.
  • ★ DGL: A library for doing deep learning on graphs, which is framework-agnostic (i.e. it is interoperable with PyTorch, TensorFlow, or MXNet). This is an excellent resource if you are building new deep learning models.
  • fastai: Basically a wrapper around PyTorch that simplifies the deep learning process for several application areas.
  • PySR: Symbolic regression, for when having an equation is desirable.
  • e3nn: Python package for constructing Euclidean neural networks.
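
As a starting point with sklearn, the following is a minimal sketch of the usual fit/predict/score loop for a regression task. The data here are synthetic stand-ins generated with NumPy (an assumption made purely so the example is self-contained); in practice `X` would be a feature matrix from a featurizer and `y` a measured or computed property. The choice of a random forest is also just an illustrative default, not a group recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 3 features, target depends on two of them.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=200)

# Hold out 25% of the samples to estimate generalization error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training split only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score on the held-out split.
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.3f}")
```

Nearly every sklearn estimator follows this same `fit` and `predict` pattern, so swapping in a different model usually means changing only the line that constructs it.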

Interatomic Potentials

Toolkits

  • Autoplex: Automated pipeline to train machine learned interatomic potentials.
  • DeePMD-kit: Package to train deep learning interatomic potentials for MD.
  • Psiflow: An end-to-end framework for developing interatomic potentials, built around Parsl.
  • NequIP: A library for building E(3)-equivariant interatomic potentials.
  • FLARE: An open-source Python package for creating fast and accurate interatomic potentials.

Pre-Trained

  • Refer to Matbench Discovery for an up-to-date ranking of many pre-trained interatomic potentials.
  • matgl: Materials graph library for the models M3GNet and MEGNet.
  • Allegro: A library for building equivariant interatomic potentials that is an extension of NequIP.
  • AIRS: Several deep learning model architectures for the chemistry space.

The field of machine learning for materials chemistry moves fast, so some of these packages may become outdated quickly. Feel free to add new ones as you see fit, and remove any that are no longer actively maintained.

Miscellaneous Tools

The following are miscellaneous machine learning packages that aren't necessarily domain-specific:
  • ★ Marvin: Build natural language processing interfaces in practical applications.
  • ★ Colmena: Library for steering campaigns of simulations on supercomputers.
  • meerkat: A Python package to more easily view/annotate unstructured data.
  • SegmentAnything: A model to perform segmentation analysis on images.
  • MolScribe: Image to chemical structure model.
  • Paper QA: Use large language models to answer questions from document libraries with citations.