Force field query
sGDML force fields (see force field reconstruction) are straightforward to use. In the following example, we use a pre-trained model to predict the energy and forces for an ethanol geometry stored as an XYZ file (download):
import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

# Load the pre-trained model and set up the predictor.
model = np.load('m_ethanol.npz')
gdml = GDMLPredict(model)

# Read the geometry (9 atoms) and query energy and forces.
r, _ = io.read_xyz('ethanol.xyz')
e, f = gdml.predict(r)

print(r.shape)  # (1, 27)
print(e.shape)  # (1,)
print(f.shape)  # (1, 27)
Here, the sGDML predictor is instantiated with the model file m_ethanol.npz and queried using the ethanol geometry r imported from ethanol.xyz. It returns the energy e and all interatomic forces f for this structure.
In this example, r is an array of dimension 1 x 3N containing the Cartesian coordinates of N atoms, but we could just as well have passed an array of dimension M x 3N containing M geometries, to generate multiple energy and force predictions in a single call.
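The M x 3N layout can be sketched with plain NumPy. The snippet below is an illustration only: the coordinates are random placeholders rather than real ethanol geometries, and the names `geoms`, `R`, and `f_per_atom` are introduced here for the example. It assumes each row is flattened atom by atom (x, y, z of the first atom, then the second, and so on), which is what `reshape` produces from an (N, 3) array.

```python
import numpy as np

n_atoms = 9  # ethanol
n_geoms = 4  # M, the number of geometries in the batch (arbitrary here)

# Placeholder coordinates, one (N, 3) array per geometry.
rng = np.random.default_rng(0)
geoms = [rng.normal(size=(n_atoms, 3)) for _ in range(n_geoms)]

# Stack into (M, N, 3), then flatten each geometry into a row of length 3N.
R = np.stack(geoms).reshape(n_geoms, -1)
print(R.shape)  # (4, 27)

# With a loaded predictor, the whole batch would be queried in one call:
#   e, f = gdml.predict(R)

# An (M, 3N) array can be viewed per atom by reversing the flattening.
# R stands in for a force array here, since no model is loaded.
f_per_atom = R.reshape(n_geoms, n_atoms, 3)
print(f_per_atom.shape)  # (4, 9, 3)
```

Judging from the single-geometry example, `e` would then hold one energy per row and `f` one 3N-long force vector per row.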
Warning
The order of atoms in ethanol.xyz must be consistent with the dataset on which the model m_ethanol.npz was originally trained.
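One inexpensive safeguard against a mismatched atom order is to compare the element sequence of the input geometry with the one used for training. The helper below is a hypothetical sketch and not part of sgdml: it takes two sequences of atomic numbers, and how these are obtained from the XYZ file and from the training data is left open here. The ethanol ordering in the example is made up for illustration.

```python
def check_atom_order(z_model, z_geom):
    """Raise ValueError if the two element sequences differ.

    z_model, z_geom: sequences of atomic numbers, in storage order.
    """
    z_model, z_geom = list(z_model), list(z_geom)
    if len(z_model) != len(z_geom):
        raise ValueError(
            f"atom count differs: model has {len(z_model)}, "
            f"geometry has {len(z_geom)}"
        )
    for i, (zm, zg) in enumerate(zip(z_model, z_geom)):
        if zm != zg:
            raise ValueError(
                f"atom {i}: model expects Z={zm}, geometry has Z={zg}"
            )


# Illustrative ordering (C, C, O, then six H); the real order depends
# on the training dataset.
z_train = [6, 6, 8, 1, 1, 1, 1, 1, 1]
check_atom_order(z_train, [6, 6, 8, 1, 1, 1, 1, 1, 1])  # passes silently

try:
    check_atom_order(z_train, [6, 8, 6, 1, 1, 1, 1, 1, 1])
except ValueError as err:
    print(err)  # atom 1: model expects Z=6, geometry has Z=8
```

Matching element sequences are necessary but not sufficient: swapping two atoms of the same element, such as two hydrogens, passes this check and still violates the ordering requirement.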
Warning
The distance unit of the input geometry (here: Ångström) must match the unit in the dataset that was used for model training.
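If a geometry comes from a program that writes coordinates in a different unit, it has to be converted before the query. As a sketch, assuming the model was trained on Ångström data and the input is given in Bohr (atomic units), the conversion is a single scaling. The variable names and the three-atom placeholder geometry are illustrative.

```python
import numpy as np

# Bohr radius in Ångström (CODATA 2018).
BOHR_TO_ANGSTROM = 0.529177210903

# Placeholder 1 x 3N geometry in Bohr (three atoms to keep it short).
r_bohr = np.array([[0.0, 0.0, 0.0, 1.8, 0.0, 0.0, 0.0, 1.8, 0.0]])

# Scaling leaves the shape unchanged.
r_angstrom = r_bohr * BOHR_TO_ANGSTROM
print(r_angstrom.shape)  # (1, 9)

# r_angstrom is what would be passed to gdml.predict().
```

The predicted energies and forces are likewise expressed in the units of the training dataset, so any conversion of the output has to be done separately.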
Multi-CPU support
The sgdml package is able to parallelize calculations across multiple CPU cores, which is especially beneficial when querying multiple geometries at once in every call to gdml.predict(). To ensure optimal performance in a particular compute environment, given a particular model file, we recommend running
gdml.prepare_parallel(n_bulk=128)
right after initialization of the model. This function will determine the optimal parallelization settings and apply them in one step. Here, n_bulk is the number of geometries M that we plan on querying in each call to gdml.predict().
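When a trajectory holds more geometries than should go into one call, it can be split into chunks whose size matches the `n_bulk` value passed to `gdml.prepare_parallel()`. The generator below is a hypothetical helper written for this illustration. It only slices a placeholder array and leaves the prediction call as a comment, since no model is loaded.

```python
import numpy as np


def iter_batches(R, n_bulk):
    """Yield consecutive row blocks of R with at most n_bulk rows each."""
    for start in range(0, len(R), n_bulk):
        yield R[start:start + n_bulk]


n_bulk = 128
R_all = np.zeros((300, 27))  # placeholder: 300 ethanol geometries

sizes = []
for R_batch in iter_batches(R_all, n_bulk):
    # e, f = gdml.predict(R_batch)  # one query per batch
    sizes.append(R_batch.shape[0])

print(sizes)  # [128, 128, 44]
```

The last batch is smaller than `n_bulk` whenever the number of geometries is not a multiple of it. Whether the benchmarked settings remain optimal for such a smaller batch is not covered here.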
The maximum number of processes (= CPU cores) that sGDML is allowed to use can be globally limited by specifying max_processes during initialization of the predictor:
gdml = GDMLPredict(model, max_processes=12)
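The value for `max_processes` does not have to be hard-coded. As one possible approach (an assumption of this sketch, not a recommendation taken from sgdml), it can be derived from the core count reported by the operating system while leaving one core free for other work. `pick_max_processes` is a hypothetical helper.

```python
import os


def pick_max_processes(reserve=1):
    """Return the logical CPU count minus `reserve`, but at least 1."""
    n_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, n_cpus - reserve)


n_proc = pick_max_processes()
print(n_proc)

# The result would then be passed to the predictor:
#   gdml = GDMLPredict(model, max_processes=n_proc)
```

Note that `os.cpu_count()` reports the logical CPUs of the machine, which on a shared cluster node can exceed the cores actually allotted to a job.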
Note
Running the benchmark can take some time (seconds to minutes, depending on the model). However, the result for each configuration (training points, number of atoms, and choice of n_bulk) is cached, so that gdml.prepare_parallel() returns instantly on subsequent calls (after repeating the benchmark several times).
Multi-GPU support
Setting use_torch=True when instantiating the predictor redirects all calculations to PyTorch, which automatically uses GPUs if available:
gdml = GDMLPredict(model, use_torch=True)
Warning
PyTorch must be installed with GPU support; otherwise, it falls back on the CPU. For CPU calculations, however, we recommend running without the PyTorch flag, as our own CPU implementation is faster.
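Because a PyTorch build without GPU support falls back on the CPU, it can be useful to decide on `use_torch` at run time. The sketch below relies on `torch.cuda.is_available()`, a standard PyTorch call, and treats a missing PyTorch installation as "no GPU". `gpu_available` is a name made up for this example.

```python
def gpu_available():
    """Return True only if PyTorch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


use_torch = gpu_available()
print(use_torch)

# The flag would then be passed to the predictor:
#   gdml = GDMLPredict(model, use_torch=use_torch)
```

With this check, machines without a usable GPU keep using the native CPU implementation instead of the slower PyTorch CPU path.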