sgdml.utils package
sgdml.utils.desc module
-
class sgdml.utils.desc.Desc(n_atoms, max_processes=None)
Bases: object
-
d_desc_from_comp(R_d_desc, out=None)
Convert a compressed representation of a descriptor Jacobian back to its full representation.
The compressed representation omits all zeros and scales with N instead of N(N-1)/2.
- Parameters
R_d_desc (numpy.ndarray or torch.tensor) – Array of size M x N x N x 3 containing the compressed descriptor Jacobian.
out (numpy.ndarray or torch.tensor, optional) – Output argument. It must be of exactly the kind that would be returned if it were not used.
Note
If used, the output argument must be initialized with zeros!
- Returns
Array of size M x N(N-1)/2 x 3N containing the full representation.
- Return type
numpy.ndarray or torch.tensor
-
d_desc_to_comp(R_d_desc)
Convert a descriptor Jacobian to a compressed representation.
The compressed representation omits all zeros and scales with N instead of N(N-1)/2.
- Parameters
R_d_desc (numpy.ndarray) – Array of size M x N(N-1)/2 x 3N containing the descriptor Jacobian.
- Returns
Array of size M x N x N x 3 containing the compressed representation.
- Return type
numpy.ndarray
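To make the two layouts concrete, here is a NumPy sketch of the round trip for a single geometry (M = 1). It is not sGDML's implementation. It assumes descriptor entries are ordered like np.triu_indices(N, k=1), and that the hypothetical compressed entry [i, j, :] holds the derivative of descriptor entry (i, j) with respect to atom i, while [j, i, :] holds the derivative with respect to atom j. Each row of the full Jacobian has only these two nonzero 3-vectors, which is what makes the compression possible. The helper names to_comp and from_comp are illustrative, and the demo Jacobian uses inverse pairwise distances, which is also an assumption of the sketch.

```python
import numpy as np

def to_comp(d_desc, n_atoms):
    """Full (N(N-1)/2 x 3N) Jacobian -> compressed (N x N x 3) array (hypothetical layout)."""
    i, j = np.triu_indices(n_atoms, k=1)
    rows = np.arange(len(i))
    full = d_desc.reshape(len(i), n_atoms, 3)
    comp = np.zeros((n_atoms, n_atoms, 3), dtype=d_desc.dtype)
    comp[i, j] = full[rows, i]  # d desc_ij / d r_i
    comp[j, i] = full[rows, j]  # d desc_ij / d r_j
    return comp

def from_comp(comp):
    """Compressed (N x N x 3) array -> full (N(N-1)/2 x 3N) Jacobian (hypothetical layout)."""
    n_atoms = comp.shape[0]
    i, j = np.triu_indices(n_atoms, k=1)
    rows = np.arange(len(i))
    full = np.zeros((len(i), n_atoms, 3), dtype=comp.dtype)  # all other entries stay zero
    full[rows, i] = comp[i, j]
    full[rows, j] = comp[j, i]
    return full.reshape(len(i), 3 * n_atoms)

# Demo Jacobian of an inverse-distance descriptor (assumption of this sketch).
rng = np.random.default_rng(0)
n = 4
r = rng.normal(size=(n, 3))
i, j = np.triu_indices(n, k=1)
diff = r[i] - r[j]
dist = np.linalg.norm(diff, axis=1)
g = -diff / dist[:, None] ** 3  # d(1/r_ij) / d r_i
d_desc = np.zeros((len(i), n, 3))
d_desc[np.arange(len(i)), i] = g
d_desc[np.arange(len(i)), j] = -g
d_desc = d_desc.reshape(len(i), 3 * n)

comp = to_comp(d_desc, n)
restored = from_comp(comp)
```

In this layout the compressed array is antisymmetric in its first two axes, so half of it is redundant; a production implementation may store it differently.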
-
from_R(R, lat_and_inv=None, max_processes=None, callback=None)
Generate the descriptor and its Jacobian for multiple molecular geometries in Cartesian coordinates.
- Parameters
R (numpy.ndarray) – Array of size M x 3N containing the Cartesian coordinates of each atom.
lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns and its inverse.
max_processes (int, optional) – Limit the maximum number of processes. Otherwise, all CPU cores are used. This parameter overrides the global setting made during initialization.
callback (callable, optional) – Descriptor and descriptor Jacobian generation status, reported through the following arguments:
- current (int) – Current progress (number of completed descriptors).
- total (int) – Task size (total number of descriptors to create).
- sec_disp_str (str, optional) – Once complete, this string contains the time it took to complete this task (in seconds).
- Returns
numpy.ndarray – Array of size M x N(N-1)/2 containing the descriptor representation for each geometry.
numpy.ndarray – Array of size M x N(N-1)/2 x 3N containing all partial derivatives of the descriptor for each geometry.
-
static perm(perm)
Convert an atom permutation to a descriptor permutation.
A permutation of N atoms is converted to a permutation that acts on the corresponding descriptor representation. Applying the converted permutation to a descriptor is equivalent to permuting the atoms first and then generating the descriptor.
- Parameters
perm (numpy.ndarray) – Array of size N containing the atom permutation.
- Returns
Array of size N(N-1)/2 containing the corresponding descriptor permutation.
- Return type
numpy.ndarray
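The equivalence stated above can be checked numerically. The following sketch is illustrative and not sGDML's code: it assumes the descriptor is the vector of inverse pairwise distances ordered like np.triu_indices(N, k=1), and that an atom permutation p reorders a geometry as r[p]. The descriptor permutation is read off a symmetric N x N table of pair indices; inv_dist_desc and desc_perm are hypothetical names.

```python
import numpy as np

def inv_dist_desc(r):
    """Hypothetical descriptor: inverse pairwise distances in upper-triangle order."""
    i, j = np.triu_indices(len(r), k=1)
    return 1.0 / np.linalg.norm(r[i] - r[j], axis=1)

def desc_perm(perm):
    """Map an atom permutation to the permutation of descriptor entries."""
    n = len(perm)
    iu = np.triu_indices(n, k=1)
    pair_idx = np.zeros((n, n), dtype=int)
    pair_idx[iu] = np.arange(len(iu[0]))
    pair_idx = pair_idx + pair_idx.T  # symmetric: pair_idx[a, b] == pair_idx[b, a]
    # Entry (a, b) of the permuted geometry is entry (perm[a], perm[b]) of the original.
    return pair_idx[np.ix_(perm, perm)][iu]

rng = np.random.default_rng(1)
r = rng.normal(size=(5, 3))
p = rng.permutation(5)
d = inv_dist_desc(r)

# Permuting the descriptor equals computing the descriptor of the permuted atoms.
print(np.allclose(d[desc_perm(p)], inv_dist_desc(r[p])))  # True
```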
-
sgdml.utils.desc._from_r(r, lat_and_inv=None)
Generate the descriptor and its Jacobian for one molecular geometry in Cartesian coordinates.
- Parameters
r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.
lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns and its inverse.
- Returns
numpy.ndarray – Descriptor representation as a 1D array of size N(N-1)/2.
numpy.ndarray – Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.
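As a rough illustration of what one call computes, the sketch below assumes that the descriptor is the vector of inverse pairwise distances 1/r_ij with r_ij = |r_i - r_j| (an assumption of the sketch, not stated in this reference). The Jacobian row for pair (i, j) is then nonzero only in the columns of atoms i and j, with d(1/r_ij)/d r_i = -(r_i - r_j)/r_ij^3 and the opposite sign for atom j. The name from_r_sketch and the upper-triangle pair ordering are hypothetical; when a lattice is given, pair differences are wrapped with the minimum-image convention.

```python
import numpy as np

def from_r_sketch(r, lat_and_inv=None):
    """Hypothetical: inverse-distance descriptor and its N(N-1)/2 x 3N Jacobian."""
    r = r.reshape(-1, 3)
    n = len(r)
    i, j = np.triu_indices(n, k=1)
    diff = r[i] - r[j]                       # (P, 3) pair difference vectors
    if lat_and_inv is not None:              # minimum-image convention
        lat, lat_inv = lat_and_inv
        frac = diff @ lat_inv.T
        diff = (frac - np.rint(frac)) @ lat.T
    dist = np.linalg.norm(diff, axis=1)
    desc = 1.0 / dist
    g = -diff / dist[:, None] ** 3           # d(1/r_ij) / d r_i
    rows = np.arange(len(i))
    d_desc = np.zeros((len(i), n, 3))
    d_desc[rows, i] = g
    d_desc[rows, j] = -g                     # d(1/r_ij) / d r_j
    return desc, d_desc.reshape(len(i), 3 * n)

# Central finite-difference check of the analytic Jacobian.
rng = np.random.default_rng(2)
r = rng.normal(size=3 * 4)
desc, d_desc = from_r_sketch(r)
eps = 1e-6
fd = np.stack(
    [(from_r_sketch(r + eps * e)[0] - from_r_sketch(r - eps * e)[0]) / (2 * eps)
     for e in np.eye(r.size)],
    axis=1,
)
print(np.allclose(fd, d_desc, atol=1e-6))  # True
```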
-
sgdml.utils.desc._pbc_diff(diffs, lat_and_inv, use_torch=False)
Clamp differences of vectors to the supercell.
- Parameters
diffs (numpy.ndarray) – N x 3 matrix of N pairwise differences between vectors u - v.
lat_and_inv (tuple of numpy.ndarray) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns and its inverse.
use_torch (bool, optional) – Enable if the inputs are PyTorch objects.
- Returns
N x 3 matrix of clamped differences.
- Return type
numpy.ndarray
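The clamping can be sketched with fractional coordinates: map each difference vector into the lattice basis, subtract the nearest integer lattice translation, and map back. This is a NumPy-only sketch (the use_torch path is not covered), assuming lat holds the lattice vectors as columns and lat_inv is its inverse, as the parameter description states; pbc_diff_sketch is a hypothetical name. Exact half-cell ties follow np.rint (round half to even), which the real function may treat differently.

```python
import numpy as np

def pbc_diff_sketch(diffs, lat, lat_inv):
    """Wrap N x 3 difference vectors to the minimum image (lattice vectors as columns)."""
    frac = diffs @ lat_inv.T   # fractional coordinates: lat_inv @ d for each row d
    frac -= np.rint(frac)      # remove whole lattice translations
    return frac @ lat.T        # back to Cartesian: lat @ f for each row f

lat = 10.0 * np.eye(3)         # illustrative cubic cell with 10 Å edges
diffs = np.array([[9.0, 0.0, -6.0], [2.0, -3.0, 4.0]])
wrapped = pbc_diff_sketch(diffs, lat, np.linalg.inv(lat))
print(wrapped)  # approx. [[-1, 0, 4], [2, -3, 4]]
```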
-
sgdml.utils.desc._pdist(r, lat_and_inv=None)
Compute the pairwise Euclidean distance matrix between all atoms.
- Parameters
r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.
lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns and its inverse.
- Returns
Array of size N(N-1)/2 containing the upper triangle of the pairwise distance matrix between atoms.
- Return type
numpy.ndarray
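Because only one triangle of the symmetric distance matrix is returned, it helps to know where pair (i, j) lands in the array of length N(N-1)/2. The sketch below assumes row-major upper-triangle order, i.e. the order of np.triu_indices(N, k=1) and of the condensed output of scipy.spatial.distance.pdist; the helper name condensed_index is hypothetical.

```python
import numpy as np

def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in a row-major upper-triangle array of length n(n-1)/2."""
    if i > j:
        i, j = j, i
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) = n*i - i*(i+1)/2 entries.
    return n * i - i * (i + 1) // 2 + (j - i - 1)

n = 5
r = np.random.default_rng(3).normal(size=(n, 3))
iu, ju = np.triu_indices(n, k=1)
pdist = np.linalg.norm(r[iu] - r[ju], axis=1)  # condensed distances, upper-triangle order

for a, b in zip(iu, ju):
    assert np.isclose(pdist[condensed_index(a, b, n)], np.linalg.norm(r[a] - r[b]))
```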
-
sgdml.utils.desc._r_to_d_desc(r, pdist, lat_and_inv=None)
Generate the descriptor Jacobian for a set of atom positions in Cartesian coordinates.
This method can apply the minimum-image convention as periodic boundary condition for distances between atoms, given the lattice vectors.
- Parameters
r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.
pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.
lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns and its inverse.
- Returns
Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.
- Return type
numpy.ndarray
-
sgdml.utils.desc._r_to_desc(r, pdist)
Generate the descriptor for a set of atom positions in Cartesian coordinates.
- Parameters
r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.
pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.
- Returns
Descriptor representation as a 1D array of size N(N-1)/2.
- Return type
numpy.ndarray
sgdml.utils.io module
-
sgdml.utils.io.filter_file_type(dir, type, md5_match=None)
Filter all files in a directory that match a given type and, optionally, a given fingerprint.
- Parameters
dir (str) – Directory path.
type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
md5_match (str, optional) – Fingerprint string.
- Returns
List of file names that match the specified type and fingerprint (if provided).
- Return type
list of str
- Raises
ArgumentTypeError – If the directory contains unreadable .npz files.
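A rough sketch of the filtering logic, written against throwaway files. It assumes each .npz file stores a one-letter 'type' field ('d', 't' or 'm' for dataset, task and model) and an 'md5' fingerprint field; these key names and codes are assumptions of the sketch, filter_file_type_sketch is a hypothetical name, and the ArgumentTypeError handling for unreadable files described above is omitted. The parameter names mirror the documented signature even though they shadow built-ins.

```python
import os
import tempfile

import numpy as np

TYPE_CODES = {'dataset': 'd', 'task': 't', 'model': 'm'}  # assumed one-letter type tags

def filter_file_type_sketch(dir, type, md5_match=None):
    """Hypothetical: list .npz files in `dir` with the given type and, optionally, fingerprint."""
    matches = []
    for name in sorted(os.listdir(dir)):
        if not name.endswith('.npz'):
            continue
        with np.load(os.path.join(dir, name)) as f:  # closes the NpzFile handle
            if 'type' not in f.files or str(f['type']) != TYPE_CODES[type]:
                continue
            if md5_match is not None and ('md5' not in f.files or str(f['md5']) != md5_match):
                continue
        matches.append(name)
    return matches

with tempfile.TemporaryDirectory() as tmp:
    np.savez(os.path.join(tmp, 'a.npz'), type='d', md5='abc')
    np.savez(os.path.join(tmp, 'b.npz'), type='m', md5='abc')
    np.savez(os.path.join(tmp, 'c.npz'), type='d', md5='xyz')
    found = filter_file_type_sketch(tmp, 'dataset', md5_match='abc')

print(found)  # ['a.npz']
```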
-
sgdml.utils.io.is_dir_with_file_type(arg, type, or_file=False)
Validate a directory path and check whether it contains files of the specified type.
Note
If a file path is provided (see or_file), this function acts as if it were a directory containing just that one file.
- Parameters
arg (str) – Directory path (or file path, see or_file).
type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
or_file (bool) – If arg contains a file path, act as if it were a directory with just a single file inside.
- Returns
Tuple of directory path (as provided) and a list of contained file names of the specified type.
- Return type
(str, list of str)
- Raises
ArgumentTypeError – If the provided directory path does not lead to a directory.
ArgumentTypeError – If the directory contains unreadable files.
ArgumentTypeError – If the directory contains no files of the specified type.
-
sgdml.utils.io.is_file_type(arg, type)
Validate a file path and check whether the file is of the specified type.
- Parameters
arg (str) – File path.
type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
- Returns
Tuple of file path (as provided) and data stored in the file. The returned instance of the NpzFile class must be closed to avoid leaking file descriptors.
- Return type
(str, dict)
- Raises
ArgumentTypeError – If the provided file path does not lead to an NpzFile.
ArgumentTypeError – If the file is not readable.
ArgumentTypeError – If the file is of the wrong type.
ArgumentTypeError – If a path/fingerprint is provided, but the path is not valid.
ArgumentTypeError – If the fingerprint could not be resolved.
ArgumentTypeError – If multiple files with the same fingerprint exist.
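Since the data comes back as an open NumPy NpzFile, the caller is responsible for closing it. The sketch below shows the two usual patterns with plain np.load on a throwaway file rather than calling is_file_type itself; the file name and the 'R' key are illustrative.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.npz')
    np.savez(path, R=np.zeros((2, 3)))

    # Pattern 1: NpzFile is a context manager; the file is closed on exit.
    with np.load(path) as data:
        n_geoms = data['R'].shape[0]

    # Pattern 2: explicit close, e.g. for an NpzFile handed back by a helper.
    data = np.load(path)
    try:
        R = data['R']  # arrays read from an NpzFile remain valid after closing
    finally:
        data.close()

print(n_geoms, R.shape)  # 2 (2, 3)
```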
-
sgdml.utils.io.is_strict_pos_int(arg)
Validate strictly positive integer input.
- Parameters
arg (str) – Integer as string.
- Returns
Parsed integer.
- Return type
int
- Raises
ArgumentTypeError – If the integer is not > 0.
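Validators like this are typically passed as the type= argument of an argparse option, so that argparse reports an ArgumentTypeError as a clean usage error. A minimal sketch of the pattern; strict_pos_int and the --n_train option are hypothetical names, not the sgdml CLI.

```python
import argparse

def strict_pos_int(arg):
    """Sketch of an argparse `type=` validator for strictly positive integers."""
    try:
        x = int(arg)
    except ValueError:
        raise argparse.ArgumentTypeError(f'{arg!r} is not an integer') from None
    if x <= 0:
        raise argparse.ArgumentTypeError(f'{x} is not strictly positive')
    return x

parser = argparse.ArgumentParser()
parser.add_argument('--n_train', type=strict_pos_int)  # hypothetical option name
args = parser.parse_args(['--n_train', '200'])
print(args.n_train)  # 200
```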
-
sgdml.utils.io.is_task_dir_resumeable(train_dir, train_dataset, test_dataset, n_train, n_test, sigs, gdml)
Check whether a directory contains task and/or model files that match the configuration of a training process specified in the remaining arguments.
Check whether the training and test datasets in each task match train_dataset and test_dataset, whether the numbers of training and test points match, and whether the choices for the kernel hyper-parameter \(\sigma\) are contained in the list. Also check whether the existing tasks/models contain symmetries and whether that is consistent with the flag gdml. This function is useful for determining whether a training process can be resumed using the existing files.
- Parameters
train_dir (str) – Path to the training directory.
train_dataset (dataset) – Dataset from which training points are sampled.
test_dataset (dataset) – Dataset from which test points are sampled (may be the same as train_dataset).
n_train (int) – Number of training points to sample.
n_test (int) – Number of test points to sample.
sigs (list of int) – List of \(\sigma\) kernel hyper-parameter choices (usually the hyper-parameter search grid).
gdml (bool) – If True, don't include any symmetries in the model (GDML); otherwise, do (sGDML).
- Returns
False if any of the files in the directory do not match the training configuration, True otherwise.
- Return type
bool
-
sgdml.utils.io.is_valid_file_type(arg_in)
Check whether a file is a valid dataset, task, or model file.
- Parameters
arg_in (str) – File path.
- Returns
Tuple of file path (as provided) and data stored in the file. The returned instance of the NpzFile class must be closed to avoid leaking file descriptors.
- Return type
(str, dict)
- Raises
ArgumentTypeError – If the provided file path does not point to a supported file type.
-
sgdml.utils.io.parse_list_or_range(arg)
Parse a string that represents either an integer or a range in the notation <start>:<step>:<stop>.
- Parameters
arg (str) – Integer or range string.
- Returns
Parsed integer, or list of integers covered by the range.
- Return type
int or list of int
- Raises
ArgumentTypeError – If the input can be interpreted neither as an integer nor as a valid range.
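A minimal sketch of the parsing rule. Whether <stop> is inclusive is not stated above; this sketch treats it as inclusive, which is an assumption, and parse_list_or_range_sketch is a hypothetical name.

```python
import argparse
import re

def parse_list_or_range_sketch(arg):
    """Parse '<int>' or '<start>:<step>:<stop>' (stop treated as inclusive, an assumption)."""
    if re.fullmatch(r'\d+', arg):
        return int(arg)
    m = re.fullmatch(r'(\d+):(\d+):(\d+)', arg)
    if m:
        start, step, stop = map(int, m.groups())
        if step > 0:  # range() rejects a zero step
            return list(range(start, stop + 1, step))
    raise argparse.ArgumentTypeError(f'{arg!r} is neither an integer nor a valid range')

print(parse_list_or_range_sketch('5'))         # 5
print(parse_list_or_range_sketch('10:10:50'))  # [10, 20, 30, 40, 50]
```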
sgdml.utils.perm module
-
sgdml.utils.perm.bipartite_match(R, z, lat_and_inv=None, max_processes=None, callback=None)
-
sgdml.utils.perm.complete_sym_group(perms, n_perms_max=None, disp_str='Permutation group completion', callback=None)
-
sgdml.utils.perm.find_extra_perms(R, z, lat_and_inv=None, callback=None, max_processes=None)
-
sgdml.utils.perm.find_frag_perms(R, z, lat_and_inv=None, callback=None, max_processes=None)
-
sgdml.utils.perm.find_perms_via_alignment(pts_full, frag_idxs, align_a_idxs, align_b_idxs, z, lat_and_inv=None, max_processes=None)
-
sgdml.utils.perm.find_perms_via_reflection(r, z, frag_idxs, plane_3idxs, lat_and_inv=None, max_processes=None)
sgdml.utils.ui module
-
sgdml.utils.ui.callback(current, total=1, disp_str='', sec_disp_str=None, done_with_warning=False, newline_when_done=True)
Print a progress or toggle bar.
Example (progress):
[ 45%] Task description (secondary string)
Example (toggle, not done):
[ .. ] Task description (secondary string)
Example (toggle, done):
[DONE] Task description (secondary string)
- Parameters
current (int) – Number of items already processed.
total (int, optional) – Total number of items. If there is only one item, the toggle style is used.
disp_str (str, optional) – Task description.
sec_disp_str (str, optional) – Additional string shown in gray.
done_with_warning (bool, optional) – Indicate that the process did not finish successfully.
newline_when_done (bool, optional) – Whether to finish with a newline character once current=total (default: True).
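The three display styles above can be reproduced with a small string formatter. This is a hypothetical sketch (progress_line is not part of sgdml); it returns the line instead of printing it and leaves out the in-place redraw, the gray coloring of the secondary string and the warning state that the real callback handles.

```python
def progress_line(current, total=1, disp_str='', sec_disp_str=None):
    """Hypothetical formatter reproducing the three documented display styles."""
    if total == 1:
        # Toggle style: a single item is either pending or done.
        status = 'DONE' if current >= total else ' .. '
    else:
        status = f'{int(100 * current / total):3d}%'
    line = f'[{status}] {disp_str}'
    if sec_disp_str:
        line += f' ({sec_disp_str})'
    return line

print(progress_line(45, 100, 'Task description', 'secondary string'))
# [ 45%] Task description (secondary string)
print(progress_line(0, 1, 'Task description', 'secondary string'))
# [ .. ] Task description (secondary string)
print(progress_line(1, 1, 'Task description', 'secondary string'))
# [DONE] Task description (secondary string)
```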
-
sgdml.utils.ui.gen_mat_str(mat)
Convert a matrix to a multiline string such that the decimal points align in each column. Trailing zeros are replaced with spaces.
- Parameters
mat (numpy.ndarray) – Matrix to convert.
- Returns
String representation of the matrix.
- Return type
str
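One way to get aligned decimal points is to format every entry with the same number of decimals, replace trailing fractional zeros by spaces of equal width, and right-justify each column. The sketch below is hypothetical (gen_mat_str_sketch and the fixed three decimals are assumptions) and accepts any 2D sequence of numbers, including a NumPy array.

```python
def gen_mat_str_sketch(rows, decimals=3):
    """Hypothetical: align decimal points per column; pad trailing zeros with spaces."""
    assert decimals >= 1, 'at least one decimal is needed for a decimal point'

    def fmt(x):
        s = f'{x:.{decimals}f}'
        int_part, frac_part = s.split('.')
        frac_stripped = frac_part.rstrip('0')
        # Same width as before, so the '.' stays in the same column.
        return f'{int_part}.{frac_stripped}' + ' ' * (len(frac_part) - len(frac_stripped))

    cells = [[fmt(x) for x in row] for row in rows]
    widths = [max(len(row[c]) for row in cells) for c in range(len(cells[0]))]
    return '\n'.join(' '.join(cell.rjust(w) for cell, w in zip(row, widths)) for row in cells)

print(gen_mat_str_sketch([[1.5, -20.25], [10.0, 3.125]]))
```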
-
sgdml.utils.ui.gen_range_str(min, max)
Generate a string that shows a minimum and a maximum value, as well as the range.
Example:
<min> |-- <range> --| <max>
- Parameters
min (float) – Minimum value.
max (float) – Maximum value.
- Returns
Formatted range string.
- Return type
str
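A minimal sketch of the documented layout, where the range is max - min. The two-decimal number format is an assumption, and the parameters are renamed min_val and max_val in this hypothetical helper to avoid shadowing Python's built-in min and max.

```python
def gen_range_str_sketch(min_val, max_val, fmt='{:.2f}'):
    """Hypothetical: '<min> |-- <range> --| <max>' with range = max - min."""
    return '{} |-- {} --| {}'.format(
        fmt.format(min_val), fmt.format(max_val - min_val), fmt.format(max_val)
    )

print(gen_range_str_sketch(0.5, 2.25))  # 0.50 |-- 1.75 --| 2.25
```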
-
sgdml.utils.ui.indent_str(str, indent)
Indent all lines of a multiline string to the right by a given number of characters.
- Parameters
str (str) – Multiline string.
indent (int) – Number of characters added in front of each line.
- Returns
Indented string.
- Return type
str
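The standard library already covers this operation. A sketch with textwrap.indent; note that by default it skips whitespace-only lines, so a predicate that accepts every line is passed to indent all lines as described above. The helper name is hypothetical.

```python
import textwrap

def indent_str_sketch(s, indent):
    """Prefix every line (including blank ones) with `indent` spaces."""
    return textwrap.indent(s, ' ' * indent, predicate=lambda line: True)

print(indent_str_sketch('first\nsecond', 4))
```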
-
sgdml.utils.ui.merge_col_str(col_str1, col_str2)
Merge two multiline strings that represent columns in a table by concatenating each pair of lines.
Note
Both strings must have the same number of lines.
- Parameters
col_str1 (str) – First multiline string.
col_str2 (str) – Second multiline string.
- Returns
Merged multiline string.
- Return type
str
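The merge amounts to zipping the two columns line by line. In this hypothetical sketch a mismatch in line count raises ValueError; the error type is a choice of the sketch, not documented behavior.

```python
def merge_col_str_sketch(col_str1, col_str2):
    """Concatenate line k of col_str1 with line k of col_str2."""
    lines1, lines2 = col_str1.splitlines(), col_str2.splitlines()
    if len(lines1) != len(lines2):
        raise ValueError('both strings must have the same number of lines')
    return '\n'.join(a + b for a, b in zip(lines1, lines2))

print(merge_col_str_sketch('a   \nbb  ', '1\n2'))
# a   1
# bb  2
```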
-
sgdml.utils.ui.sec_callback(current, total=1, disp_str=None, sec_disp_str=None, main_callback=None, **kwargs)
-
sgdml.utils.ui.str_plen(str)
Return the printable length of a string. This function can only account for invisible characters due to string styling with color_str.
- Parameters
str (str) – String.
- Returns
Printable length of the string.
- Return type
int
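A sketch of the idea, assuming that the styling applied by color_str consists of ANSI SGR escape sequences such as ESC[1;31m (an assumption; the exact codes are not shown here). Stripping those sequences before measuring gives the printable length; str_plen_sketch is a hypothetical name.

```python
import re

ANSI_SGR = re.compile(r'\x1b\[[0-9;]*m')  # assumed styling: ANSI SGR escape codes

def str_plen_sketch(s):
    """Length of `s` with ANSI color/style escape sequences removed."""
    return len(ANSI_SGR.sub('', s))

styled = '\x1b[1;31mERROR\x1b[0m'
print(len(styled), str_plen_sketch(styled))  # 16 5
```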
-
sgdml.utils.ui.wrap_indent_str(label, str, width=93)
Wrap and indent a multiline string to arrange it with the provided label in two columns. The default maximum line width already accounts for the indentation due to the logging level label.
Example:
<label><multiline string>
- Parameters
label (str) – Label.
str (str) – Multiline string.
- Returns
Wrapped and indented string.
- Return type
str
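The two-column layout can be sketched with textwrap: the first wrapped line starts with the label, and every continuation line is indented by the label's width so the text forms its own column. This is a hypothetical helper, not the sgdml implementation; each input line is wrapped separately so existing line breaks survive.

```python
import textwrap

def wrap_indent_str_sketch(label, s, width=93):
    """Hypothetical: first line starts with `label`; continuation lines align under the text."""
    pad = ' ' * len(label)
    out = []
    for k, para in enumerate(s.splitlines()):
        prefix = label if k == 0 else pad
        wrapped = textwrap.wrap(para, width=width, initial_indent=prefix, subsequent_indent=pad)
        out.extend(wrapped or [prefix])  # keep blank input lines
    return '\n'.join(out)

print(wrap_indent_str_sketch('Note: ', 'one two three four five six', width=20))
# Note: one two three
#       four five six
```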
-
sgdml.utils.ui.wrap_str(str, width=93)
Wrap a multiline string after a given number of characters. The default maximum line width already accounts for the indentation due to the logging level label.
- Parameters
str (str) – Multiline string.
width (int, optional) – Maximum number of characters in a line.
- Returns
Wrapped string.
- Return type
str
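A sketch with textwrap.fill applied to each line separately, so that line breaks already present in the multiline input are preserved and blank lines stay blank. The helper name is hypothetical.

```python
import textwrap

def wrap_str_sketch(s, width=93):
    """Wrap each line of `s` separately so existing line breaks are preserved."""
    return '\n'.join(textwrap.fill(line, width=width) for line in s.splitlines())

print(wrap_str_sketch('alpha beta gamma delta\nshort', width=11))
# alpha beta
# gamma delta
# short
```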