sgdml.utils package

sgdml.utils.desc module

class sgdml.utils.desc.Desc(n_atoms, max_processes=None)

Bases: object

d_desc_dot_vec(R_d_desc, vecs, overwrite_vecs=False)
d_desc_from_comp(R_d_desc, out=None)

Convert a compressed representation of a descriptor Jacobian back to its full representation.

The compressed representation omits all zeros and scales with N instead of N(N-1)/2.

Parameters
  • R_d_desc (numpy.ndarray or torch.tensor) – Array of size M x N x N x 3 containing the compressed descriptor Jacobian.

  • out (numpy.ndarray or torch.tensor, optional) – Output argument. It must have exactly the shape and type that would be returned if it were not used.

Note

If used, the output argument must be initialized with zeros!

Returns

Array of size M x N(N-1)/2 x 3N containing the full representation.

Return type

numpy.ndarray or torch.tensor
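Because each descriptor entry depends only on the positions of one pair of atoms i and j, each row of the full Jacobian has just two nonzero 3-vectors, and for a function of r_i - r_j they are negatives of each other. The following is a minimal sketch of expanding a compressed Jacobian back to the dense M x N(N-1)/2 x 3N form. It assumes a hypothetical compressed layout of M x N(N-1)/2 x 3 (one 3-vector per atom pair, pairs in upper-triangle order), which is not necessarily the layout sgdml uses internally; expand_jacobian is a hypothetical name.

```python
import numpy as np


def expand_jacobian(comp: np.ndarray, n_atoms: int) -> np.ndarray:
    """Rebuild a dense M x D x 3N Jacobian from a hypothetical M x D x 3 layout.

    Assumption: comp[m, k] holds d(desc_k)/d(r_i) for the k-th atom pair (i, j)
    in upper-triangle order; d(desc_k)/d(r_j) is its negative.
    """
    n_geoms, n_desc, _ = comp.shape
    i, j = np.triu_indices(n_atoms, k=1)
    assert n_desc == len(i), "descriptor length must be N(N-1)/2"
    full = np.zeros((n_geoms, n_desc, n_atoms, 3), dtype=comp.dtype)
    k = np.arange(n_desc)
    full[:, k, i, :] = comp   # block of atom i
    full[:, k, j, :] = -comp  # block of atom j is the negative
    return full.reshape(n_geoms, n_desc, 3 * n_atoms)
```

All other entries stay zero, which is why a zero-initialized output buffer is required when writing into a preallocated array.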

d_desc_to_comp(R_d_desc)

Convert a descriptor Jacobian to a compressed representation.

The compressed representation omits all zeros and scales with N instead of N(N-1)/2.

Parameters

R_d_desc (numpy.ndarray) – Array of size M x N(N-1)/2 x 3N containing the descriptor Jacobian.

Returns

Array of size M x N x N x 3 containing the compressed representation.

Return type

numpy.ndarray
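Compression is the reverse step: since the two nonzero 3-vectors in each Jacobian row differ only in sign, storing one of them per descriptor entry suffices. This sketch uses the same hypothetical M x N(N-1)/2 x 3 layout (not necessarily sgdml's) and builds a dense Jacobian for inverse pairwise distances, which are assumed here purely for illustration; compress_jacobian is a hypothetical name.

```python
import numpy as np


def compress_jacobian(full: np.ndarray, n_atoms: int) -> np.ndarray:
    """Keep one 3-vector per descriptor entry: d(desc_k)/d(r_i) for pair (i, j)."""
    n_geoms, n_desc, _ = full.shape
    i, _ = np.triu_indices(n_atoms, k=1)
    blocks = full.reshape(n_geoms, n_desc, n_atoms, 3)
    return blocks[:, np.arange(n_desc), i, :]


# Dense Jacobian of inverse distances for one random 4-atom geometry.
rng = np.random.default_rng(0)
r = rng.normal(size=(4, 3))
i, j = np.triu_indices(4, k=1)
diff = r[i] - r[j]
dist = np.linalg.norm(diff, axis=1)
k = np.arange(len(i))
full = np.zeros((1, len(i), 4, 3))
full[0, k, i] = -diff / dist[:, None] ** 3  # d(1/|r_i - r_j|)/d(r_i)
full[0, k, j] = diff / dist[:, None] ** 3   # d(1/|r_i - r_j|)/d(r_j)
full = full.reshape(1, len(i), 12)

comp = compress_jacobian(full, 4)
print(full.size, comp.size)  # 72 18
```

The dense form grows as N(N-1)/2 x 3N per geometry, while this compressed form grows as N(N-1)/2 x 3, a saving of a factor N.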

from_R(R, lat_and_inv=None, max_processes=None, callback=None)

Generate descriptor and its Jacobian for multiple molecular geometries in Cartesian coordinates.

Parameters
  • R (numpy.ndarray) – Array of size M x 3N containing the Cartesian coordinates of each atom.

  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns, and its inverse.

  • max_processes (int, optional) – Maximum number of processes to use. If not set, all CPU cores are used. This parameter overrides the global setting specified during initialization.

  • callback (callable, optional) –

    Reports the progress of descriptor and descriptor Jacobian generation. It receives:

    current : int
      Current progress (number of completed descriptors).

    total : int
      Task size (total number of descriptors to create).

    sec_disp_str : str, optional
      Once the task is complete, this string contains the time it took to complete it (in seconds).

Returns

  • numpy.ndarray – Array of size M x N(N-1)/2 containing the descriptor representation for each geometry.

  • numpy.ndarray – Array of size M x N(N-1)/2 x 3N containing all partial derivatives of the descriptor for each geometry.
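A serial sketch of the M x 3N to (M x N(N-1)/2, M x N(N-1)/2 x 3N) mapping, including how a callback receives (current, total) progress updates. It assumes, for illustration only, that the descriptor consists of the inverse pairwise distances 1/|r_i - r_j|; it ignores lat_and_inv and multiprocessing and never passes sec_disp_str. inv_dist_desc_and_jac and descs_from_R are hypothetical names, not part of sgdml.

```python
import numpy as np


def inv_dist_desc_and_jac(r):
    """Inverse-distance descriptor (size D = N(N-1)/2) and its D x 3N Jacobian."""
    r = np.asarray(r, dtype=float).reshape(-1, 3)
    n = len(r)
    i, j = np.triu_indices(n, k=1)
    diff = r[i] - r[j]
    dist = np.linalg.norm(diff, axis=1)
    grad_i = -diff / dist[:, None] ** 3  # d(1/|r_i - r_j|)/d(r_i)
    k = np.arange(len(i))
    jac = np.zeros((len(i), n, 3))
    jac[k, i] = grad_i
    jac[k, j] = -grad_i                  # d/d(r_j) is the negative
    return 1.0 / dist, jac.reshape(len(i), 3 * n)


def descs_from_R(R, callback=None):
    """Serial sketch: M x 3N coordinates -> (M x D, M x D x 3N)."""
    R = np.asarray(R, dtype=float)
    n_geoms, n_atoms = R.shape[0], R.shape[1] // 3
    dim = n_atoms * (n_atoms - 1) // 2
    R_desc = np.empty((n_geoms, dim))
    R_d_desc = np.empty((n_geoms, dim, 3 * n_atoms))
    for m in range(n_geoms):
        R_desc[m], R_d_desc[m] = inv_dist_desc_and_jac(R[m])
        if callback is not None:
            callback(m + 1, n_geoms)  # (current, total)
    return R_desc, R_d_desc


# Usage: 10 random 4-atom geometries, progress printed in place.
R = np.random.default_rng(0).normal(size=(10, 12))
R_desc, R_d_desc = descs_from_R(R, callback=lambda cur, tot: print(f'{cur}/{tot}', end='\r'))
print(R_desc.shape, R_d_desc.shape)  # (10, 6) (10, 6, 12)

# Sanity check of the analytic Jacobian with central finite differences.
eps = 1e-6
fd = np.stack([(inv_dist_desc_and_jac(R[0] + eps * e)[0]
                - inv_dist_desc_and_jac(R[0] - eps * e)[0]) / (2 * eps)
               for e in np.eye(12)], axis=1)
print(np.allclose(fd, R_d_desc[0], atol=1e-5))  # True
```

Comparing against central finite differences is a cheap way to validate any hand-written descriptor Jacobian.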

static perm(perm)

Convert atom permutation to descriptor permutation.

A permutation of N atoms is converted to a permutation that acts on the corresponding descriptor representation. Applying the converted permutation to a descriptor is equivalent to permuting the atoms first and then generating the descriptor.

Parameters

perm (numpy.ndarray) – Array of size N containing the atom permutation.

Returns

Array of size N(N-1)/2 containing the corresponding descriptor permutation.

Return type

numpy.ndarray
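The conversion can be sketched as an index lookup: for the descriptor entry of atom pair (i, j), find the entry of the pair (perm[i], perm[j]). The sketch assumes descriptor entries are ordered by the upper triangle of the N x N pair matrix and that permuting atoms means R[perm]; sgdml may use a different ordering or the inverse permutation convention. Pairwise distances stand in for any descriptor built from symmetric pair functions; desc_perm_from_atom_perm and pair_dists are hypothetical names.

```python
import numpy as np


def desc_perm_from_atom_perm(perm):
    """Map an atom permutation (size N) to a descriptor permutation (size N(N-1)/2)."""
    perm = np.asarray(perm)
    n = len(perm)
    i, j = np.triu_indices(n, k=1)
    # Lookup table: unordered atom pair -> position in the descriptor.
    pair_to_idx = np.full((n, n), -1)
    pair_to_idx[i, j] = np.arange(len(i))
    pair_to_idx[j, i] = np.arange(len(i))
    return pair_to_idx[perm[i], perm[j]]


def pair_dists(R):
    """Upper-triangle pairwise distances of an N x 3 (or 3N) coordinate array."""
    R = np.asarray(R, dtype=float).reshape(-1, 3)
    i, j = np.triu_indices(len(R), k=1)
    return np.linalg.norm(R[i] - R[j], axis=1)


rng = np.random.default_rng(1)
R = rng.normal(size=(5, 3))
perm = rng.permutation(5)
d_perm = desc_perm_from_atom_perm(perm)
# Permuting the descriptor equals building the descriptor of permuted atoms.
print(np.allclose(pair_dists(R)[d_perm], pair_dists(R[perm])))  # True
```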

vec_dot_d_desc(R_d_desc, vecs, out=None)
sgdml.utils.desc._from_r(r, lat_and_inv=None)

Generate descriptor and its Jacobian for one molecular geometry in Cartesian coordinates.

Parameters
  • r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.

  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns, and its inverse.

Returns

  • numpy.ndarray – Descriptor representation as 1D array of size N(N-1)/2.

  • numpy.ndarray – Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.

sgdml.utils.desc._pbc_diff(diffs, lat_and_inv, use_torch=False)

Clamp the differences between vectors to the supercell (minimum-image convention).

Parameters
  • diffs (numpy.ndarray) – N x 3 matrix of N pairwise difference vectors u - v.

  • lat_and_inv (tuple of numpy.ndarray) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns, and its inverse.

  • use_torch (bool, optional) – Set to True if the inputs are PyTorch tensors.

Returns

N x 3 matrix of clamped differences.

Return type

numpy.ndarray
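With the lattice vectors stored as columns, a difference vector d has fractional coordinates lat_inv @ d; rounding them and subtracting the corresponding whole lattice vectors yields the nearest periodic image. The sketch below shows this standard minimum-image approach in NumPy only (no use_torch path) and returns a new array; pbc_diff is a hypothetical name and the library's exact implementation may differ.

```python
import numpy as np


def pbc_diff(diffs, lat_and_inv):
    """Map each row of diffs (N x 3) to its minimum image in the lattice."""
    lat, lat_inv = lat_and_inv
    frac = lat_inv @ diffs.T                # fractional coordinates, 3 x N
    return diffs - (lat @ np.rint(frac)).T  # subtract whole lattice vectors


lat = 10.0 * np.eye(3)                      # cubic cell, lattice vectors as columns
diffs = np.array([[9.0, 0.0, 0.0], [-6.0, 2.0, 0.0]])
wrapped = pbc_diff(diffs, (lat, np.linalg.inv(lat)))
print(wrapped)
```

For strongly skewed cells, rounding fractional coordinates does not always give the shortest image, so this is exact for orthogonal or mildly skewed cells.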

sgdml.utils.desc._pdist(r, lat_and_inv=None)

Compute pairwise Euclidean distance matrix between all atoms.

Parameters
  • r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.

  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns, and its inverse.

Returns

Array of size N(N-1)/2 containing the upper triangle of the pairwise distance matrix between atoms.

Return type

numpy.ndarray
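A minimal sketch of the condensed distance vector, assuming upper-triangle pair order and the same rounding-based minimum-image correction when a lattice is given; pdist_upper is a hypothetical name.

```python
import numpy as np


def pdist_upper(r, lat_and_inv=None):
    """Distances of the N(N-1)/2 atom pairs (i < j) from a flat array of size 3N."""
    r = np.asarray(r, dtype=float).reshape(-1, 3)
    i, j = np.triu_indices(len(r), k=1)
    diffs = r[i] - r[j]
    if lat_and_inv is not None:
        lat, lat_inv = lat_and_inv
        # Minimum-image correction: remove whole lattice vectors.
        diffs = diffs - (lat @ np.rint(lat_inv @ diffs.T)).T
    return np.linalg.norm(diffs, axis=1)


r = [0, 0, 0,   3, 4, 0,   0, 0, 1]       # three atoms
print(pdist_upper(r))                     # pairs (0,1), (0,2), (1,2)

lat = 10.0 * np.eye(3)
print(pdist_upper([0, 0, 0, 9, 0, 0], (lat, np.linalg.inv(lat))))
```

In the periodic example the atoms are 9 apart inside the cell but only 1 apart across the cell boundary.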

sgdml.utils.desc._r_to_d_desc(r, pdist, lat_and_inv=None)

Generate descriptor Jacobian for a set of atom positions in Cartesian coordinates.

If lattice vectors are provided, the minimum-image convention is applied as a periodic boundary condition to the distances between atoms.

Parameters
  • r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.

  • pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.

  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of a 3 x 3 matrix containing the lattice vectors as columns, and its inverse.

Returns

Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.

Return type

numpy.ndarray

sgdml.utils.desc._r_to_desc(r, pdist)

Generate descriptor for a set of atom positions in Cartesian coordinates.

Parameters
  • r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.

  • pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.

Returns

Descriptor representation as 1D array of size N(N-1)/2.

Return type

numpy.ndarray
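Given an N x N distance matrix, the N(N-1)/2 descriptor entries come from its off-diagonal triangle. This sketch assumes, for illustration only, that each entry is an inverse distance and that entries follow upper-triangle order; pdist_matrix and desc_from_pdist are hypothetical names.

```python
import numpy as np


def pdist_matrix(r):
    """N x N Euclidean distance matrix from a flat array of size 3N."""
    r = np.asarray(r, dtype=float).reshape(-1, 3)
    return np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)


def desc_from_pdist(pdist):
    """Inverse distances of the N(N-1)/2 distinct atom pairs (upper triangle)."""
    i, j = np.triu_indices(pdist.shape[0], k=1)
    return 1.0 / pdist[i, j]


r = [0, 0, 0,   2, 0, 0,   0, 4, 0]       # three atoms
desc = desc_from_pdist(pdist_matrix(r))   # 1/2, 1/4, 1/sqrt(20)
```

Only the triangle is read, so the zero diagonal never causes a division by zero.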

sgdml.utils.io module

sgdml.utils.io.dataset_md5(dataset)
sgdml.utils.io.filter_file_type(dir, type, md5_match=None)

Filters all files from a directory that match a given type and (optionally) a given fingerprint.

Parameters
  • dir (str) – Directory path.

  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.

  • md5_match (str, optional) – Fingerprint string.

Returns

List of file names that match the specified type and fingerprint (if provided).

Return type

list of str

Raises

ArgumentTypeError – If the directory contains unreadable .npz files.

sgdml.utils.io.generate_xyz_str(r, z, e=None, f=None, lattice=None)
sgdml.utils.io.is_dir_with_file_type(arg, type, or_file=False)

Validate directory path and check if it contains files of the specified type.

Note

If a file path is provided, this function acts as if it were a directory containing just that one file.

Parameters
  • arg (str) – File path.

  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.

  • or_file (bool) – If arg contains a file path, act like it’s a directory with just a single file inside.

Returns

Tuple of directory path (as provided) and a list of contained file names of the specified type.

Return type

(str, list of str)

Raises
  • ArgumentTypeError – If the provided directory path does not lead to a directory.

  • ArgumentTypeError – If directory contains unreadable files.

  • ArgumentTypeError – If directory contains no files of the specified type.

sgdml.utils.io.is_file_type(arg, type)

Validate file path and check if the file is of the specified type.

Parameters
  • arg (str) – File path.

  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.

Returns

Tuple of the file path (as provided) and the data stored in the file. The returned NpzFile instance must be closed to avoid leaking file descriptors.

Return type

(str, dict)

Raises
  • ArgumentTypeError – If the provided file path does not point to an NpzFile.

  • ArgumentTypeError – If the file is not readable.

  • ArgumentTypeError – If the file is of wrong type.

  • ArgumentTypeError – If path/fingerprint is provided, but the path is not valid.

  • ArgumentTypeError – If fingerprint could not be resolved.

  • ArgumentTypeError – If multiple files with the same fingerprint exist.

sgdml.utils.io.is_strict_pos_int(arg)

Validate strictly positive integer input.

Parameters

arg (str) – Integer as string.

Returns

Parsed integer.

Return type

int

Raises

ArgumentTypeError – If integer is not > 0.
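Validators that raise ArgumentTypeError are typically passed to argparse via type=. The sketch below is a hypothetical re-implementation (strict_pos_int, and the --n_train flag is illustrative only), not sgdml's own code.

```python
import argparse


def strict_pos_int(arg: str) -> int:
    """Parse a strictly positive integer, raising ArgumentTypeError otherwise."""
    try:
        value = int(arg)
    except ValueError:
        raise argparse.ArgumentTypeError(f'{arg!r} is not an integer') from None
    if value <= 0:
        raise argparse.ArgumentTypeError(f'{arg} is not > 0')
    return value


parser = argparse.ArgumentParser()
parser.add_argument('--n_train', type=strict_pos_int)  # hypothetical flag
args = parser.parse_args(['--n_train', '200'])
```

argparse converts the ArgumentTypeError into a usage error message, so invalid input is reported without a traceback.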

sgdml.utils.io.is_task_dir_resumeable(train_dir, train_dataset, test_dataset, n_train, n_test, sigs, gdml)

Check if a directory contains task and/or model files that match the configuration of a training process specified in the remaining arguments.

Check whether the training and test datasets in each task match train_dataset and test_dataset, whether the numbers of training and test points match, and whether the choices for the kernel hyper-parameter \(\sigma\) are contained in sigs. Also check whether the existing tasks/models contain symmetries and whether that is consistent with the gdml flag. This function is useful for determining whether a training process can be resumed using the existing files.

Parameters
  • train_dir (str) – Path to training directory.

  • train_dataset (dataset) – Dataset from which training points are sampled.

  • test_dataset (dataset) – Dataset from which test points are sampled (may be the same as train_dataset).

  • n_train (int) – Number of training points to sample.

  • n_test (int) – Number of test points to sample.

  • sigs (list of int) – List of \(\sigma\) kernel hyper-parameter choices (usually the hyper-parameter search grid).

  • gdml (bool) – If True, don't include any symmetries in the model (GDML); otherwise, include them (sGDML).

Returns

False if any of the files in the directory does not match the training configuration, True otherwise.

Return type

bool

sgdml.utils.io.is_valid_file_type(arg_in)

Check if file is either a valid dataset, task or model file.

Parameters

arg_in (str) – File path.

Returns

Tuple of the file path (as provided) and the data stored in the file. The returned NpzFile instance must be closed to avoid leaking file descriptors.

Return type

(str, dict)

Raises

ArgumentTypeError – If the provided file path does not point to a supported file type.

sgdml.utils.io.lattice_vec_to_par(lat)
sgdml.utils.io.model_file_name(task_or_model, is_extended=False)
sgdml.utils.io.parse_list_or_range(arg)

Parses a string that represents either an integer or a range in the notation <start>:<step>:<stop>.

Parameters

arg (str) – Integer or range string.

Returns

Parsed integer, or list of integers if a range was given.

Return type

int or list of int

Raises

ArgumentTypeError – If input can neither be interpreted as an integer nor a valid range.
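A minimal sketch of parsing the <start>:<step>:<stop> notation. Whether <stop> is inclusive is not stated above; this sketch assumes it is, and also rejects a zero step. parse_int_or_range is a hypothetical name.

```python
import argparse
import re


def parse_int_or_range(arg: str):
    """Parse '<int>' or '<start>:<step>:<stop>' into an int or a list of ints."""
    if re.fullmatch(r'\d+', arg):
        return int(arg)
    match = re.fullmatch(r'(\d+):(\d+):(\d+)', arg)
    if match:
        start, step, stop = map(int, match.groups())
        if step > 0:
            # Assumption: <stop> is included in the range.
            return list(range(start, stop + 1, step))
    raise argparse.ArgumentTypeError(f'{arg!r} is neither an integer nor a valid range')
```

For example, '10' yields 10 and '2:2:8' yields [2, 4, 6, 8] under the inclusive-stop assumption.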

sgdml.utils.io.read_xyz(file_path)
sgdml.utils.io.task_file_name(task)
sgdml.utils.io.train_dir_name(dataset, n_train, use_sym, use_E, use_E_cstr)
sgdml.utils.io.write_geometry(filename, r, z, comment_str='')
sgdml.utils.io.z_str_to_z(z_str)
sgdml.utils.io.z_to_z_str(z)

sgdml.utils.perm module

sgdml.utils.perm.bipartite_match(R, z, lat_and_inv=None, max_processes=None, callback=None)
sgdml.utils.perm.complete_sym_group(perms, n_perms_max=None, disp_str='Permutation group completion', callback=None)
sgdml.utils.perm.find_extra_perms(R, z, lat_and_inv=None, callback=None, max_processes=None)
sgdml.utils.perm.find_frag_perms(R, z, lat_and_inv=None, callback=None, max_processes=None)
sgdml.utils.perm.find_frags(r, z, lat_and_inv=None)
sgdml.utils.perm.find_perms(R, z, lat_and_inv=None, callback=None, max_processes=None)
sgdml.utils.perm.find_perms_in_frag(R, z, frag_idxs, lat_and_inv=None, max_processes=None)
sgdml.utils.perm.find_perms_via_alignment(pts_full, frag_idxs, align_a_idxs, align_b_idxs, z, lat_and_inv=None, max_processes=None)
sgdml.utils.perm.find_perms_via_reflection(r, z, frag_idxs, plane_3idxs, lat_and_inv=None, max_processes=None)
sgdml.utils.perm.inv_perm(perm)
sgdml.utils.perm.print_perm_colors(perm, pts, plane_3idxs=None)
sgdml.utils.perm.salvage_subgroup(perms)
sgdml.utils.perm.share_array(arr_np, typecode)
sgdml.utils.perm.sync_perm_mat(match_perms_all, match_cost, n_atoms, callback=None)
sgdml.utils.perm.to_cycles(perm)

sgdml.utils.ui module

sgdml.utils.ui.callback(current, total=1, disp_str='', sec_disp_str=None, done_with_warning=False, newline_when_done=True)

Print progress or toggle bar.

Example (progress): [ 45%] Task description (secondary string)

Example (toggle, not done): [ .. ] Task description (secondary string)

Example (toggle, done): [DONE] Task description (secondary string)

Parameters
  • current (int) – Number of items already processed.

  • total (int, optional) – Total number of items. If there is only one item, the toggle style is used.

  • disp_str (str, optional) – Task description.

  • sec_disp_str (str, optional) – Additional string shown in gray.

  • done_with_warning (bool, optional) – Indicate that the process did not finish successfully.

  • newline_when_done (bool, optional) – Whether to finish with a newline character once current equals total (default: True).
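The three display styles above can be reproduced by a small formatting helper. This sketch only builds the text of one status line; colors, in-place redrawing, newline_when_done and done_with_warning are omitted, and progress_line is a hypothetical name.

```python
def progress_line(current, total=1, disp_str='', sec_disp_str=None):
    """Build '[ 45%] Task (secondary)' or the toggle variants '[ .. ]' / '[DONE]'."""
    if total == 1:
        tag = '[DONE]' if current >= total else '[ .. ]'
    else:
        tag = f'[{int(100 * current / total):3d}%]'
    line = f'{tag} {disp_str}'
    if sec_disp_str:
        line += f' ({sec_disp_str})'
    return line


print(progress_line(45, 100, 'Task description', 'secondary string'))
# [ 45%] Task description (secondary string)
```

Printing each line with end='\r' would overwrite the previous status, which is the usual way such progress indicators are redrawn.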

sgdml.utils.ui.color_str(str, fore_color=7, back_color=0, bold=False)
sgdml.utils.ui.gen_lattice_str(lat)
sgdml.utils.ui.gen_mat_str(mat)

Converts a matrix to a multiline string such that the decimal points align in each column. Trailing zeros are replaced with spaces.

Parameters

mat (numpy.ndarray) – Matrix to convert.

Returns

String representation of matrix.

Return type

str
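One way to align decimal points is to format every entry with the same number of decimals, blank out trailing zeros without changing the cell width, and right-justify each column. This is a sketch under that assumption (fixed decimals, two-space column separator); mat_to_str is a hypothetical name and sgdml's output may look different.

```python
import numpy as np


def _blank_trailing_zeros(cell: str) -> str:
    """Replace trailing zeros after the decimal point with spaces (same width)."""
    if '.' not in cell:
        return cell
    return cell.rstrip('0').ljust(len(cell))


def mat_to_str(mat, decimals=3):
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    cols = []
    for col in mat.T:
        cells = [_blank_trailing_zeros(f'{x:.{decimals}f}') for x in col]
        width = max(len(c) for c in cells)
        # Equal decimals + right-justification keeps decimal points aligned.
        cols.append([c.rjust(width) for c in cells])
    return '\n'.join('  '.join(row) for row in zip(*cols))


print(mat_to_str([[1.5, -2.0], [10.25, 3.125]]))
```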

sgdml.utils.ui.gen_memory_str(bytes)
sgdml.utils.ui.gen_range_str(min, max)

Generates a string that shows a minimum and maximum value, as well as the range.

Example: <min> |-- <range> --| <max>

Parameters
  • min (float) – Minimum value.

  • max (float) – Maximum value.

Returns

Formatted range string.

Return type

str
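The example layout can be produced with a single f-string. The number format (two decimals) is an assumption, and the parameters are renamed to avoid shadowing the built-ins min and max; range_str is a hypothetical name.

```python
def range_str(min_val: float, max_val: float, fmt: str = '.2f') -> str:
    """Format '<min> |-- <range> --| <max>' with range = max - min."""
    return f'{min_val:{fmt}} |-- {max_val - min_val:{fmt}} --| {max_val:{fmt}}'


print(range_str(-1.5, 2.0))  # -1.50 |-- 3.50 --| 2.00
```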

sgdml.utils.ui.indent_str(str, indent)

Indents all lines of a multiline string right by a given number of characters.

Parameters
  • str (str) – Multiline string.

  • indent (int) – Number of characters added in front of each line.

Returns

Indented string.

Return type

str
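The standard library's textwrap.indent does this, except that by default it skips whitespace-only lines; passing a predicate that always returns True indents every line, as described above. indent_all is a hypothetical name.

```python
import textwrap


def indent_all(text: str, indent: int) -> str:
    """Prefix every line (including empty ones) with `indent` spaces."""
    return textwrap.indent(text, ' ' * indent, predicate=lambda _line: True)


print(indent_all('first\n\nthird', 4))
```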

sgdml.utils.ui.merge_col_str(col_str1, col_str2)

Merges two multiline strings that represent columns in a table by concatenating each pair of lines.

Note

Both strings must have the same number of lines.

Parameters
  • col_str1 (str) – First multiline string.

  • col_str2 (str) – Second multiline string.

Returns

Merged multiline string.

Return type

str
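Line-by-line concatenation is a zip over both line lists. This sketch enforces the equal-line-count requirement with a ValueError (an assumption about how a violation might be reported) and does not pad the first column; merge_columns is a hypothetical name.

```python
def merge_columns(col_str1: str, col_str2: str) -> str:
    """Concatenate corresponding lines of two multiline strings."""
    lines1, lines2 = col_str1.splitlines(), col_str2.splitlines()
    if len(lines1) != len(lines2):
        raise ValueError('both strings must have the same number of lines')
    return '\n'.join(a + b for a, b in zip(lines1, lines2))


print(merge_columns('E  \nF  ', '1.2\n3.4'))
```

The columns only line up if every line of the first string has the same printable width.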

sgdml.utils.ui.print_lattice(lat=None, inset=False)
sgdml.utils.ui.print_step_title(title_str, sec_title_str='', underscore=True)
sgdml.utils.ui.print_two_column_str(str, sec_str='')
sgdml.utils.ui.sec_callback(current, total=1, disp_str=None, sec_disp_str=None, main_callback=None, **kwargs)
sgdml.utils.ui.str_plen(str)

Returns the printable length of a string. This function can only account for invisible characters introduced by string styling with color_str.

Parameters

str (str) – String.

Returns

Printable length of the string.

Return type

int
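A more general variant strips all ANSI SGR escape sequences before measuring. This assumes the styling is done with ANSI escape codes of the form ESC[...m, which the color_str parameters suggest but which is an assumption here; printable_len is a hypothetical name.

```python
import re

# ANSI "Select Graphic Rendition" sequences, e.g. '\x1b[1;31m' or '\x1b[0m'.
ANSI_SGR = re.compile(r'\x1b\[[0-9;]*m')


def printable_len(s: str) -> int:
    """Length of s without ANSI color/style escape sequences."""
    return len(ANSI_SGR.sub('', s))


print(printable_len('\x1b[1;31mred\x1b[0m'))  # 3
```

Unlike a count of known color_str prefixes, the regex handles any SGR sequence regardless of which function produced it.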

sgdml.utils.ui.unicode_str(s)
sgdml.utils.ui.wrap_indent_str(label, str, width=93)

Wraps and indents a multiline string to arrange it with the provided label in two columns. The default maximum line width already accounts for the indentation due to the logging level label.

Example: <label><multiline string>

Parameters
  • label (str) – Label.

  • str (str) – Multiline string.

  • width (int, optional) – Maximum number of characters per line.

Returns

Wrapped and indented string.

Return type

str
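A two-column layout can be sketched with textwrap: wrap each input line to the width left after the label, put the label in front of the first output line, and indent the remaining lines by the label's length. Counting the label toward width is an assumption; wrap_with_label is a hypothetical name.

```python
import textwrap


def wrap_with_label(label: str, text: str, width: int = 93) -> str:
    """Wrap text into a column to the right of label."""
    lines = []
    for paragraph in text.splitlines():
        # Keep empty input lines as empty output lines.
        lines.extend(textwrap.wrap(paragraph, width=width - len(label)) or [''])
    prefixes = [label] + [' ' * len(label)] * (len(lines) - 1)
    return '\n'.join(p + line for p, line in zip(prefixes, lines))


print(wrap_with_label('Note: ', 'aaa bbb ccc', width=13))
```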

sgdml.utils.ui.wrap_str(str, width=93)

Wrap a multiline string after a given number of characters. The default maximum line width already accounts for the indentation due to the logging level label.

Parameters
  • str (str) – Multiline string.

  • width (int, optional) – Maximum number of characters per line.

Returns

Wrapped string.

Return type

str

sgdml.utils.ui.yes_or_no(question)

Ask for yes/no user input on a question.

Any response besides y yields a negative answer.

Parameters

question (str) – User question.
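A minimal sketch of the "only y counts as yes" rule. The reader function is injectable so the prompt can be tested without a terminal; the ' [y/n] ' suffix, the whitespace stripping and the name ask_yes_no are assumptions, and case is not folded.

```python
def ask_yes_no(question: str, read=input) -> bool:
    """Return True only if the user answers exactly 'y'."""
    answer = read(question + ' [y/n] ').strip()
    return answer == 'y'  # anything else, including 'yes' or 'Y', means no


# Non-interactive usage with a stubbed reader.
print(ask_yes_no('Overwrite existing model?', read=lambda prompt: 'y'))  # True
```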