hsr package

Submodules

hsr.fingerprint module

hsr.fingerprint.compute_distances(molecule_data: ndarray, scaling=None)[source]

Calculate the Euclidean distance between each point in molecule_data and scaled reference points.

This function computes the distances between each data point in a molecule and a set of reference points. The reference points are scaled either by a factor or by a matrix depending on the type of the ‘scaling’ parameter.

Parameters

molecule_data (np.ndarray): Data of the molecule, with each row representing a point.
scaling (float or np.ndarray, optional): The scaling applied to the reference points.

Returns

np.ndarray: A matrix of distances, where each element [i, j] is the distance between the i-th molecule data point and the j-th reference point.
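The distance matrix described above can be computed with NumPy broadcasting. This is a minimal illustration, not the hsr implementation. `pairwise_distances` is a hypothetical helper, and it assumes the reference points have already been scaled, so the scaling step is left out.

```python
import numpy as np


def pairwise_distances(points: np.ndarray, reference_points: np.ndarray) -> np.ndarray:
    """Return D with D[i, j] = Euclidean distance from points[i] to reference_points[j]."""
    # (n, 1, d) - (1, m, d) broadcasts to (n, m, d); the norm over the last axis gives (n, m).
    diff = points[:, np.newaxis, :] - reference_points[np.newaxis, :, :]
    return np.linalg.norm(diff, axis=-1)


points = np.array([[0.0, 0.0], [3.0, 4.0]])
# Centroid at the origin followed by one (already scaled) point on each axis.
refs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = pairwise_distances(points, refs)  # shape (2, 3); D[1, 0] is 5.0
```

Broadcasting avoids an explicit double loop. It builds an intermediate (n, m, d) array, which is small here because m is only the dimensionality plus one.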

hsr.fingerprint.compute_statistics(distances)[source]

Calculate statistical moments (mean, standard deviation, skewness) for the given distances.

Parameters

distances (np.ndarray): Matrix with distances between each point and each reference point.

Returns

list: A list of the computed statistics (mean, standard deviation, and skewness).
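The moment computation can be sketched as follows. The sketch assumes one column of the distance matrix per reference point, the population (ddof=0) standard deviation, and skewness as the third standardized moment. The helper name `distance_moments`, the per-reference-point ordering of the output, and the choice of estimators are assumptions and may differ from what hsr does.

```python
import numpy as np


def distance_moments(distances: np.ndarray) -> list:
    """Mean, standard deviation and skewness of each column of a distance matrix."""
    stats = []
    for column in distances.T:  # assumed: one column per reference point
        mean = column.mean()
        std = column.std()  # population standard deviation (ddof=0)
        # Skewness as the third standardized moment; taken as 0 when std is 0.
        skew = 0.0 if std == 0 else float(np.mean(((column - mean) / std) ** 3))
        stats.extend([float(mean), float(std), skew])
    return stats


distances = np.array([[1.0, 2.0], [1.0, 2.0], [4.0, 2.0]])
fingerprint = distance_moments(distances)  # 2 reference points x 3 moments = 6 values
```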

hsr.fingerprint.generate_fingerprint_from_data(molecule_data: array, scaling='matrix', chirality=False)[source]

Generate a fingerprint directly from molecular data.

This function takes the data of a molecule, applies PCA transformation considering chirality if needed, and computes the fingerprint.

Parameters

molecule_data (np.array): Data of the molecule, with each row representing a point.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If set to ‘matrix’ (default), a scaling matrix is automatically computed based on the PCA-transformed data. If a float is provided, it is used as a scaling factor. If a numpy.ndarray is provided, it is used as a scaling matrix.
chirality (bool, optional): If True, chirality is considered in the PCA transformation. Defaults to False.

Returns

list or tuple: The fingerprint of the molecule; if chirality is True, the dimensionality is also returned.

hsr.fingerprint.generate_fingerprint_from_molecule(molecule, features={'delta_neutrons': <function extract_neutron_difference_from_common_isotope>, 'formal_charges': <function extract_formal_charge>, 'protons': <function extract_proton_number>}, scaling='matrix', chirality=False, removeHs=False)[source]

Generate a fingerprint from a molecular structure using specified features and scaling.

This function processes an RDKit molecule object to generate its fingerprint. It first converts the molecule into n-dimensional data based on the specified features, optionally removing hydrogen atoms. A PCA transformation is then performed, optionally taking chirality into account. The reference points for distance calculation are scaled according to the scaling parameter, and the fingerprint is computed from the resulting distances.

Parameters

molecule (RDKit Mol): RDKit molecule object.
features (dict, optional): Features to consider for molecule conversion. Default is DEFAULT_FEATURES.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If ‘matrix’, a scaling matrix is computed and applied. If a float, it is used as a scaling factor. If a numpy.ndarray, it is directly used as the scaling matrix.
chirality (bool, optional): If True, chirality is considered in the PCA transformation, which can be important for distinguishing chiral molecules.
removeHs (bool, optional): If True, hydrogen atoms are removed from the molecule before conversion, focusing on heavier atoms.

Returns

list or tuple: Fingerprint of the molecule. If chirality is considered, also returns the dimensionality post-PCA transformation.

hsr.fingerprint.generate_fingerprint_from_transformed_data(molecule_data: ndarray, scaling)[source]

Compute a fingerprint from transformed molecular data.

This function generates a molecular fingerprint based on distance statistics. It calculates distances between the transformed molecular data points and a set of reference points that are scaled using the provided scaling parameter. The fingerprint is derived from these distance measurements.

Parameters

molecule_data (np.ndarray): Transformed data of the molecule, each row representing a transformed point.
scaling (float or np.ndarray): The scaling applied to the reference points.

Returns

list: Fingerprint derived from the distance measurements to scaled reference points.

hsr.fingerprint.generate_reference_points(dimensionality, scaling=None)[source]

Generate reference points in the n-dimensional space.

Parameters

dimensionality (int): The number of dimensions.
scaling (float or np.ndarray, optional): The scaling applied to the reference points.

Returns

np.ndarray: An array of reference points including the centroid and the points on each axis.
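The layout described above (centroid first, then one point per axis) can be sketched as follows. `make_reference_points` is a hypothetical helper. It assumes the centroid sits at the origin, as it would for PCA-centered data, and that a matrix scaling is diagonal, as produced by `compute_scaling_matrix`. The real function may construct or scale the points differently.

```python
import numpy as np


def make_reference_points(dimensionality: int, scaling=None) -> np.ndarray:
    """Centroid (origin) followed by one point on each axis, optionally scaled."""
    # Row 0 is the centroid; rows 1..d are unit points along each axis.
    points = np.vstack([np.zeros(dimensionality), np.eye(dimensionality)])
    if scaling is None:
        return points
    if np.ndim(scaling) == 0:
        # A single factor stretches every axis equally.
        return points * float(scaling)
    # A (d, d) matrix; for a diagonal matrix, axis k is stretched by scaling[k, k].
    return points @ np.asarray(scaling, dtype=float)


unscaled = make_reference_points(3)                          # shape (4, 3)
by_factor = make_reference_points(2, 2.5)                    # axis points at distance 2.5
by_matrix = make_reference_points(2, np.diag([3.0, 0.5]))    # per-axis stretch
```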

hsr.pca_transform module

hsr.pca_transform.adjust_eigenvector_signs(original_data, eigenvectors, chirality=False, tolerance=1e-10)[source]

Adjust the sign of eigenvectors based on the data’s projections.

This function iterates through each eigenvector and determines its sign by examining the direction of the data’s maximum projection along that eigenvector. If the maximum projection is negative, the sign of the eigenvector is flipped. The function also handles special cases such as symmetric distributions of projections and can adjust eigenvectors based on chirality considerations.

Parameters

original_data (numpy.ndarray): N-dimensional array representing a molecule, where each row is a sample/point.
eigenvectors (numpy.ndarray): Eigenvectors obtained from the PCA decomposition.
chirality (bool, optional): If True, the function also considers the skewness of the projections when deciding whether to flip an eigenvector. This is necessary for distinguishing chiral molecules. Defaults to False.
tolerance (float, optional): Tolerance used when comparing projections. Defaults to 1e-10.

Returns

eigenvectors (numpy.ndarray): Adjusted eigenvectors with their sign possibly flipped.
sign_changes (int): The number of eigenvectors that had their signs changed.
best_eigenvector_to_flip (int): Index of the eigenvector with the highest skewness, relevant when chirality is considered. This is the eigenvector most likely to be flipped to preserve chirality.
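The basic sign rule (flip an eigenvector when the data's largest-magnitude projection onto it is negative) can be sketched as below. This is a simplified illustration under stated assumptions. `orient_eigenvectors` is hypothetical, it assumes eigenvectors are stored as columns and the data is already centered, and it omits both the tolerance handling for symmetric projections and the skewness-based chirality logic that the real function performs.

```python
import numpy as np


def orient_eigenvectors(data: np.ndarray, eigenvectors: np.ndarray):
    """Flip each eigenvector (column) so the largest-magnitude projection is positive."""
    adjusted = eigenvectors.astype(float, copy=True)
    sign_changes = 0
    projections = data @ adjusted  # column k: projections of every point onto eigenvector k
    for k in range(adjusted.shape[1]):
        proj = projections[:, k]
        extreme = proj[np.argmax(np.abs(proj))]
        if extreme < 0:
            adjusted[:, k] *= -1
            sign_changes += 1
    return adjusted, sign_changes


data = np.array([[-3.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, -1.0]])
vectors, n_flipped = orient_eigenvectors(data, np.eye(2))  # first axis flipped, second kept
```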

hsr.pca_transform.compute_pca_using_covariance(original_data, chirality=False, return_axes=False, print_steps=False)[source]

Perform Principal Component Analysis (PCA) using eigendecomposition of the covariance matrix.

This function conducts PCA on a given dataset to produce a consistent reference system, facilitating comparison between different molecules. It emphasizes generating eigenvectors that provide deterministic outcomes and consistent orientations. The function also includes an option to handle chiral molecules by ensuring a positive determinant for the transformation matrix.

Parameters

original_data (numpy.ndarray): An N-dimensional array representing a molecule, where each row is a sample/point. The array should have a shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.
chirality (bool, optional): If set to True, the function ensures that the determinant of the transformation matrix is positive, allowing for the distinction of chiral molecules. Default is False.
return_axes (bool, optional): If True, returns the principal axes (eigenvectors) in addition to the transformed data. Default is False.
print_steps (bool, optional): If True, prints the steps of the PCA process: covariance matrix, eigenvalues, eigenvectors and transformed data. Default is False.

Returns

transformed_data (numpy.ndarray): The dataset after PCA transformation. This data is aligned to the principal components and has the same shape as the original data.
dimensionality (int): The number of significant dimensions in the transformed data. Only returned if chirality is True.
eigenvectors (numpy.ndarray, optional): Only returned if return_axes is True. The principal axes of the transformation, represented as eigenvectors. Each column corresponds to an eigenvector.
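The core of covariance-based PCA can be sketched with `numpy.cov` and `numpy.linalg.eigh`. This is a minimal sketch, not the hsr implementation. `pca_via_covariance` is hypothetical, the sign-orientation step handled by `adjust_eigenvector_signs` is omitted, and the determinant is made positive by simply flipping the last axis, whereas hsr's own choice of which axis to flip may differ.

```python
import numpy as np


def pca_via_covariance(data: np.ndarray, chirality: bool = False):
    """Project centered data onto covariance eigenvectors, largest variance first."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]  # fancy indexing returns a copy
    if chirality and np.linalg.det(eigenvectors) < 0:
        # Flipping one axis turns a reflection (det = -1) into a rotation (det = +1).
        eigenvectors[:, -1] *= -1
    return centered @ eigenvectors, eigenvectors


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 0.5])
transformed, axes = pca_via_covariance(points, chirality=True)
```

Eigenvectors are only determined up to sign, so `eigh` may return either orientation for each axis. A separate, deterministic orientation step is therefore needed before fingerprints from different molecules can be compared.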

hsr.pca_transform.extract_relevant_subspace(eigenvectors, significant_indices, tol=1e-10)[source]

Extract the subset of eigenvectors that is relevant for the determinant calculation.

This function prunes the eigenvector matrix by removing each row and column whose entries are all zero except for a single entry within the given tolerance of 1 or -1 (these correspond to eigenvectors with an eigenvalue equal to 0 and their corresponding components). It then reduces the matrix further using the provided significant indices, yielding the relevant subset of eigenvectors.

Parameters

eigenvectors (numpy.ndarray): The eigenvectors matrix to prune and reduce.
significant_indices (numpy.ndarray): Indices of significant eigenvectors.
tol (float, optional): Tolerance for determining whether a value is close to 0, 1, or -1. Defaults to 1e-10.

Returns

numpy.ndarray: The determinant-relevant subset of eigenvectors.

hsr.pre_processing module

hsr.pre_processing.load_molecules_from_sdf(path, removeHs=False, sanitize=False)[source]

Load a list of molecules from an SDF file.

Parameters

path (str): Path to the SDF file.
removeHs (bool, optional): Whether to remove hydrogens. Defaults to False.
sanitize (bool, optional): Whether to sanitize the molecules. Defaults to False.

Returns

list of rdkit.Chem.rdchem.Mol: A list of RDKit molecule objects.

hsr.pre_processing.molecule_to_ndarray(molecule, features={'delta_neutrons': <function extract_neutron_difference_from_common_isotope>, 'formal_charges': <function extract_formal_charge>, 'protons': <function extract_proton_number>}, removeHs=False)[source]

Generate a numpy array representing the given molecule in N dimensions.

This function converts a molecule into an N-dimensional numpy array based on specified features. Each feature is computed using a function defined in the ‘features’ dictionary.

Parameters

molecule (rdkit.Chem.rdchem.Mol): The input RDKit molecule object.
features (dict[str, callable], optional): A dictionary where each key is a feature name (str) and the value is a callable function to compute that feature. The function takes an RDKit atom object as input and returns a feature value (a numeric type). Defaults to DEFAULT_FEATURES.
removeHs (bool, optional): If True, hydrogen atoms will not be included in the array representation. Defaults to False.

Returns

numpy.ndarray: Array with shape (number of atoms, 3 spatial coordinates + number of features), representing the molecule.
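The `features` contract (a name mapped to a callable that takes an atom and returns a number) can be illustrated without RDKit. `ToyAtom`, `toy_features`, and `atoms_to_ndarray` are hypothetical stand-ins. With a real RDKit molecule, the properties would come from methods such as `Atom.GetAtomicNum()` and `Atom.GetFormalCharge()`, and the coordinates from the molecule's conformer. The column order (coordinates first, then features in dictionary order) is an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyAtom:
    """Hypothetical stand-in for an RDKit atom with a 3D position."""
    symbol: str
    position: tuple
    atomic_num: int
    formal_charge: int


# Hypothetical feature dict: name -> callable(atom) -> number.
toy_features = {
    "protons": lambda atom: atom.atomic_num,
    "formal_charges": lambda atom: atom.formal_charge,
}


def atoms_to_ndarray(atoms, features, remove_hs=False) -> np.ndarray:
    rows = []
    for atom in atoms:
        if remove_hs and atom.symbol == "H":
            continue  # drop hydrogens when requested
        # 3 spatial coordinates, then one column per feature in dict order.
        rows.append(list(atom.position) + [f(atom) for f in features.values()])
    return np.array(rows, dtype=float)


water = [
    ToyAtom("O", (0.0, 0.0, 0.0), 8, 0),
    ToyAtom("H", (0.96, 0.0, 0.0), 1, 0),
    ToyAtom("H", (-0.24, 0.93, 0.0), 1, 0),
]
full = atoms_to_ndarray(water, toy_features)                        # shape (3, 5)
heavy_only = atoms_to_ndarray(water, toy_features, remove_hs=True)  # shape (1, 5)
```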

hsr.pre_processing.read_mol_from_file(path, removeHs=False, sanitize=False)[source]

General reader for molecules from files.

Parameters

path (str): Path to the file.
removeHs (bool, optional): Whether to remove hydrogens. Defaults to False.
sanitize (bool, optional): Whether to sanitize the molecule. Defaults to False.

Returns

rdkit.Chem.rdchem.Mol: An RDKit molecule object.

hsr.similarity module

hsr.similarity.calculate_manhattan_distance(moments1: list, moments2: list)[source]

Calculate the Manhattan distance between two lists.

Parameters

moments1 (list): The first list of numerical values.
moments2 (list): The second list of numerical values; must have the same length as moments1.

Returns

float: The mean absolute difference between the two lists.

hsr.similarity.calculate_similarity_from_distance(distance, n_components)[source]

Calculate similarity score from a distance score.

This function converts a distance score into a similarity score using a reciprocal function. The distance is first normalized by the number of components of the fingerprint. The similarity score approaches 1 as the distance approaches 0, and approaches 0 as the distance increases.

Parameters

distance (float): The distance score, a non-negative number.
n_components (int): The number of components in the fingerprint.

Returns

float: The similarity score derived from the distance.
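One reciprocal mapping with the stated properties is sketched below, assuming the form 1 / (1 + distance / n_components). The exact formula used by hsr is not given here, so treat this as an illustration of the described behavior rather than the library's definition. `similarity_from_distance` is a hypothetical name.

```python
def similarity_from_distance(distance: float, n_components: int) -> float:
    """Map a non-negative distance to (0, 1]: 1 at distance 0, tending to 0 as it grows."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Normalize by the fingerprint length, then apply a reciprocal transform.
    return 1.0 / (1.0 + distance / n_components)


identical = similarity_from_distance(0.0, 12)  # 1.0 for identical fingerprints
far_apart = similarity_from_distance(6.0, 3)   # 1 / (1 + 2) = 1/3
```

Normalizing by the number of components keeps scores comparable when the fingerprint length changes, for example when more features are added.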

hsr.similarity.compute_distance(mol1, mol2, features={'delta_neutrons': <function extract_neutron_difference_from_common_isotope>, 'formal_charges': <function extract_formal_charge>, 'protons': <function extract_proton_number>}, scaling='matrix', removeHs=False, chirality=False)[source]

Calculate the distance score between two molecules using their n-dimensional fingerprints.

This function generates fingerprints for two molecules based on their structures and a set of features, and then computes a distance score between these fingerprints.

Parameters

mol1 (RDKit Mol): The first RDKit molecule object.
mol2 (RDKit Mol): The second RDKit molecule object.
features (dict, optional): Dictionary of features to be considered. Default is DEFAULT_FEATURES.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If set to ‘matrix’ (default), a scaling matrix is automatically computed based on the PCA-transformed data. If a float is provided, it’s used as a scaling factor. If a numpy.ndarray is provided, it’s used as a scaling matrix.
removeHs (bool, optional): If True, hydrogen atoms are removed from the molecule before generating the fingerprint.
chirality (bool, optional): Consider chirality in the generation of fingerprints if set to True.

Returns

float: The computed distance score between the two molecules.

hsr.similarity.compute_distance_from_ndarray(mol1_nd: array, mol2_nd: array, scaling='matrix', chirality=False)[source]

Calculate the distance score between two molecules represented as N-dimensional arrays.

This function computes fingerprints for two molecules based on their N-dimensional array representations and then calculates a distance score between these fingerprints.

Parameters

mol1_nd (numpy.ndarray): The N-dimensional array representing the first molecule.
mol2_nd (numpy.ndarray): The N-dimensional array representing the second molecule.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If set to ‘matrix’ (default), a scaling matrix is automatically computed based on the PCA-transformed data. If a float is provided, it’s used as a scaling factor. If a numpy.ndarray is provided, it’s used as a scaling matrix.
chirality (bool, optional): Consider chirality in the generation of fingerprints if set to True.

Returns

float: The computed distance score between the two molecules.

hsr.similarity.compute_similarity(mol1, mol2, features={'delta_neutrons': <function extract_neutron_difference_from_common_isotope>, 'formal_charges': <function extract_formal_charge>, 'protons': <function extract_proton_number>}, scaling='matrix', removeHs=False, chirality=False)[source]

Calculate the similarity score between two molecules using their n-dimensional fingerprints.

This function generates fingerprints for two molecules based on their structures and a set of features, and then computes a similarity score between these fingerprints.

Parameters

mol1 (RDKit Mol): The first RDKit molecule object.
mol2 (RDKit Mol): The second RDKit molecule object.
features (dict, optional): Dictionary of features to be considered. Default is DEFAULT_FEATURES.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If set to ‘matrix’ (default), a scaling matrix is automatically computed based on the PCA-transformed data. If a float is provided, it’s used as a scaling factor. If a numpy.ndarray is provided, it’s used as a scaling matrix.
removeHs (bool, optional): If True, hydrogen atoms are removed from the molecule before generating the fingerprint.
chirality (bool, optional): Consider chirality in the generation of fingerprints if set to True.

Returns

float: The computed similarity score between the two molecules.

hsr.similarity.compute_similarity_from_ndarray(mol1_nd: array, mol2_nd: array, scaling='matrix', chirality=False)[source]

Calculate the similarity score between two molecules represented as N-dimensional arrays.

This function computes fingerprints for two molecules based on their N-dimensional array representations and then calculates a similarity score between these fingerprints.

Parameters

mol1_nd (numpy.ndarray): The N-dimensional array representing the first molecule.
mol2_nd (numpy.ndarray): The N-dimensional array representing the second molecule.
scaling (str, float, or np.ndarray): Specifies the scaling applied to reference points. If set to ‘matrix’ (default), a scaling matrix is automatically computed based on the PCA-transformed data. If a float is provided, it’s used as a scaling factor. If a numpy.ndarray is provided, it’s used as a scaling matrix.
chirality (bool, optional): Consider chirality in the generation of fingerprints if set to True.

Returns

float: The computed similarity score between the two molecules.

hsr.similarity.compute_similarity_score(fingerprint_1: list, fingerprint_2: list)[source]

Calculate the similarity score between two fingerprints.

Parameters

fingerprint_1 (list): The fingerprint of the first molecule.
fingerprint_2 (list): The fingerprint of the second molecule.

Returns

float: The computed similarity score.

hsr.utils module

hsr.utils.compute_scaling_factor(molecule_data)[source]: Computes the largest distance between the centroid and the molecule data points

hsr.utils.compute_scaling_matrix(molecule_data)[source]: Computes a diagonal scaling matrix with the maximum absolute values for each dimension of the molecule data as its diagonal entries
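Both scaling helpers can be sketched in a few lines of NumPy. `scaling_factor` and `scaling_matrix` are hypothetical names. The sketch assumes the centroid is the mean of the points and that the maximum absolute values are taken over the data as given (typically PCA-transformed, and therefore centered).

```python
import numpy as np


def scaling_factor(data: np.ndarray) -> float:
    """Largest Euclidean distance from the centroid (mean point) to any point."""
    centroid = data.mean(axis=0)
    return float(np.max(np.linalg.norm(data - centroid, axis=1)))


def scaling_matrix(data: np.ndarray) -> np.ndarray:
    """Diagonal matrix holding the maximum absolute value of each dimension."""
    return np.diag(np.max(np.abs(data), axis=0))


data = np.array([[1.0, -4.0], [-1.0, 2.0], [0.0, 2.0]])
factor = scaling_factor(data)   # sqrt(17), from the point (1, -4)
matrix = scaling_matrix(data)   # diag(1, 4)
```

The single factor stretches all axes equally, while the diagonal matrix lets each reference point follow the extent of the data along its own axis.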

hsr.utils.extract_formal_charge(atom)[source]

hsr.utils.extract_neutron_difference_from_common_isotope(atom)[source]

hsr.utils.extract_proton_number(atom)[source]

hsr.utils.formal_charge(atom)[source]

hsr.utils.neutron_difference(atom)[source]

hsr.utils.proton_number(atom)[source]

Module contents