Tutorial 2.0: Evolutionary Design of trans-Pt(X)(X’)(L)(CO)

Introduction

In this tutorial we experiment with the genetic algorithm while designing Pt compounds. To allow a chemical interpretation of the results, we define a concrete chemical design goal: identify sets of ligands [X, X’, L], where X and X’ are covalent ligands and L is a neutral donor (dative) ligand, that weaken the C≡O bond of the carbonyl ligand in the square planar complex trans-Pt(X)(X’)(L)(CO).

The variable strength of the CO bond results from the electronic properties of the metal fragment trans-Pt(X)(X’)(L). The bonding between the metal and CO involves electron donation from the carbonyl carbon atom to the metal, and back-donation from an occupied d-orbital of the metal center to the CO π-antibonding orbital. The accompanying weakening of the CO bond is reflected in bond elongation and red-shift of the corresponding stretching frequency. This effect is the basis for the Tolman electronic parameter, which is often used to classify ligands according to their electronic properties.

Fitness

The fitness associated with each set [X, X’, L] is the length of the C≡O bond of the carbonyl ligand in the square planar complex trans-Pt(X)(X’)(L)(CO), as obtained from the following molecular modelling protocol:

- Assembly of 3D building blocks to generate an initial molecular model.
- Light-weight conformational search performed by Tinker in the torsional space (bond lengths and angles are not changed).
- Geometry optimization with the semi-empirical method PM6 as implemented in Spartan.

Since we want to run several experiments in very little time, this tutorial is designed to avoid the time-consuming molecular modelling needed to obtain the fitness value. The fitness values for all the candidates that can be generated from the building block space have been computed in advance and are saved in the downloaded dataset.

Therefore, the fitness provider (i.e., the Python script named fitness_provider_fromDB.py) only looks up the fitness value of a given Pt complex in the list of pre-computed fitness values.
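
The lookup idea can be sketched as follows. This is only an illustration, not the content of fitness_provider_fromDB.py: the function name, the way a candidate is identified (a tuple of ligand labels), and the table entries are hypothetical placeholders.

```python
# Hypothetical table: (X, X', L) labels -> pre-computed C≡O bond length in Å.
# Labels and numbers are made-up placeholders, not values from the dataset.
PRECOMPUTED_FITNESS = {
    ("X_a", "X_b", "L_a"): 1.140,
    ("X_a", "X_c", "L_b"): 1.150,
}


def lookup_fitness(x, x_prime, donor, table=PRECOMPUTED_FITNESS):
    """Return the stored fitness for [X, X', L], or None if it is not tabulated."""
    return table.get((x, x_prime, donor))


known = lookup_fitness("X_a", "X_b", "L_a")      # found in the table
unknown = lookup_fitness("X_z", "X_b", "L_a")    # not in the table -> None
print(known, unknown)
```

No modelling is performed at lookup time, which is why each fitness evaluation in this tutorial is nearly instantaneous.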

Instructions

Start DENOPTIM from within the tutorial_2.0 folder. This is done from the Terminal (macOS/Linux) or the Anaconda prompt (Windows):

cd your_path_to_tutorial_2.0
denoptim input_parameters

Inspect the parameters:
- In the Genetic Algorithm tab, the weights of mutation and crossover are set to 0: the algorithm will do neither crossover nor mutation. Instead, the weight of construction from scratch is 1. This means that all new candidates will be built randomly from scratch (we will refer to this as a “construction-only” experiment). Also, the experiment will use an initial population, which you can look at by opening the initPopulation.sdf file. Note that these are complexes with a short CO bond (i.e., low fitness).
- The Fitness Provider tab configures the call to the external python script that “calculates” the fitness.
- In the Space of Building Blocks tab, you find the names of the files collecting the building blocks and the APClass compatibility rules.
- Click on File->Open and inspect the scaffold fragment in the lib_scaffolds.sdf file.
- Click on File->Open and inspect the compatibility_matrix.par file.
- Click on File->Open and inspect the lib_fragments.sdf file. Look for the fragments that offer attachment points belonging to the APClasses you have identified in the previous step.
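
The role of the three weights can be illustrated with a short sketch. It assumes (this is an assumption made for illustration, not a description of DENOPTIM's internals) that each new candidate is produced by a single operation drawn with probability proportional to its weight; the function and dictionary names are hypothetical.

```python
import random


def pick_operation(weights, rng):
    """Draw one operation with probability proportional to its weight (assumed model)."""
    names = [name for name, w in weights.items() if w > 0]
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Settings of the "construction-only" experiment described above.
construction_only = {"crossover": 0, "mutation": 0, "construction": 1}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
picks = [pick_operation(construction_only, rng) for _ in range(5)]
print(picks)  # every pick is 'construction', since the other weights are 0
```

Under this assumed model, setting all three weights to 1 would give each operation the same chance of being selected.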
Go back to the input parameters by clicking on Active Tabs -> Prepare GA experiment, start the evolutionary design by clicking on Run now..., and follow the dialog. Once the experiment is submitted, you will be notified of where the output is being written.

NOTE: as seen in the previous tutorial, the bar in the top-right part of DENOPTIM's window turns grey to indicate the experiment is running. When it turns blue again, the experiment has been completed.
When the experiment has been completed, open the output from File->Open Recent... and select the appropriate path. This opens a GARun Inspector tab where you find:
- The evolution plot (top-right panel): each point is a candidate; click on it to display the structure and properties of the candidate. By default, the plot shows two blue lines: the minimum and the maximum value of the fitness in the population. The button Show/Hide Population Stats also lets you add the mean and median.
- The monitor plot (bottom-right panel): collects numerical indicators of the algorithm's behaviour, such as the number of attempts to create candidates, which is the series shown by default. The button Show/Hide Population Stats lets you add series to the plot or remove them from it.
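
For reference, the four population statistics shown in the evolution plot are the usual descriptive statistics of the fitness values in the population at a given generation. A minimal sketch with Python's standard library follows; the function name is hypothetical and the input numbers are arbitrary placeholders chosen to make the arithmetic easy to follow, not fitness values from an experiment.

```python
from statistics import mean, median


def population_stats(fitness_values):
    """Return min, max, mean and median of the fitness values of one population."""
    values = list(fitness_values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


stats = population_stats([1.0, 2.0, 3.0, 6.0])  # arbitrary placeholder numbers
print(stats)  # min 1.0, max 6.0, mean 3.0, median 2.5
```

A mean that lies well above the median, as in this example, indicates that a few high-fitness members pull the average up.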
NOTE: Plots can be saved by right-clicking on them and choosing Save As.... Similarly, you can save pictures of molecular models by right-clicking on them and choosing File->Save->Save As PNG.
Run two more independent experiments starting from the same input parameters. By default, each experiment uses an independent sequence of pseudo-random events. Therefore, to get independent repeats you can switch back to the input parameters by clicking on Active Tabs -> Prepare GA experiment, and submit again with Run now.... You can submit more than one experiment in parallel.

Discussion Point: if we exclude generation 0 (i.e., the initial population given as input), the distribution of fitness values over the course of the experiment appears random for all the experiments run so far. Try to explain why (Hint: remember what we noted about the Genetic Algorithm tab when inspecting the input parameters).
Now we produce another set of GA experiments where we change the way the software is allowed to generate new candidates. Via Active Tabs -> Prepare GA experiment go back to the input parameters and do the following in the Genetic Algorithm tab:
- set Crossover weight = 1
- set Construction weight = 0
Submit three such “crossover-only” experiments via the Run now... button.
Inspect the results of these “crossover-only” experiments.

Discussion Point: the distribution of fitness values over the course of the experiment is radically different from the "construction-only" experiments. Try to explain why (Hint: we are comparing experiments where new candidates are generated using either only crossover or only construction from scratch).
Now, we produce “mutation-only” experiments. Again, via Active Tabs -> Prepare GA experiment go back to the input parameters and set the following in the Genetic Algorithm tab:
- Crossover weight = 0
- Mutation weight = 1
- Construction weight = 0
As before, run three such experiments.
Inspect the results of these “mutation-only” experiments.

Discussion Point: again, the distribution of fitness values over the course of the experiment is radically different from the "construction-only" experiments, and it is also different from the "crossover-only" experiments. Try to explain why (Hint: compare mutation and crossover in terms of how much structural diversity each operation can bring into the population).
Finally, we combine crossover, mutation and random construction (we’ll call these the “complete GA” experiments). Again, via Active Tabs -> Prepare GA experiment go back to the input parameters and set the following in the Genetic Algorithm tab:
- Crossover weight = 1
- Mutation weight = 1
- Construction weight = 1
Inspect the results of these runs as well. In particular, choose an experiment that produced a population with a high mean fitness and, from the GARun Inspector, click on Open Population Graphs to visualize the molecules in the population at a late stage of the experiment (high generation number).

Discussion Point: calculate the number of candidates (i.e., #offspring * #generations) that were visited before finding at least 3 candidates with fitness higher than 1.151 Å. For experiments that never collect that many good candidates, take the total number of candidates visited as a lower bound. Comparing these values against the total number of candidates that can be generated from the space of building blocks, i.e., 10332 candidates, gives you an idea of the efficiency of each type of experiment.
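
The bookkeeping for this comparison can be sketched as below. The threshold (1.151 Å), the number of required hits (3) and the size of the space (10332) come from the text above; the function name, the input layout (one list of fitness values for the new candidates of each generation, starting from generation 1) and the example numbers are hypothetical placeholders.

```python
SPACE_SIZE = 10332   # candidates that can be generated from the building block space
THRESHOLD = 1.151    # fitness threshold in Å
REQUIRED_HITS = 3    # number of good candidates to collect


def candidates_visited(new_fitness_by_generation, n_offspring):
    """Return (#offspring * #generations, reached) up to the generation in which
    the cumulative number of candidates above THRESHOLD reaches REQUIRED_HITS.
    If it is never reached, the count covers all generations (a lower bound)."""
    hits = 0
    for generation, values in enumerate(new_fitness_by_generation, start=1):
        hits += sum(1 for fitness in values if fitness > THRESHOLD)
        if hits >= REQUIRED_HITS:
            return generation * n_offspring, True
    return len(new_fitness_by_generation) * n_offspring, False


# Placeholder run: 2 offspring per generation, 3 generations.
run = [[1.140, 1.152], [1.150, 1.149], [1.153, 1.155]]
visited, reached = candidates_visited(run, n_offspring=2)
print(visited, reached)        # 6 True
print(visited / SPACE_SIZE)    # fraction of the space visited
```

The smaller the fraction of the space visited before collecting the required candidates, the more efficient the experiment type.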