Tutorial 2.0: Evolutionary Design of trans-Pt(X)(X’)(L)(CO)
Introduction
In this tutorial we play with the genetic algorithm while designing Pt compounds. To allow a chemical interpretation of the results, we define a concrete chemical design goal: identify a set of ligands [X, X’, L], where X and X’ are covalent ligands and L is a neutral donor (dative) ligand, that weakens the C≡O bond of the carbonyl ligand in the square planar complex trans-Pt(X)(X’)(L)(CO).
The variable strength of the CO bond results from the electronic properties of the metal fragment trans-Pt(X)(X’)(L). The bonding between the metal and CO involves electron donation from the carbonyl carbon atom to the metal, and back-donation from an occupied d-orbital of the metal center to the CO π-antibonding orbital. The accompanying weakening of the CO bond is reflected in bond elongation and red-shift of the corresponding stretching frequency. This effect is the basis for the Tolman electronic parameter, which is often used to classify ligands according to their electronic properties.
Fitness
The fitness associated with each set [X, X’, L] is defined as the length of the C≡O bond of the carbonyl ligand in the square planar complex trans-Pt(X)(X’)(L)(CO), as provided by the following molecular modelling protocol:
assembly of 3D building blocks to generate an initial molecular model.
light-weight conformational search performed by Tinker in torsional space (bond lengths and angles are not changed).
geometry optimization with the semi-empirical method PM6 as implemented in Spartan.
Since we want to run several experiments in very little time, this tutorial is designed to avoid the time-consuming molecular modelling part needed to obtain the value of the fitness. In fact, the fitness values of all the candidates that can be generated from the building block space have been computed in advance and are saved in the downloaded dataset.
Therefore, the fitness provider (i.e., the Python script named fitness_provider_fromDB.py) only looks up the fitness value of a given Pt complex in the list of pre-computed fitness values.
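The lookup idea can be pictured with the minimal sketch below. It is not the content of fitness_provider_fromDB.py: the table layout (a whitespace-separated text file pairing a candidate identifier with a C≡O bond length), the function names, and the example identifiers and values are all assumptions made purely for illustration.

```python
import io


def load_fitness_table(handle):
    """Read 'identifier value' pairs into a dict (hypothetical file layout)."""
    table = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        identifier, value = line.split()
        table[identifier] = float(value)
    return table


def lookup_fitness(table, identifier):
    """Return the pre-computed fitness, or None if the candidate is unknown."""
    return table.get(identifier)


# Made-up identifiers and bond lengths, used only to exercise the functions.
example = io.StringIO("# id  CO_bond_length\ncand_A 1.140\ncand_B 1.152\n")
table = load_fitness_table(example)
print(lookup_fitness(table, "cand_B"))
print(lookup_fitness(table, "cand_Z"))
```

Returning None for an unknown candidate is one possible design choice; a real provider might instead report an error to the calling program.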
Instructions
Start DENOPTIM from within the tutorial_2.0 folder. This is done from the Terminal (macOS/Linux) or the Anaconda prompt (Windows):
cd your_path_to_tutorial_2.0
denoptim input_parameters
Inspect the parameters:
In the Genetic Algorithm tab, the weights of mutation and crossover are set to 0: the algorithm will do neither crossover nor mutation. Instead, the weight of construction from scratch is 1. This means that all new candidates will be built randomly from scratch (we will refer to this as a “construction-only” experiment). Also, the experiment will use an initial population, which you can look at by opening the initPopulation.sdf file. Note that these are complexes with a short CO bond (i.e., low fitness).
The Fitness Provider tab configures the call to the external Python script that “calculates” the fitness.
In the Space of Building Blocks tab, you find the names of the files collecting the building blocks and the APClass compatibility rules.
Click on File -> Open to inspect the scaffold fragment in the lib_scaffolds.sdf file.
Click on File -> Open and inspect the compatibility_matrix.par file.
Click on File -> Open and inspect the lib_fragments.sdf file. Look for the fragments that offer attachment points belonging to the APClasses you have identified in the previous step.
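One way to picture the effect of the three weights in the Genetic Algorithm tab is as relative probabilities for choosing how the next candidate is generated. The sketch below works under that assumption. It is not DENOPTIM code, and the function name and dictionary keys are hypothetical.

```python
import random


def pick_operation(weights, rng):
    """Pick one operation with probability proportional to its weight."""
    names = [name for name, weight in weights.items() if weight > 0]
    if not names:
        raise ValueError("at least one weight must be positive")
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)  # fixed seed only to make this sketch repeatable

# Weights mirroring the "construction-only" settings described above.
construction_only = {"crossover": 0, "mutation": 0, "construction": 1}
picks = [pick_operation(construction_only, rng) for _ in range(5)]
print(picks)  # every entry is "construction", since the other weights are 0
```

Under this assumption, only the ratios between the weights matter, so setting all three to the same positive value would make the three operations equally likely.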
Go back to the input parameters by clicking on Active Tabs -> Prepare GA experiment, start the evolutionary design by clicking on Run now..., and follow the dialog. Once the experiment is submitted, you will be notified of where the output is being written.
NOTE: as seen in the previous tutorial, the bar in the top-right part of DENOPTIM's window turns grey to indicate the experiment is running. When it turns blue again, the experiment has been completed.
When the experiment has been completed, open the output from File -> Open Recent... and select the appropriate path. This opens a GARun Inspector tab where you find:
The evolution plot (top-right panel): each point is a candidate; click on it to display the structure and properties of the candidate. By default, the plot shows two blue lines: the minimum and the maximum value of the fitness in the population. The button Show/Hide Population Stats also lets you add the mean and median.
The monitor plot (bottom-right panel): collects numerical indicators of the algorithm's behaviour, such as the number of attempts to create candidates, which is the series shown by default. The button Show/Hide Population Stats lets you add series to, or remove them from, the plot.
NOTE: To save a plot, right-click on it and choose Save As... Similarly, to save a picture of a molecular model, right-click on it and choose File -> Save -> Save As PNG.
Run two more independent experiments starting from the same input parameters. By default, each experiment uses an independent sequence of pseudo-random events. Therefore, to get independent repeats you can switch back to the input parameters by clicking on Active Tabs -> Prepare GA experiment, and submit again with Run now... You can submit more than one experiment in parallel.
Discussion Point: if we exclude generation 0 (i.e., the initial population given as input), the distribution of fitness values over the course of the experiment appears random for all the experiments run so far. Try to explain why. (Hint: remember what we noted when inspecting the input parameters in point 2.)
Now we produce another set of GA experiments where we change the way the software is allowed to generate new candidates. Via Active Tabs -> Prepare GA experiment, go back to the input parameters and do the following in the Genetic Algorithm tab:
set Crossover weight = 1
set Construction weight = 0
Submit three such “crossover-only” experiments via the Run now... button.
Inspect the results of these “crossover-only” experiments.
Discussion Point: the distribution of fitness values over the course of the experiment is radically different from the “construction-only” experiments. Try to explain why. (Hint: we are comparing experiments where new candidates are generated using either only crossover or only construction from scratch.)
Now, we produce “mutation-only” experiments. Again, via Active Tabs -> Prepare GA experiment, go back to the input parameters and set the following in the Genetic Algorithm tab:
Crossover weight = 0
Mutation weight = 1
Construction weight = 0
As before, run three such experiments.
Inspect the results of these “mutation-only” experiments.
Discussion Point: again, the distribution of fitness values over the course of the experiment is radically different from the “construction-only” experiments, and it is also different from the “crossover-only” experiments. Try to explain why. (Hint: compare mutation and crossover in terms of how much structural diversity each of these operations can bring into the population.)
Finally, we combine crossover, mutation and random construction (we’ll call these the “complete GA” experiments). Again, via Active Tabs -> Prepare GA experiment, go back to the input parameters and set the following in the Genetic Algorithm tab:
Crossover weight = 1
Mutation weight = 1
Construction weight = 1
Inspect the results of these runs as well. In particular, choose an experiment that produced a population with a high mean fitness and, from the GARun Inspector, click on Open Population Graphs to visualize the molecules in the population at a late stage of the experiment (high generation number).
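When comparing experiments by their mean fitness, it can help to recall how the population statistics shown in the evolution plot (minimum, maximum, mean, median) relate to the individual fitness values. The sketch below computes them for one population. It does not read DENOPTIM output, the function name is hypothetical, and the numbers are made up for illustration.

```python
import statistics


def population_stats(fitness_values):
    """Summarize the fitness values of one population."""
    values = list(fitness_values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Made-up fitness values for two generations, for illustration only.
generations = {0: [1.0, 2.0, 3.0, 4.0], 1: [2.0, 3.0, 4.0, 7.0]}
for generation, values in generations.items():
    print(generation, population_stats(values))
```

In the second made-up population the mean is larger than the median because a single high value pulls the mean up, which is one reason to look at both series in the plot.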