Molecular docking — validate the protocol, then dock the candidate#
The druggability tutorial established that EGFR’s ATP pocket is a real, drug-binding site. The final question: can a candidate small molecule bind it, and how strongly?
The single discipline that separates rigorous docking from naïve docking is redocking validation. Before trusting a docking score on a new molecule, you re-dock a known ligand — one whose binding pose was solved crystallographically — into its own receptor, and check that the protocol reproduces the native pose. Only a validated protocol is trusted.
This tutorial follows that redock-to-validate-then-dock protocol, on EGFR.
What you will do
Prepare the receptor (and understand why each step matters)
Redocking validation — re-dock erlotinib into its own structure
Inspect the redocked poses — what validation actually tells you
Dock a candidate molecule into the validated pocket
Rank candidates by predicted affinity — and its honest caveats
Requires
pip install 'omicverse[mol-dock]'. Docking is stochastic — every step fixes aseedfor reproducibility.
Rendering note. Inline 3D views below use py3Dmol, which emits an HTML/JavaScript block. JupyterLab applies a trust filter that strips scripts from any notebook it has not signed with your local key — so a notebook you cloned from the repo, or one that was executed by
nbconvert, opens untrusted and the views show only the “3Dmol.js failed to load” warning even though the HTML is saved in the file. To restore the interactive 3D views, either re-execute the cells in your own kernel (auto-trusts the current session) or runjupyter trust /path/to/this/notebook.ipynbin a terminal. The mkdocs-rendered version of this tutorial is static HTML, so the views work without trust there.
0. Setup#
import omicverse as ov
import numpy as np
import pandas as pd
ov.plot_set()
🔬 Starting plot initialization...
🧬 Detecting GPU devices…
🚫 No GPU devices found (CUDA/MPS/ROCm/XPU)
____ _ _ __
/ __ \____ ___ (_)___| | / /__ _____________
/ / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \
/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/
\____/_/ /_/ /_/_/\___/ |___/\___/_/ /____/\___/
🔖 Version: 2.2.1rc1 📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.
1. The receptor#
We dock into PDB 1M17 — the EGFR kinase domain with erlotinib bound, the same structure validated in the druggability tutorial.
Docking needs the receptor and ligand in pdbqt format (atom types +
partial charges). ov.mol handles this internally, but it is worth knowing
what happens:
Receptor — waters and the bound ligand are removed, hydrogens and Gasteiger charges added, the protein written to pdbqt. Residues with missing atoms (common in crystal structures) are repaired or dropped.
Ligand — a SMILES string is given a 3D conformer (RDKit ETKDG), then meeko assigns rotatable bonds and writes pdbqt.
Each step matters: a wrong protonation state or a missed rotatable bond changes the docking result.
s = ov.mol.fetch_structure('1M17', source='pdb', verbose=True)
s
fetched experimental structure 1M17 from RCSB PDB
MolStructure(1M17, source=pdb, 312 residues)
2. Redocking validation — the mandatory step#
redock_validate extracts the co-crystallized ligand (erlotinib), re-docks it
into its own receptor, and compares every docked pose against the crystal
pose. It reports two distinct criteria:
passed— the top-scored pose is within 2.0 Å of the crystal pose. This validates search and scoring together — the strict criterion.sampling_passed— some generated pose is within 2.0 Å. This validates that the search reproduces the native binding mode — independently of whether Vina’s score ranks it first.
The field-accepted success threshold is RMSD < 2.0 Å.
val = ov.mol.redock_validate(s, exhaustiveness=32, seed=1)
print(f"reference ligand : {val['ref_ligand']} (erlotinib)")
print(f"top-pose RMSD : {val['rmsd']:.2f} A -> passed = {val['passed']}")
print(f"best-pose RMSD : {val['min_rmsd']:.2f} A (pose rank {val['min_rmsd_rank']})")
print(f" -> sampling_passed = {val['sampling_passed']}")
reference ligand : AQ4 (erlotinib)
top-pose RMSD : 7.95 A -> passed = False
best-pose RMSD : 1.42 A (pose rank 3)
-> sampling_passed = True
Read the two criteria. The docking reproduces the native binding
mode — a pose within ~1.5 Å of the crystal structure is generated, so
sampling_passed is True: the search box and parameters are correct.
But that near-native pose is not ranked first — Vina’s scoring function places a different pose at rank 1. This is the single most important, well-documented limitation of docking: the lowest-RMSD pose does not always score best. Vina’s scoring function is semi-empirical.
The practical consequence — and the real lesson of redocking — is: the protocol is validated for sampling, so when docking a new molecule you must inspect the top-ranked poses, not blindly trust pose 1.
3. Inspect the redocked poses#
Lay the per-pose RMSD next to the per-pose Vina affinity. This makes the scoring-vs-RMSD mismatch concrete.
res = val['result']
poses = pd.DataFrame({'rank': range(1, len(res.affinities) + 1),
'affinity_kcal_mol': np.round(res.affinities, 2),
'rmsd_to_crystal_A': np.round(val['all_rmsd'], 2)})
poses
| rank | affinity_kcal_mol | rmsd_to_crystal_A | |
|---|---|---|---|
| 0 | 1 | -7.31 | 7.95 |
| 1 | 2 | -7.23 | 7.87 |
| 2 | 3 | -7.12 | 1.42 |
| 3 | 4 | -7.11 | 8.56 |
| 4 | 5 | -7.06 | 7.94 |
| 5 | 6 | -7.04 | 9.40 |
| 6 | 7 | -7.04 | 8.70 |
| 7 | 8 | -7.02 | 8.50 |
| 8 | 9 | -6.80 | 2.58 |
View the near-native pose — the one redocking shows is essentially the crystallographic binding mode. The grey box is the Vina search box; the green sticks are erlotinib.
near_native = val['min_rmsd_rank'] - 1
v = ov.mol.view_docking(s, res, pose=near_native, width=720, height=520)
v
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<py3Dmol.view at 0x7f413740c940>
4. Dock a candidate molecule#
With the protocol validated, dock a candidate into the same pocket. We use gefitinib — another EGFR inhibitor — reusing the validated search box from the redocking run.
The ligand is given by name; ov.mol resolves it to a structure via ChEMBL.
box = res.box
cand = ov.mol.dock(s, 'gefitinib', box=box, exhaustiveness=32, seed=1)
cand
DockingResult(9 poses, best -8.06 kcal/mol)
print(f'gefitinib — {len(cand.affinities)} poses')
print(f'best Vina affinity: {cand.affinities[0]:.2f} kcal/mol')
print(f'affinity spread : {cand.affinities.min():.2f} to {cand.affinities.max():.2f}')
gefitinib — 9 poses
best Vina affinity: -8.06 kcal/mol
affinity spread : -8.06 to -7.12
Inspect the top-ranked gefitinib pose in the pocket (interactive):
v = ov.mol.view_docking(s, cand, pose=0, width=720, height=520)
v
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<py3Dmol.view at 0x7f412f76c640>
5. Working with local molecules — your own files, and saving poses#
dock() was demonstrated above with a drug name (resolved through ChEMBL).
In a real campaign your candidate is more likely to be a structure file from a
vendor catalog, an in-house design pipeline, or a virtual-screening hit list.
ov.mol.dock(structure, ligand) accepts the ligand as any of:
Input form |
Example |
|---|---|
SMILES string |
|
Drug name (ChEMBL) |
|
Local file |
|
RDKit |
|
|
|
Build a small SDF on disk (standing in for a compound from your own pipeline) and dock it straight from the file path.
# 1. Write a local SDF — stand-in for a compound from disk
from rdkit import Chem
from rdkit.Chem import AllChem
gefitinib_smi = 'COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1'
mol = Chem.AddHs(Chem.MolFromSmiles(gefitinib_smi))
AllChem.EmbedMolecule(mol, randomSeed=1)
Chem.SDWriter('my_compound.sdf').write(mol)
print('wrote my_compound.sdf')
wrote my_compound.sdf
# 2. Dock straight from the local file path — same dock() call, file in
local_result = ov.mol.dock(s, 'my_compound.sdf', box=box,
exhaustiveness=24, seed=1)
print('best affinity from local SDF:',
round(float(local_result.affinities[0]), 2), 'kcal/mol')
best affinity from local SDF: -8.06 kcal/mol
DockingResult carries every pose as both an RDKit molecule and a PDB block,
but those live in memory. Persist them to disk for downstream analysis with
save_poses / save_pose — .sdf is the recommended format
(preserves bond orders and tags each pose with its vina_affinity_kcal_mol),
.pdb is the universal alternative.
# 3. Save the docked poses to disk
local_result.save_poses('docked_poses.sdf') # all poses, SDF + tags
local_result.save_pose('best_pose.pdb', pose=0) # top-scored pose only
print('saved docked_poses.sdf (', len(local_result.poses), 'poses )')
print('saved best_pose.pdb ( pose 0,',
f"{float(local_result.affinities[0]):.2f} kcal/mol )")
saved docked_poses.sdf ( 9 poses )
saved best_pose.pdb ( pose 0, -8.06 kcal/mol )
6. Rank candidates by predicted affinity#
A docking campaign compares molecules under identical settings. Dock the EGFR inhibitor against a non-kinase drug — ibuprofen, an NSAID with no EGFR activity — and compare best affinities.
candidates = ['gefitinib', 'erlotinib', 'ibuprofen']
scores = {name: ov.mol.dock(s, name, box=box, exhaustiveness=16,
seed=1).affinities[0]
for name in candidates}
for name, aff in sorted(scores.items(), key=lambda kv: kv[1]):
print(f'{name:12s}: {aff:6.2f} kcal/mol')
gefitinib : -8.06 kcal/mol
erlotinib : -7.29 kcal/mol
ibuprofen : -6.54 kcal/mol
The two EGFR inhibitors score more favourably (more negative) than ibuprofen — the campaign separates real binders from a non-binder, which is what docking is good at.
The honest caveats, which you must carry forward:
A Vina affinity is semi-empirical — useful for relative ranking within one campaign, not an absolute binding constant. Do not read
-9 kcal/molas a measured Kd.Rankings are only comparable under identical receptor, box and settings.
As redocking showed, the top-scored pose can be wrong — always inspect the top few poses, and prefer a pose consistent with known interactions (here, an H-bond to the kinase hinge).
Interpretation#
Always redock first. A docking protocol is only trustworthy once it reproduces a known binding mode —
redock_validatemakes this a single call.Distinguish sampling (does the search find the native pose?) from scoring (does Vina rank it first?). Vina’s sampling is reliable; its scoring is semi-empirical and can mis-rank.
Docking gives relative rankings within one campaign — never absolute affinities.
view_dockingmakes every pose inspectable, inline and in the exported docs.
This completes the ov.mol tutorial suite: from an omics target to its
structure, its druggability, and a validated docking campaign.