Is the target druggable? Binding pockets and known drugs

The structure tutorial confirmed our target — EGFR — has a well-folded kinase domain. The next prioritization question is pharmacological: can it be drugged?

A defensible druggability call rests on two independent lines of evidence:

  1. Structure-based — does the protein have a pocket with the geometry and chemistry of a small-molecule binding site? (volume, enclosure, hydrophobicity — scored by fpocket)

  2. Knowledge-based — does the target already have known drugs or bioactive compounds? Existing chemical matter is itself the strongest druggability evidence there is.

This tutorial runs both, following the published workflow that uses druggability assessment for target prioritization, and shows why neither line alone is enough.

What you will do

  1. Get an experimental structure with the binding site occupied

  2. Detect binding pockets (rust-fpocket)

  3. Characterize the top pocket — the descriptors that matter

  4. Score druggability against the documented cutoff

  5. Validate the pocket against EGFR’s known ATP site

  6. Look up known drugs (ChEMBL)

  7. The combined prioritization verdict

Requires pip install 'omicverse[mol]' and pip install fpocket-rs.

Rendering note. Inline 3D views below use py3Dmol, which emits an HTML/JavaScript block. JupyterLab applies a trust filter that strips scripts from any notebook it has not signed with your local key — so a notebook you cloned from the repo, or one that was executed by nbconvert, opens untrusted and the views show only the “3Dmol.js failed to load” warning even though the HTML is saved in the file. To restore the interactive 3D views, either re-execute the cells in your own kernel (auto-trusts the current session) or run jupyter trust /path/to/this/notebook.ipynb in a terminal. The mkdocs-rendered version of this tutorial is static HTML, so the views work without trust there.

0. Setup

import omicverse as ov
import numpy as np
import pandas as pd
ov.plot_set()
🔬 Starting plot initialization...
🧬 Detecting GPU devices…
🚫 No GPU devices found (CUDA/MPS/ROCm/XPU)

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.2.1rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

1. An experimental structure with the binding site occupied

For pocket detection we deliberately use an experimental, ligand-bound (holo) structure rather than the AlphaFold model:

  • the co-crystallized ligand marks the bona fide binding site, giving us a ground truth to validate the detector against (step 5);

  • fpocket’s druggability score uses a solvent-burial term, so the ligand-bound conformation reports the buried geometry that the drug actually sees.

PDB 1M17 is the EGFR kinase domain co-crystallized with erlotinib (an approved EGFR inhibitor).

s = ov.mol.fetch_structure('1M17', source='pdb', verbose=True)
s
fetched experimental structure 1M17 from RCSB PDB
MolStructure(1M17, source=pdb, 312 residues)

2. Detect binding pockets

ov.mol.pockets runs fpocket (via the rust-fpocket backend): it builds a Voronoi tessellation of the protein, places alpha-spheres in the cavities, clusters them into pockets, and scores each one. Pockets are ranked by druggability score.

df = ov.mol.pockets(s)
print(f'{len(df)} pockets detected')
df[['pocket_id', 'rank', 'drug_score', 'volume',
    'n_alpha_spheres', 'n_residues']].head(6)
29 pockets detected
   pocket_id  rank  drug_score      volume  n_alpha_spheres  n_residues
0          1     1    0.336140  482.368815               49          11
1         11     2    0.116385  287.344200               27           8
2          7     3    0.043613  223.024551               23           9
3          5     4    0.028651   64.313395               16           6
4         25     5    0.027410  480.790523               40          13
5          3     6    0.013218  483.550275               60          16
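
Note that pocket_id and rank are separate columns: the id appears to be the detector's own label, while rank follows drug_score. The sketch below reproduces that ordering on the six rows printed above, using plain dicts instead of the DataFrame. The helper name rank_by_drug_score is hypothetical and is not part of omicverse.

```python
# Toy reproduction of the ranking step on the six rows shown above.
# Rows are listed in pocket_id order so that the sort has work to do.
pockets = [
    {"pocket_id": 1, "drug_score": 0.336140},
    {"pocket_id": 3, "drug_score": 0.013218},
    {"pocket_id": 5, "drug_score": 0.028651},
    {"pocket_id": 7, "drug_score": 0.043613},
    {"pocket_id": 11, "drug_score": 0.116385},
    {"pocket_id": 25, "drug_score": 0.027410},
]


def rank_by_drug_score(rows):
    """Return copies of the rows sorted by descending drug_score, with a 1-based rank."""
    ordered = sorted(rows, key=lambda r: r["drug_score"], reverse=True)
    return [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]


ranked = rank_by_drug_score(pockets)
for row in ranked:
    print(row["rank"], row["pocket_id"], row["drug_score"])
```

Sorting this way returns the pocket ids in the same order as the rank column of the table.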

3. Characterize the top pocket

Druggability is not a single number — it is a judgement about size, shape and chemistry. The informative descriptors:

  • volume — too small cannot fit a drug-like molecule; very large is usually a shallow, open surface;

  • alpha-sphere count — a proxy for how enclosed / well-defined the cavity is;

  • hydrophobicity — drug-like binding sites are typically hydrophobic enough to make favourable contacts;

  • polarity — needed for specificity, but an over-polar pocket is hard to drug.

top = df.iloc[0]
print(f'top pocket (id {int(top.pocket_id)}):')
print(f'  volume               {top.volume:8.1f} A^3')
print(f'  alpha-spheres        {int(top.n_alpha_spheres):8d}')
print(f'  lining residues      {int(top.n_residues):8d}')
print(f'  hydrophobicity score {top.hydrophobicity_score:8.1f}')
print(f'  polarity score       {top.polarity_score:8.1f}')
top pocket (id 1):
  volume                  482.4 A^3
  alpha-spheres              49
  lining residues            11
  hydrophobicity score     36.5
  polarity score            5.0
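
To get a feel for what a volume of about 482 Å^3 means, it can help to convert it into the radius of a sphere with the same volume. This is only a back-of-the-envelope aid: real pockets are not spherical, and the function name below is a hypothetical helper, not an omicverse or fpocket call.

```python
import math


def equivalent_sphere_radius(volume_a3):
    """Radius (in Å) of a sphere whose volume equals volume_a3 (in Å^3)."""
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)


top_volume = 482.4  # Å^3, the top-pocket volume printed above
radius = equivalent_sphere_radius(top_volume)
print(f"equivalent-sphere radius: {radius:.1f} A")
```

For the top pocket this gives a radius a little under 5 Å, a cavity roughly 10 Å across.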

4. Druggability score

ov.mol.druggability reports the top pocket’s score against fpocket’s documented cutoff of 0.5:

  • > 0.5 — druggable

  • 0.2–0.5 — difficult

  • < 0.2 — undruggable

verdict = ov.mol.druggability(s)
print(f"top druggability score : {verdict['top_drug_score']:.3f}")
print(f"structure-based verdict: {verdict['verdict']}")
top druggability score : 0.336
structure-based verdict: difficult
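
The three bands can be written as a small function. This is a minimal sketch of the banding described above, not the actual implementation of ov.mol.druggability; the function name is hypothetical, and assigning scores of exactly 0.2 or 0.5 to the difficult band is an assumption read off the inequalities in the list.

```python
def classify_drug_score(score, druggable_cutoff=0.5, difficult_cutoff=0.2):
    """Map a druggability score to one of the three bands listed above.

    Assumption: scores exactly equal to either cutoff fall in the
    'difficult' band, matching '> 0.5' and '< 0.2' in the text.
    """
    if score > druggable_cutoff:
        return "druggable"
    if score >= difficult_cutoff:
        return "difficult"
    return "undruggable"


print(classify_drug_score(0.336))
```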

Read this carefully. The EGFR ATP pocket scores in the difficult band — below the 0.5 druggable cutoff. Yet EGFR is one of the most successfully drugged targets in oncology.

This is a real, well-known limitation: fpocket’s score was calibrated largely on enclosed, hydrophobic cavities, and kinase ATP pockets systematically score low because they are shallow, partly solvent-exposed, and evolved to bind a polar nucleotide (ATP). A structure-based score is one line of evidence, not a verdict, which is why steps 5 and 6 follow.

5. Validate the pocket against EGFR’s known ATP site

A druggability call is only trustworthy if the detector actually found the real binding site. EGFR’s catalytic ATP site has well-characterized residues: the catalytic lysine, the gatekeeper threonine, the hinge methionine and the DFG aspartate. In PDB 1M17 — which uses the mature-protein numbering, offset 24 from UniProt — these are K721, T766, M769 and D831 (Stamos et al. 2002). Check whether the top-ranked pocket’s lining residues recover them.

atp_site = {721, 766, 769, 831}   # 1M17 numbering: Lys/Thr/Met/Asp
top_resids = {resid for _chain, resid in df.iloc[0]['residues']}
hit = sorted(atp_site & top_resids)
print(f'top pocket lining residues: {len(top_resids)}')
print(f'known ATP-site residues recovered: {hit}  ({len(hit)}/{len(atp_site)})')
top pocket lining residues: 11
known ATP-site residues recovered: [721, 766, 831]  (3/4)
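
Because 1M17 and UniProt number the same residues differently, it is easy to compare against the wrong positions. The sketch below applies the offset of 24 stated above to the four reference residues and reports which one the top pocket missed. The recovered set is copied from the output above; the helper name and the dictionary layout are hypothetical.

```python
# 1M17 (mature-protein) numbering is 24 lower than UniProt numbering.
PDB_TO_UNIPROT_OFFSET = 24

# Reference ATP-site residues in 1M17 numbering, with their roles.
atp_site_1m17 = {
    721: "Lys (catalytic)",
    766: "Thr (gatekeeper)",
    769: "Met (hinge)",
    831: "Asp (DFG)",
}

# Residues recovered by the top pocket, copied from the output above.
recovered = {721, 766, 831}


def to_uniprot(resid, offset=PDB_TO_UNIPROT_OFFSET):
    """Convert a 1M17 residue number to UniProt numbering."""
    return resid + offset


for resid, role in sorted(atp_site_1m17.items()):
    status = "recovered" if resid in recovered else "missed"
    print(f"{role:18s} 1M17 {resid} -> UniProt {to_uniprot(resid)}  {status}")

missed = sorted(set(atp_site_1m17) - recovered)
```

The one residue not recovered is the hinge methionine, M769 in 1M17 numbering.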

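The top-ranked pocket contains three of the four reference residues (K721, T766 and D831; the hinge M769 is not among its lining residues), so it coincides with the catalytic ATP site. The detector found the bona fide binding pocket, and the score from step 4 describes the right cavity. Visualize the detected pockets on the structure (interactive):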

v = ov.mol.view(s, color_by='chain', show_pockets=True,
                width=720, height=520)
v
ov.mol.view: coloured by chain

[Interactive 3D view: 1M17 coloured by chain, with detected pockets shown]

<py3Dmol.view at 0x7fc2ed0679a0>

6. Known drugs: the knowledge-based line of evidence

ov.mol.known_drugs queries ChEMBL for compounds with an annotated mechanism of action against the target. For a target that already has chemical matter, this is decisive evidence.

drugs = ov.mol.known_drugs('EGFR', only_mechanism=True)
phase = pd.to_numeric(drugs['max_phase'], errors='coerce')
print(f'{len(drugs)} mechanism-of-action compounds in ChEMBL')
print(f'approved (max_phase 4): {(phase >= 4).sum()}')
drugs[['drug_name', 'max_phase', 'action_type']].head(8)
76 mechanism-of-action compounds in ChEMBL
approved (max_phase 4): 20
   drug_name                max_phase  action_type
0  PANITUMUMAB              4.0        INHIBITOR
1  CETUXIMAB                4.0        INHIBITOR
2  ERLOTINIB HYDROCHLORIDE  4.0        INHIBITOR
3  GEFITINIB                4.0        INHIBITOR
4  LAPATINIB DITOSYLATE     4.0        INHIBITOR
5  AFATINIB DIMALEATE       4.0        INHIBITOR
6  OSIMERTINIB MESYLATE     4.0        INHIBITOR
7  NECITUMUMAB              4.0        INHIBITOR

Erlotinib, the very ligand bound in our structure, appears in the list alongside gefitinib, afatinib, osimertinib and the antibody therapeutics. By the knowledge-based criterion, EGFR is unambiguously druggable.
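
The hits mix two modalities. The antibodies bind EGFR's extracellular domain rather than the kinase ATP pocket scored in steps 2 to 5, so it can be useful to separate them before relating the drug list to that pocket. The sketch below does this with the "-mab" suffix that monoclonal antibody names carry. It is a naming heuristic applied to the eight names printed above, not a ChEMBL field, and the helper name is hypothetical.

```python
# The eight drug names shown in the output above.
drug_names = [
    "PANITUMUMAB",
    "CETUXIMAB",
    "ERLOTINIB HYDROCHLORIDE",
    "GEFITINIB",
    "LAPATINIB DITOSYLATE",
    "AFATINIB DIMALEATE",
    "OSIMERTINIB MESYLATE",
    "NECITUMUMAB",
]


def is_antibody_name(name):
    """True if the first word of the name ends in the '-mab' suffix."""
    parts = name.split()
    return bool(parts) and parts[0].upper().endswith("MAB")


antibodies = [n for n in drug_names if is_antibody_name(n)]
others = [n for n in drug_names if not is_antibody_name(n)]

print("antibodies:", antibodies)
print("others    :", others)
```

A production workflow would more reliably use a molecule-type annotation from the database itself, if one is returned, instead of parsing names.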

7. The prioritization verdict

| Line of evidence | Result for EGFR |
| --- | --- |
| Structure-based (fpocket) | ATP pocket score ≈ 0.34 → difficult |
| Pocket validation | top pocket recovers 3 of 4 reference ATP-site residues ✓ |
| Knowledge-based (ChEMBL) | 76 mechanism-of-action compounds, 20 approved (max_phase 4) |

The two lines disagree, and that disagreement is the lesson. The structure-based score under-rates the kinase ATP site (a documented bias); the knowledge-based line proves druggability beyond doubt. The honest verdict is to pursue: EGFR is druggable, and the structure-based “difficult” score reflects the known kinase caveat, not a red flag.

For a novel target with no known drugs, the knowledge-based line (line 2) would be unavailable, and you would then weigh the structure-based score knowing it under-rates kinases and over-rates large shallow surfaces.
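
One way to make the combination explicit is a small decision rule. The sketch below is purely illustrative: the function, its argument names, and the order and wording of the rules are assumptions made for this example and do not come from omicverse or the cited workflow. It only encodes the idea that known chemical matter outweighs a low structure-based score, and that a structure-based score counts only when the pocket has been validated.

```python
def prioritization_call(structure_verdict, site_validated, n_approved, n_mechanism):
    """Combine the two lines of evidence into one call (hypothetical rule set)."""
    if n_approved > 0:
        return "pursue: druggable by clinical precedent"
    if n_mechanism > 0:
        return "pursue with caution: bioactive compounds exist, none approved"
    if structure_verdict == "druggable" and site_validated:
        return "pursue with caution: structure-based evidence only"
    return "gather more evidence before prioritizing"


# EGFR values from the steps above: difficult pocket, validated site,
# 20 approved drugs among 76 mechanism-of-action compounds.
egfr_call = prioritization_call("difficult", True, n_approved=20, n_mechanism=76)
print(egfr_call)
```

For a novel target with no compounds, the first two branches never fire and the call rests entirely on the validated structure-based verdict.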

Interpretation

  • Druggability is a size / shape / hydrophobicity judgement, scored by a trained model with a documented cutoff.

  • A structure-based score must be validated against the bona fide pocket — did the detector even find the real site? — and is one line of evidence, not a verdict.

  • Kinase ATP pockets score low on fpocket; always cross-check with known chemical matter (known_drugs).

  • The robust call combines both lines.

Next: the docking tutorial takes this validated ATP pocket and docks a candidate molecule into it.