Using AlphaFold2 Models in Structure-Based Drug Discovery Workflows


AlphaFold2 outputs are now routinely used as docking targets in early discovery programs. But "good enough for docking" is not the same as "use it out of the box." In our experience working with seed biotechs on structure-based screening campaigns, roughly 40% of initial AF2-derived models require at least one preparation step before they produce reliable docking scores. The other 60%? They still need pLDDT inspection. Every time.

When an AF2 Structure Is Actually Ready for Docking

The short answer: when the binding site is well-ordered and the pLDDT confidence scores in that region are consistently above 70. AlphaFold2 assigns per-residue confidence on a 0-100 scale. Scores above 90 indicate accuracy comparable to experimental structures. Scores in the 70-90 range are generally usable for structure-based work. Below 70, treat the geometry as a rough sketch, not a coordinate set.

Practically, this means loading the .pdb from the EBI AlphaFold Database and running a quick per-residue pLDDT inspection before you do anything else. AF2 writes pLDDT into the B-factor column, so B-factor coloring in PyMOL maps directly to pLDDT. With the AlphaFold Database color scheme applied, a binding pocket ringed by blue and teal residues (high confidence) is a different situation from one edged in yellow and orange. A second metric is the predicted aligned error (PAE) matrix, which tells you how confidently the model has placed each residue relative to every other residue. AF2 reports PAE for single chains as well, but it matters most for AF-Multimer models: for protein-protein interface docking, PAE is arguably more important than pLDDT alone.
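As a rough illustration of that inspection step, the sketch below reads pLDDT from the B-factor column of CA atoms and summarizes a user-supplied list of pocket residues. The function names, the chain ID, and the residue list are assumptions made for illustration; the cutoff of 70 is the threshold discussed above. It uses only fixed-column PDB parsing and is not a substitute for a proper structure library.

```python
def read_plddt(pdb_lines, chain="A"):
    """Return {residue_number: pLDDT} from the CA atoms of one chain.

    AF2 models store pLDDT in the B-factor field (PDB columns 61-66).
    """
    plddt = {}
    for line in pdb_lines:
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21] == chain:
            resnum = int(line[22:26])          # residue sequence number
            plddt[resnum] = float(line[60:66])  # B-factor column = pLDDT
    return plddt


def summarize_pocket(plddt, pocket_residues, cutoff=70.0):
    """Summarize confidence over a hypothetical list of pocket residues."""
    scores = [plddt[r] for r in pocket_residues if r in plddt]
    if not scores:
        raise ValueError("none of the pocket residues were found in the model")
    below = [s for s in scores if s < cutoff]
    return {
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "fraction_below_cutoff": len(below) / len(scores),
    }


# Example usage (file name and residue numbers are placeholders):
# with open("model.pdb") as handle:
#     plddt = read_plddt(handle)
# print(summarize_pocket(plddt, [45, 46, 89, 132]))
```

The fraction of pocket residues below the cutoff is reported separately from the mean because a single low-confidence gating residue can be hidden by an otherwise high average.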

ESMFold produces faster predictions with comparable per-residue confidence scores, though in our comparisons on kinase targets, AF2 still outperforms on binding-site geometry when a template in the PDB has even 30% sequence identity. For truly novel folds with no close homolog, the gap narrows significantly.

Binding Site Plausibility: Before You Dock Anything

A predicted structure can have high pLDDT at the active site and still be unsuitable for docking. Why? Three reasons.

First, AF2 was trained on static crystal structures. It predicts a single conformation: the most likely one given the sequence. Real enzymes sample conformational ensembles. If your target is known to undergo induced-fit binding (think kinase DFG-motif flipping, or GPCR orthosteric pocket opening), the AF2 model often represents a state that doesn't accommodate your chemotype.

Second, cryptic pockets generally don't show up in AF2 predictions. Sites that only open upon ligand binding or in specific allosteric states are rarely captured, because AF2 was trained mostly on apo and holo crystal structures and tends to return the dominant conformation among them. If your target has a published cryptic pocket identified via MD simulation or cryo-EM, assume that structural information is not in the AF2 output until you have checked.

Third, flexible loops adjacent to the binding site are predicted with low pLDDT and may physically occlude the pocket in the predicted geometry even though they'd move in solution. We've seen gating loops block 60-70% of the docking volume in AF2 models of proteases, when the experimental structures show those loops in an open conformation. Running a brief energy minimization before visual inspection is worth the 10-minute overhead.

Relaxation Strategies That Actually Work

Two approaches are worth knowing here, depending on your timeline and resources.

OpenMM minimization. The AF2 coordinate set often has local strain artifacts: bond lengths and angles that would not survive an experimental refinement cycle. Running a restrained energy minimization in OpenMM (or Amber, or any force-field-based minimizer), with heavy atoms held near their predicted positions by positional restraints and hydrogens free to move, removes most of the steric clashes without altering the overall fold. This takes about 15-20 minutes on a workstation for a 400-residue protein. It is the minimum preparation step before docking.

Ensemble refinement. For targets where conformational sampling matters, running 5-10 short MD trajectories (10-50 ns each) and clustering the resulting conformations gives you an ensemble of binding-site geometries. Docking against an ensemble systematically reduces false-negative rates compared to single-structure docking. In our work on a cysteine protease target with a known flexible active-site loop, ensemble docking recovered 3 additional actives per 100-compound screen compared to single-structure AF2 docking. Not dramatic. Significant.
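One common way to combine docking results across an ensemble is to keep each compound's best score over all conformers. The sketch below shows that aggregation only; it assumes scores where lower (more negative) is better, and the function names and input layout are hypothetical rather than tied to any particular docking program. It is not necessarily the aggregation used in the campaign described above.

```python
def best_score_per_compound(scores_by_conformer):
    """Keep the best (lowest) docking score for each compound.

    scores_by_conformer: {conformer_id: {compound_id: score}}
    Returns {compound_id: (best_score, conformer_id)}.
    """
    best = {}
    for conformer_id, scores in scores_by_conformer.items():
        for compound_id, score in scores.items():
            if compound_id not in best or score < best[compound_id][0]:
                best[compound_id] = (score, conformer_id)
    return best


def rank_compounds(best):
    """Return compound IDs ordered from best to worst ensemble score."""
    return sorted(best, key=lambda compound_id: best[compound_id][0])
```

Recording which conformer produced the best score is useful later: if most top-ranked compounds come from one cluster, that geometry deserves a closer look.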

Hybrid strategies are increasingly standard in serious discovery programs. The general workflow: use AF2 for initial binding-site identification and rough pharmacophore hypothesis, then pull any available experimental structures (PDB hits, cryo-EM depositions) to validate the pocket geometry, then dock against the best available experimental structure when one exists. AF2 fills the gap for novel targets with no structural data.

Common Failure Modes to Know Before Your Next Campaign

Based on campaigns we've run and reviewed, these are the failure patterns that recur.

| Failure Mode | Signal | Mitigation |
|---|---|---|
| Low-pLDDT binding site | pLDDT < 70 at >30% of pocket residues | Use experimental structure or MD ensemble |
| Induced-fit pocket | No published AF2 docking success for that target class | IFD protocols; ensemble sampling |
| Occluding loop | Pocket volume < 200 Å³ in AF2 but > 500 Å³ in crystal | Loop modeling; MD opening simulation |
| Cryptic pocket target | Literature describes pocket not present in AF2 | Use MD-identified pocket geometry |
| Multimer interface docking | High per-chain pLDDT but high PAE at interface | AF-Multimer + PAE filtering; cryo-EM validation |

Practical Notes on AF-Multimer and ESMFold

AF-Multimer (AF2 for complexes) is genuinely useful when your target is an obligate multimer or when the binding site only forms at a protein-protein interface. The PAE output is essential reading here. High inter-chain PAE means low confidence in the relative orientation of the two chains — and that directly undermines any docking or interface-targeted screening built on top of that model.
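To make "inter-chain PAE" concrete, the sketch below averages only the PAE entries whose two residues sit on different chains. It assumes you have already loaded the PAE matrix as a square list of lists in Ångströms and know the chain lengths in model order; how you obtain the matrix depends on your prediction pipeline. The function name is hypothetical, and no threshold for "high" is implied, since that is a judgment call per target.

```python
def mean_interchain_pae(pae, chain_lengths):
    """Average PAE over residue pairs that belong to different chains.

    pae: square matrix (list of lists), pae[i][j] in Angstroms
    chain_lengths: number of residues per chain, in model order
    """
    n = sum(chain_lengths)
    if len(pae) != n or any(len(row) != n for row in pae):
        raise ValueError("PAE matrix size does not match the chain lengths")

    # Map each residue index to the index of the chain it belongs to.
    chain_of = []
    for chain_index, length in enumerate(chain_lengths):
        chain_of.extend([chain_index] * length)

    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if chain_of[i] != chain_of[j]:
                total += pae[i][j]
                count += 1
    if count == 0:
        raise ValueError("need at least two chains to compute inter-chain PAE")
    return total / count
```

Both off-diagonal blocks are included because PAE is not symmetric: the error at residue j when aligned on residue i can differ from the reverse.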

ESMFold trades some accuracy for speed, producing predictions in seconds rather than minutes. For rapid triage of a large target list — say, screening 200 potential targets for druggable pocket presence — ESMFold is a practical first pass. For the 20 targets that survive triage and enter active docking campaigns, regenerate with AF2. The accuracy difference matters at that stage.

Integrating AF2 into a Seed Biotech Workflow

Here's the thing: most seed biotechs are not resource-constrained on AF2 access. The EBI AlphaFold Database covers 200+ million sequences. The constraint is knowing which structures require extra preparation and which can go straight into a docking pipeline.

A decision tree we use at Moleculepath:

  1. Check pLDDT at the binding site. If > 80 across the pocket, proceed to minimization.
  2. Check PDB for any experimental structure. Even a distant homolog at 40% identity is worth using over pure AF2 for a confirmed binding pocket.
  3. Check for known conformational flexibility. If the target class is known for induced-fit (kinases, GPCRs, proteases with gating loops), plan for ensemble docking from the start.
  4. Run OpenMM minimization. Always. Takes 15 minutes. Removes artifacts. Non-negotiable.
  5. Validate pocket volume. If the dockable volume after minimization is under 200 ų, something is blocking the site. Investigate before running 50K compounds through it.
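The steps above can be sketched as a small function that turns a target's properties into a list of preparation actions. This is an illustration under stated assumptions, not the production logic: the class, field names, and action labels are hypothetical, the "at least 40% identity" test is one reading of step 2, and the source does not say what to do when pocket pLDDT is 80 or below, so that case is simply flagged for review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    pocket_min_plddt: float                      # lowest pLDDT among pocket residues
    best_experimental_identity: Optional[float]  # % identity of best PDB hit, None if no hit
    induced_fit_class: bool                      # e.g. kinase, GPCR, gated protease
    pocket_volume_A3: Optional[float] = None     # measured after minimization, if known


def plan(target: Target) -> List[str]:
    actions = []
    # Step 1: pLDDT > 80 across the pocket proceeds; otherwise flag (assumption).
    if target.pocket_min_plddt <= 80:
        actions.append("review_low_plddt_pocket")
    # Step 2: prefer an experimental structure, even a 40%-identity homolog.
    identity = target.best_experimental_identity
    if identity is not None and identity >= 40:
        actions.append("prefer_experimental_structure")
    # Step 3: induced-fit target classes get ensemble docking from the start.
    if target.induced_fit_class:
        actions.append("plan_ensemble_docking")
    # Step 4: minimization is always run.
    actions.append("minimize")
    # Step 5: a dockable volume under 200 cubic Angstroms suggests occlusion.
    if target.pocket_volume_A3 is not None and target.pocket_volume_A3 < 200:
        actions.append("investigate_occlusion")
    return actions
```

Returning a list rather than a single verdict reflects how the checks combine in practice: a target can need ensemble docking and an occlusion check at the same time.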

Honest assessment: AF2 has reduced the barrier to structure-based drug discovery considerably. A target with no experimental structures is no longer a dead end in the way it was in 2019. But the models are not a replacement for experimental validation at key decision points. In our experience, programs that combine AF2 models with even one experimental structure for binding-site validation see substantially fewer false actives in virtual screening. Anecdotally, we estimate a 2x improvement in confirmation rates at biochemical assay, though this varies considerably by target class.

Use AlphaFold2. Use it early. Just don't use it blind.

Running a structure-based campaign on a novel target? Talk to the Moleculepath team about integrating AlphaFold2 models into your screening workflow.
