Why Fragment Libraries Outperform HTS Sets for Structure-Guided Campaigns

2025-03-28 Tomás Vidal — Computational Chemistry Lead 8 min read

High-throughput screening libraries optimize for diversity metrics that have little relationship to the geometry of a specific binding pocket. Fragment-based approaches start from the other direction — and the hit quality difference is measurable.

The Diversity Metric Problem

A well-designed HTS deck contains 50,000 to 500,000 compounds selected to maximize chemical diversity across the collection as a whole. Diversity here typically means pairwise Tanimoto dissimilarity computed on a structural fingerprint, usually ECFP4 (a Morgan fingerprint of radius 2) or something similar. The result is a library that explores chemical space broadly. What it does not do is explore the chemical space compatible with a specific binding pocket.
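To make the diversity objective concrete, here is a minimal Python sketch of Tanimoto similarity and greedy MaxMin picking, a common way to select for fingerprint diversity. It assumes each fingerprint is already available as a set of on-bit indices (in practice a cheminformatics toolkit would compute ECFP4 bits); the function names and the toy deck are illustrative, not taken from any specific library.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is represented as the set of "on" bit indices (e.g. ECFP4 bits).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: size of the intersection over size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fps: Sequence[Fingerprint], n_pick: int, seed: int = 0) -> List[int]:
    """Greedy MaxMin diversity selection.

    Each round adds the compound whose most similar already-picked neighbour
    is least similar to it, i.e. the compound farthest from the current picks.
    """
    picked = [seed]
    # nearest[i] = highest similarity of compound i to anything picked so far
    nearest = [tanimoto(fp, fps[seed]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = min(candidates, key=lambda i: nearest[i])
        picked.append(best)
        nearest = [max(nearest[i], tanimoto(fp, fps[best])) for i, fp in enumerate(fps)]
    return picked


deck = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 5}),
]
print(maxmin_pick(deck, 3))
```

Nothing in this objective refers to a binding site: it rewards compounds only for being unlike each other, which is the gap this section describes.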

The distinction matters. For any given target, the binding site imposes geometric and electrostatic constraints on what can bind productively. A cavity that is 8 Å deep with a hydrogen-bond donor at position 4 and a hydrophobic shelf at the back selects for a specific scaffold geometry. Thousands of HTS compounds that are maximally diverse in fingerprint space may all fail the same pocket-geometric filter simultaneously — and no amount of Tanimoto-based diversity in the collection recovers from that.

Fragment libraries start from the pocket, not the collection. A fragment set optimized for a kinase hinge-binder is biased toward planar heterocyclic scaffolds with H-bond donor/acceptor pairs at the right vector distance. It is not trying to cover all of chemical space; it is trying to cover the sub-space compatible with the geometric constraints of that pocket. That specificity is the core argument for fragment-based lead discovery.
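In its simplest form, a pocket-geometric filter of this kind reduces to a two-point pharmacophore check: does the fragment place a donor and an acceptor at roughly the distance the pocket demands? The sketch below assumes 3D coordinates for each fragment's donor and acceptor atoms are already available from a single conformer, and the distance window is a hypothetical placeholder, not a real hinge-binding geometry. It ignores H-bond directionality and conformational flexibility, both of which a production filter would need.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # x, y, z in angstroms


def has_pair_in_window(donors: Sequence[Point], acceptors: Sequence[Point],
                       d_min: float, d_max: float) -> bool:
    """True if any donor-acceptor atom pair lies between d_min and d_max angstroms apart."""
    return any(d_min <= math.dist(d, a) <= d_max for d, a in product(donors, acceptors))


def pocket_filter(library: Dict[str, Tuple[List[Point], List[Point]]],
                  d_min: float, d_max: float) -> List[str]:
    """Keep fragments whose (donors, acceptors) geometry matches the two-point pharmacophore."""
    return [name for name, (donors, acceptors) in library.items()
            if has_pair_in_window(donors, acceptors, d_min, d_max)]


# Hypothetical single-conformer coordinates and window, for illustration only.
library = {
    "frag_a": ([(0.0, 0.0, 0.0)], [(2.6, 0.0, 0.0)]),
    "frag_b": ([(0.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]),
    "frag_c": ([], [(1.0, 1.0, 1.0)]),  # no donor at all
}
print(pocket_filter(library, d_min=2.2, d_max=3.2))
```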

Physicochemical Constraints Define the Fragment Window

The Ro3 (Rule of Three) cutoffs for fragments (molecular weight ≤ 300 Da, clogP ≤ 3, H-bond donors ≤ 3, H-bond acceptors ≤ 3) are not arbitrary. They follow from the observation that fragments bind weakly, typically in the high-micromolar to millimolar range, because they make a small number of high-quality contacts rather than engaging multiple pharmacophoric features simultaneously. The low MW ensures the binding contribution is normalized to the fragment footprint, not inflated by hydrophobic burial of a large molecular surface. This is why ligand efficiency (LE) is a more informative metric than raw IC50 at the fragment stage: LE, defined as −ΔG divided by heavy atom count, tracks actual binding quality per unit of molecular complexity.
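The Ro3 cutoffs and the LE definition translate directly into code. This sketch assumes the descriptors (MW, clogP, donor and acceptor counts, heavy atoms) have been computed elsewhere; it applies only the four cutoffs named above and converts a dissociation constant to free energy with ΔG = RT ln K_d at 298.15 K. The `Fragment` container and the example compounds are hypothetical.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


@dataclass
class Fragment:
    """Hypothetical container for precomputed descriptors."""
    name: str
    mw: float         # molecular weight, Da
    clogp: float
    hbd: int          # H-bond donors
    hba: int          # H-bond acceptors
    heavy_atoms: int


def passes_ro3(f: Fragment) -> bool:
    """The four Rule of Three cutoffs quoted in the text."""
    return f.mw <= 300 and f.clogp <= 3 and f.hbd <= 3 and f.hba <= 3


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """LE = -dG / N_heavy in kcal/mol per heavy atom, with dG = RT ln(Kd)."""
    delta_g = R_KCAL * temp_k * math.log(kd_molar)  # negative whenever Kd < 1 M
    return -delta_g / heavy_atoms


frag = Fragment("frag_1", mw=178.2, clogp=1.2, hbd=1, hba=2, heavy_atoms=13)
frag_le = ligand_efficiency(1e-3, frag.heavy_atoms)  # 1 mM fragment hit
hts_le = ligand_efficiency(1e-6, 35)                  # 1 uM HTS hit with 35 heavy atoms
print(passes_ro3(frag), round(frag_le, 2), round(hts_le, 2))
```

The 1 mM fragment with 13 heavy atoms comes out at roughly 0.31 kcal/mol per heavy atom, while the 1 µM compound with 35 heavy atoms lands near 0.23: the weaker binder is the more efficient one, which a raw IC50 comparison would hide.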

HTS compounds in the 300-500 Da range routinely achieve apparent IC50 values in the micromolar range through mechanisms that fragment screening explicitly avoids: colloidal aggregation, cysteine reactivity, and intercalation into assay interfaces. Filtering against pan-assay interference compound (PAINS) substructure alerts removes some of these, but the structural redundancy problem remains. A hundred structurally similar false positives consume SAR capacity that could have been spent on three genuine fragment hits with orthogonal binding modes.
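One way to expose structural redundancy in a hit list is to cluster it and count clusters rather than hits. The sketch below is a greedy sphere-exclusion clustering in the spirit of Butina's algorithm (it recomputes neighbour counts after each cluster, a minor variation on the original). Fingerprints are again assumed to be on-bit sets, and the 0.6 similarity cutoff is a hypothetical default, not a recommended value.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # assumed on-bit-set representation


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cluster_hits(fps: Sequence[Fingerprint], sim_cutoff: float = 0.6) -> List[List[int]]:
    """Greedy sphere-exclusion clustering in the spirit of Butina's algorithm.

    Two hits are neighbours if their Tanimoto similarity is >= sim_cutoff.
    The hit with the most still-unclustered neighbours becomes the next
    centroid and absorbs those neighbours.
    """
    n = len(fps)
    neighbours = [{j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= sim_cutoff}
                  for i in range(n)]
    unclustered = set(range(n))
    clusters: List[List[int]] = []
    while unclustered:
        # max() keeps the first maximum, so ties go to the lowest index
        centroid = max(sorted(unclustered), key=lambda i: len(neighbours[i] & unclustered))
        members = [centroid] + sorted(neighbours[centroid] & unclustered)
        clusters.append(members)
        unclustered -= set(members)
    return clusters


hits = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}),
        frozenset({1, 2, 3, 4, 6}), frozenset({10, 11, 12})]
print(cluster_hits(hits))
```

Four hits collapse into two clusters here; a hundred close analogs would likewise collapse toward a single cluster, which is a closer measure of the SAR starting points actually on offer.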

Library Construction: What Structure-Guided Fragment Selection Actually Involves

Constructing a structure-guided fragment library begins with binding site characterization, not library curation. A pocket descriptor derived from SiteMap or fpocket (volume, depth, polarity distribution, and the presence of defined sub-pockets) determines which fragment scaffolds are geometrically compatible. A narrow, deep catalytic pocket (a cysteine protease active site, for instance) tolerates different fragment shapes than a shallow, extended PPI hot-spot surface.

From there, computational pre-filtering removes fragments that are likely to cause assay interference regardless of binding mode: reactive functional groups beyond the intended scope, high-PSA compounds that cannot desolvate efficiently, and scaffolds with known PAINS flags. Commercial fragment collections, such as Enamine's fragment libraries and the Maybridge Ro3 set, provide on the order of 10,000 to 30,000 filtered fragments from which a target-optimized subset of 1,000 to 3,000 can be selected based on pocket compatibility scores. Shape complementarity can be assessed using ROCS-style overlay against pocket-derived pharmacophore models, or by direct docking with scoring calibrated for fragment-sized ligands. Glide SP and GOLD's ChemScore both require size normalization when applied to fragments, because raw docking scores scale with molecular size.
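The pre-filter and size-normalization steps can be sketched together. This assumes PAINS and reactive-group checks have already been run elsewhere and stored as booleans, and that docking scores follow a "more negative is better" convention; the record type, the 90 Å² PSA cutoff, and the normalization exponent are hypothetical placeholders, since the text specifies none of them. Dividing by heavy atoms to the first power gives an LE-like score; smaller exponents apply a gentler correction.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DockedFragment:
    """Hypothetical record: docking score plus precomputed filter inputs."""
    name: str
    docking_score: float  # assumed convention: more negative = better
    heavy_atoms: int
    tpsa: float           # polar surface area, square angstroms
    pains_alert: bool     # result of a PAINS substructure check run elsewhere
    reactive_alert: bool  # result of a reactive-group check run elsewhere


def size_normalized(score: float, heavy_atoms: int, exponent: float = 1.0) -> float:
    """Divide the raw score by N_heavy ** exponent (exponent=1 gives an LE-like score)."""
    return score / heavy_atoms ** exponent


def select_subset(frags: Sequence[DockedFragment], n_keep: int,
                  max_tpsa: float = 90.0, exponent: float = 1.0) -> List[str]:
    """Apply interference pre-filters, then rank by size-normalized score."""
    clean = [f for f in frags
             if not f.pains_alert and not f.reactive_alert and f.tpsa <= max_tpsa]
    ranked = sorted(clean, key=lambda f: size_normalized(f.docking_score, f.heavy_atoms, exponent))
    return [f.name for f in ranked[:n_keep]]


pool = [
    DockedFragment("A", -6.0, 12, 40.0, False, False),
    DockedFragment("B", -8.0, 30, 60.0, False, False),
    DockedFragment("C", -7.0, 14, 120.0, False, False),  # fails the PSA cutoff
    DockedFragment("D", -5.5, 10, 45.0, True, False),    # PAINS alert
    DockedFragment("E", -5.0, 9, 50.0, False, False),
]
print(select_subset(pool, 2))                # size-normalized ranking
print(select_subset(pool, 2, exponent=0.0))  # raw scores, no normalization
```

With raw scores the 30-heavy-atom compound B ranks first largely because of its size; after normalization the smaller E and A move ahead of it.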

We're not saying HTS sets are without value for structure-guided work — a well-designed HTS campaign against a protein with a well-characterized binding site can produce high-quality hits. The argument is more specific: when the structural data is available and the binding site geometry is defined, selecting compounds without reference to that geometry is a missed optimization step that consistently reduces hit quality per compound synthesized downstream.

A Concrete Scenario: Fragment Screening Against a Novel Epigenetic Reader Domain

Consider a Cambridge-area oncology startup that in mid-2024 obtained a 2.1 Å crystal structure of a novel bromodomain-adjacent reader module — a shallow, relatively hydrophobic surface pocket with a single conserved asparagine anchor point. The team ran an initial 80,000-compound HTS deck in a thermal shift assay and obtained 620 apparent binders above a 2°C threshold. After AlphaFold homolog comparison and PAINS filtering, 410 remained. Counter-screening against a closely related bromodomain to assess selectivity reduced the list to 68. Of these, SPR confirmed binding for 22. When the 22 were docked into the published structure, only 9 showed poses with productive contacts at the asparagine anchor — the other 13 appeared to bind at a non-primary interface only detectable at high compound concentration.

A parallel fragment campaign run against the same structure selected 1,800 fragments from the Enamine set based on pocket volume and asparagine H-bond geometry. Of these, 120 showed thermal shift responses above 1.5°C. After SPR, 34 confirmed binders remained, 28 of which docked into the asparagine sub-pocket with identifiable H-bond contacts. Fragment-to-lead elaboration from the top 8 scaffolds, using docking-guided vector analysis, produced 3 distinct chemical series within 6 weeks of synthesis. The HTS route had produced 9 confirmed structural binders; the fragment route produced 28, with more scaffold-diverse and chemically tractable starting points for hit-to-lead.
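The two triage funnels are easier to compare as stage-by-stage retention and overall yield per compound screened. The counts below are taken directly from the scenario; the `funnel_report` helper is an illustrative name, not part of any tool.

```python
from typing import List, Tuple

Stage = Tuple[str, int]


def funnel_report(stages: List[Stage]) -> List[Tuple[str, int, float, float]]:
    """Return (label, count, % retained from previous stage, % of compounds screened)."""
    screened = stages[0][1]
    previous = screened
    rows = []
    for label, count in stages:
        rows.append((label, count, 100.0 * count / previous, 100.0 * count / screened))
        previous = count
    return rows


# Counts taken from the scenario above.
hts = [("screened", 80_000), ("thermal shift > 2 C", 620), ("PAINS + homolog filter", 410),
       ("selectivity counter-screen", 68), ("SPR confirmed", 22), ("asparagine-anchor pose", 9)]
fragments = [("screened", 1_800), ("thermal shift > 1.5 C", 120),
             ("SPR confirmed", 34), ("asparagine-anchor pose", 28)]

for name, stages in (("HTS", hts), ("Fragments", fragments)):
    for label, count, step_pct, overall_pct in funnel_report(stages):
        print(f"{name:9s} {label:28s} {count:6d} {step_pct:6.1f}% {overall_pct:8.4f}%")
```

Per compound screened, the fragment route ends with about 1.56% structurally validated binders against about 0.011% for the HTS deck, a difference of more than two orders of magnitude in this scenario.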

This scenario is representative of what the published FBDD literature describes for reader domain targets: the small size and low structural complexity of fragments make PAINS-type artifacts less likely, and the constraint of fragment MW means the binding events that survive SPR are more likely to represent genuine target engagement. The Astex fragment-based BACE-1 program, the Abbott ABT-199 fragment origin story, and the SGC's public bromodomain fragment data all tell a similar structural story: the fragments that advanced had demonstrable binding mode clarity from the first co-crystal structure, whereas HTS-derived actives often required multiple rounds of crystallization attempts.

Structural Feedback as the Differentiating Factor

The practical advantage of fragment libraries over HTS sets in structure-guided campaigns comes down to one thing: the tractability of co-crystallization. Fragment hits typically bind in a single, well-resolved pose because they have few rotatable bonds and a limited interaction surface. Co-crystal structures at the fragment stage are obtainable in days with an optimized soaking protocol, particularly for well-diffracting targets. HTS-derived hits in the 300-500 Da range often show multiple binding modes or partial occupancy, or require extensive protein engineering to crystallize with the ligand bound.

Without a co-crystal structure, hit-to-lead chemistry is navigating blind. The docking pose suggests which vectors are available for elaboration, but docking accuracy for drug-like HTS compounds — which involve more rotatable bonds and larger sampling spaces — is meaningfully lower than for rigid fragments. Glide XP on fragments with ≤5 rotatable bonds shows RMSD to co-crystal pose below 1.5 Å in approximately 70-80% of cases on CASF-2016 benchmark sets; for drug-like compounds with 8+ rotatable bonds, that fraction drops to 50-60% depending on the force field and target class.
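The benchmark statistic quoted here is a success rate: the fraction of ligands whose docked pose falls within an RMSD cutoff of the crystallographic pose. A minimal sketch of that calculation follows, assuming both poses list the same heavy atoms in the same order and are already in the receptor frame, so no superposition is applied. It does not remap symmetry-equivalent atoms, which benchmark tooling normally does, so it can overstate RMSD for symmetric ligands.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Pose = Sequence[Point]


def pose_rmsd(predicted: Pose, reference: Pose) -> float:
    """Heavy-atom RMSD in the receptor frame, without superposition.

    Assumes both poses list the same atoms in the same order; symmetry-
    equivalent atoms are not remapped.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    total = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(total / len(predicted))


def success_rate(pairs: List[Tuple[Pose, Pose]], cutoff: float = 1.5) -> float:
    """Fraction of (predicted, reference) pairs whose RMSD is below cutoff angstroms."""
    return sum(pose_rmsd(p, r) < cutoff for p, r in pairs) / len(pairs)


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near = [(x, y + 1.0, z) for x, y, z in ref]  # every atom shifted 1 angstrom
far = [(x, y + 2.0, z) for x, y, z in ref]   # every atom shifted 2 angstroms
print(pose_rmsd(near, ref), success_rate([(near, ref), (far, ref)]))
```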

The fragment library approach is not a universal improvement over HTS. For targets where no structural data exists — where the only available starting structure is a homology model with uncertain pocket geometry — fragment screening provides less of an advantage because the structural feedback loop that drives fragment elaboration is compromised. Structure-guided fragment selection is most powerful precisely when the structural data is highest quality.

Library Maintenance and Chemical Tractability

Fragment collections require ongoing curation in a way HTS sets do not. Fragments are more sensitive to assay and storage conditions (pH 7.4 buffer, DMSO stocks, freeze-thaw cycles) than drug-like compounds: with so few atoms, the integrity of each functional group accounts for a proportionally larger share of what the fragment can bind with. Libraries that are not regularly re-QC'd by LC-MS accumulate impurities and degradation products at rates that compromise thermal shift and SPR assay signal. The rule of thumb in the FBDD field is that 10-15% of a fragment collection will fail re-QC after 18 months of storage, and that QC-failed compounds are disproportionately represented in apparent hit lists because degradation products are often reactive.
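Whether QC failures are over-represented among hits is a simple comparison once re-QC results and the hit list are in hand. In the sketch below the `QCRecord` type and the 90% purity threshold are hypothetical; a hit-list failure fraction well above the library-wide fraction is the pattern the rule of thumb describes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Set


@dataclass
class QCRecord:
    """Hypothetical re-QC result for one library compound."""
    compound_id: str
    purity_pct: float   # LC-MS purity at re-QC
    identity_ok: bool   # expected mass observed


def qc_failures(records: Iterable[QCRecord], min_purity: float = 90.0) -> Set[str]:
    """IDs that fail on purity or on identity."""
    return {r.compound_id for r in records if r.purity_pct < min_purity or not r.identity_ok}


def qc_enrichment(records: Sequence[QCRecord], hit_ids: Set[str],
                  min_purity: float = 90.0) -> Dict[str, float]:
    """Compare the QC-failure rate across the library with the rate among hits."""
    failed = qc_failures(records, min_purity)
    return {
        "library_fail_fraction": len(failed) / len(records),
        "hit_fail_fraction": len(failed & hit_ids) / len(hit_ids) if hit_ids else 0.0,
    }


records = [QCRecord(f"c{i}", 98.0, True) for i in range(10)]
records[0] = QCRecord("c0", 80.0, True)   # degraded stock
records[1] = QCRecord("c1", 97.0, False)  # wrong mass observed
hits = {"c0", "c1", "c5", "c6"}
print(qc_enrichment(records, hits))
```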

Computational curation — removing compounds that are flagged in updated PAINS filter lists, removing scaffolds that have appeared as frequent hitters in public data (ChEMBL promiscuity analysis) — is a continuous process, not a one-time filter. A fragment library that was clean in 2020 is not necessarily clean in 2024 given the expansion of public bioactivity data that informs those filters.
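A frequent-hitter check of this kind can be rerun whenever the public activity export is refreshed. The sketch assumes activity data has already been reduced to (compound, target, active) rows, for example from a ChEMBL download; that row format and both thresholds are hypothetical, and a real promiscuity analysis would also account for target relatedness and assay technology.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, str, bool]  # (compound_id, target_id, is_active): assumed export format


def frequent_hitters(rows: Iterable[Row], min_targets_tested: int = 10,
                     max_hit_fraction: float = 0.3) -> Set[str]:
    """Flag compounds tested on enough targets whose active-target fraction is too high."""
    tested: Dict[str, Set[str]] = defaultdict(set)
    active: Dict[str, Set[str]] = defaultdict(set)
    for compound, target, is_active in rows:
        tested[compound].add(target)
        if is_active:
            active[compound].add(target)
    return {
        compound for compound, targets in tested.items()
        if len(targets) >= min_targets_tested  # skip compounds with too little data
        and len(active[compound]) / len(targets) > max_hit_fraction
    }


rows = ([("p", f"t{i}", i < 5) for i in range(10)]     # active on 5 of 10 targets
        + [("q", f"t{i}", i == 0) for i in range(10)]  # active on 1 of 10 targets
        + [("r", f"t{i}", True) for i in range(3)])    # too few targets tested
print(frequent_hitters(rows))
```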

The structure-guided approach to fragment library design also enables target-class specialization that amortizes the curation cost. A fragment set curated for kinase hinge binding (ATP-competitive fragments with defined heterocycle scaffolds and hinge H-bond geometry) can be reused across kinase programs with minor pocket-specific supplementation. Over time, the curated fragment library becomes a platform asset, not a one-time cost per program. That reuse is a large part of how the Astex, Evotec, and SGC fragment platforms derive their efficiency advantages at scale.

For details on how our structural screening integrates fragment library selection with pocket characterization, see the Moleculepath Screening Platform overview, or the methodology notes on our Science page.

Related reading: Designing an SBDD Hit-to-Lead Campaign covers how fragment starting points translate into a hit-to-lead program once structural confirmation is in hand.