Hit Identification Timelines at Seed-Stage Biotechs: What to Expect

Figure: Timeline diagram for hit identification at a seed-stage biotech

Eight weeks. That's the number that comes up most often when seed-stage biotech founders estimate their first structure-based hit identification campaign. In our experience running campaigns for CROs and early-stage biotechs, the actual number is closer to 14 weeks — and that's when everything goes reasonably well. Here's a stage-by-stage breakdown of where that time actually goes.

Why Timelines Get Underestimated

The underestimation isn't irrational. Founders and discovery leads tend to scope the compute-intensive parts (docking, ML re-ranking) and forget that those stages account for roughly 30% of the total calendar time. The other 70% is prep, review cycles, coordination, and rework. Every time.

There's also a version of the problem that comes from reading vendor brochures. A pipeline that takes "3 to 5 business days" to run is not the same as a campaign that delivers an actionable short-list in 3 to 5 business days. The former is the execution window after the inputs are ready. Getting to ready inputs often takes weeks on its own.

Stage 1: Target Preparation (1-3 Weeks)

Before a single compound is docked, the target structure has to be in docking-ready condition. This means protonation state assignment, water molecule placement, binding-site definition, and receptor grid generation. For a crystal structure deposited in the PDB with a co-crystallized ligand that defines the binding site cleanly, this can be done in a few hours with automated tools.

Most campaigns don't start there.

In our work, roughly 60% of incoming targets from seed-stage clients are AlphaFold2 models, not crystal structures. AF2 models introduce complications. The binding site may not be obvious from the structure alone. Side-chain conformations near the binding pocket are often in low-confidence regions. Water placement requires more manual judgment. If the target has a flexible loop near the binding site, there may be a legitimate question about which conformational state to dock against.

Factor in the back-and-forth between computational and biology teams to confirm binding-site residue choices, and target prep alone can run 2 to 3 weeks for a challenging structure.
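One way to make the low-confidence concern concrete before that back-and-forth starts is to check per-residue confidence at the proposed binding site. The sketch below assumes the model is a PDB-format AlphaFold file, where pLDDT is stored in the B-factor column, and uses a cutoff of 70 as an illustrative threshold. The function names and the cutoff are hypothetical choices for this example, not part of any fixed protocol.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: 70 is used here as an illustrative "low confidence" cutoff;
# adjust it to your own tolerance.
LOW_CONFIDENCE_PLDDT = 70.0


def residue_plddt(pdb_lines: Iterable[str], chain: str = "A") -> Dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores: Dict[int, float] = {}
    for line in pdb_lines:
        # Fixed-width PDB columns (1-indexed): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_site_residues(
    pdb_lines: Iterable[str],
    site_residues: Iterable[int],
    chain: str = "A",
    cutoff: float = LOW_CONFIDENCE_PLDDT,
) -> List[Tuple[int, Optional[float]]]:
    """Return (residue number, pLDDT) for site residues below the cutoff.

    Residues absent from the model are reported with a score of None.
    """
    scores = residue_plddt(pdb_lines, chain)
    flagged: List[Tuple[int, Optional[float]]] = []
    for resnum in site_residues:
        if resnum not in scores:
            flagged.append((resnum, None))
        elif scores[resnum] < cutoff:
            flagged.append((resnum, scores[resnum]))
    return flagged
```

A call such as low_confidence_site_residues(open("model.pdb"), [45, 46, 112]) (file name and residue numbers are placeholders) gives the biology lead a short list of residues to look at first, rather than the whole pocket.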

Stage 2: Library Curation (3-7 Business Days)

The compound library needs to be in SMILES format, de-duplicated, and filtered for basic drug-like properties before docking. Teams underestimate this consistently. A commercial library purchased from Enamine or Mcule arrives in a specific format that may need conversion. An internal library accumulated over multiple projects may have inconsistent tautomer representations, duplicate entries, or records with broken SMILES strings.

One early partner brought us an "internal library" of 120,000 compounds. After standardization and deduplication, 34,000 were unique and format-valid, a reduction of roughly 3.5:1. That's not unusual. We've seen de-duplication ratios as severe as 4:1 on accumulated internal libraries.

The cleaner the library input, the faster this stage runs. If the team can deliver a validated, standardized SMILES file from day one, library prep drops to one to two days. Most teams cannot.
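A minimal sketch of the de-duplication pass, for teams that want to QC a library before submitting it. It assumes you supply a canonicalize function that returns a canonical SMILES string, or None for a record it cannot parse. The function names here are hypothetical, and the RDKit snippet in the comment is one possible canonicalizer, not a description of our pipeline. Tautomer normalization is deliberately left out.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# One possible canonicalizer, if RDKit is installed (not required to run this):
#
#   from rdkit import Chem
#   def rdkit_canonical(smiles):
#       mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
#       return Chem.MolToSmiles(mol) if mol is not None else None


def dedupe_library(
    smiles_records: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Tuple[List[str], Dict[str, int]]:
    """Drop unparseable and duplicate records; keep first-seen order."""
    seen = set()
    unique: List[str] = []
    total = invalid = duplicates = 0
    for raw in smiles_records:
        total += 1
        canonical = canonicalize(raw.strip())
        if canonical is None:
            invalid += 1        # broken SMILES string
        elif canonical in seen:
            duplicates += 1     # same structure, possibly written differently
        else:
            seen.add(canonical)
            unique.append(canonical)
    stats = {
        "total": total,
        "invalid": invalid,
        "duplicates": duplicates,
        "unique": len(unique),
    }
    return unique, stats


def reduction_ratio(n_input: int, n_unique: int) -> float:
    """Input-to-unique ratio, e.g. 120,000 -> 34,000 is about 3.5."""
    return n_input / n_unique
```

Reporting the stats dictionary alongside the cleaned file tells the receiving team how much of the shrinkage came from broken records versus true duplicates, which is usually the first question asked.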

Stage 3: Virtual Screening Execution (2-5 Business Days)

This is the part that gets quoted in vendor timelines: GPU-accelerated docking on a prepared library of 500,000 to 5 million compounds, followed by ML re-ranking. With Schrödinger Glide SP and a properly parallelized compute environment, 500K compounds can be docked in roughly 8 to 12 hours. XP docking on the top 5% of SP-ranked compounds takes another day. Re-ranking runs overnight.

Total compute execution: 3 to 5 business days is accurate. This is the portion founders scope most carefully when they arrive at "8 weeks," which is why that estimate leaves so little room for everything around it.
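For rough sizing, the SP docking figure above can be scaled to other library sizes. The sketch below assumes throughput scales linearly with compound count on the same compute environment, which is a simplification: queueing, licensing limits, and added hardware all change the picture. The function names are hypothetical.

```python
from typing import Tuple


def docking_hours(
    n_compounds: int,
    ref_compounds: int = 500_000,
    ref_hours: Tuple[float, float] = (8.0, 12.0),
) -> Tuple[float, float]:
    """Scale the reference SP docking window (low, high) to another library size.

    Assumes linear scaling on unchanged hardware.
    """
    scale = n_compounds / ref_compounds
    return (ref_hours[0] * scale, ref_hours[1] * scale)


def xp_subset_size(n_compounds: int, top_fraction: float = 0.05) -> int:
    """Number of compounds passed to XP docking (top 5% by default)."""
    return round(n_compounds * top_fraction)
```

Under that linear assumption, docking_hours(5_000_000) returns (80.0, 120.0), so a library at the top of the size range would need proportionally more compute to stay inside the same execution window.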

Practical note: Docking results are not a deliverable. They are input to the next stage. A ranked list of 50 docking scores is not the same as a short-list your medicinal chemistry team can act on. That requires the ADMET pass and PAINS filtering stages that follow.

Stage 4: ADMET Scoring and PAINS Filtering (2-4 Business Days)

Every candidate on the docking short-list gets an ADMET profile: predicted CYP inhibition, hERG liability, aqueous solubility, oral bioavailability, and blood-brain barrier penetration. Compounds that score poorly across multiple dimensions get flagged or deprioritized. PAINS filtering runs against 480 substructure alerts from Baell and Holloway plus extended in-house alert libraries.

This stage typically removes 15 to 35% of docking hits. The range is wide because it depends heavily on the source library. Libraries skewed toward older commercial collections with high reactive-group prevalence can see higher PAINS hit rates. Libraries biased toward fragment-based starting points tend to have lower PAINS incidence but worse solubility profiles.

Runtime for this stage on a 50-compound short-list is not the bottleneck. What takes time is the internal review loop: a biology team member or medicinal chemist reviewing flagged compounds to decide whether a PAINS alert is disqualifying or merely a note for synthesis handling. Two to four business days is realistic when that review requires scheduling time across a small team.
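The flag-and-review logic in this stage can be sketched as a simple triage rule. Everything here is illustrative: the Candidate fields, the status labels, and the threshold of more than one ADMET flag for deprioritization are hypothetical choices, and the predicted values themselves would come from whatever ADMET and PAINS tooling the campaign uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    compound_id: str
    admet_flags: List[str] = field(default_factory=list)   # e.g. ["hERG", "solubility"]
    pains_alerts: List[str] = field(default_factory=list)  # matched alert names


def triage(candidate: Candidate, max_admet_flags: int = 1) -> str:
    """Assign a review status to one short-list compound.

    - More ADMET flags than max_admet_flags: deprioritize.
    - Any PAINS alert: route to a human reviewer, who decides whether the
      alert is disqualifying or a note for synthesis handling.
    - A single ADMET flag: advance, but keep the flag visible.
    """
    if len(candidate.admet_flags) > max_admet_flags:
        return "deprioritize"
    if candidate.pains_alerts:
        return "needs_review"
    if candidate.admet_flags:
        return "advance_with_flag"
    return "advance"


def review_queue(candidates: List[Candidate]) -> List[str]:
    """IDs of compounds that need a medicinal chemist's decision."""
    return [c.compound_id for c in candidates if triage(c) == "needs_review"]
```

The length of the review_queue output is a reasonable proxy for how much reviewer time to book: the automated pass is quick, and the queue is what drives the two-to-four-day window.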

Stage 5: Short-List Review and Synthesis Feasibility (1-2 Weeks)

The computational deliverable is 20 to 50 compounds with docking scores, binding poses, ADMET flags, and synthesis-feasibility scores. The campaign isn't complete until a medicinal chemist and a biology lead have agreed on which compounds to actually route to synthesis. That review takes time.

At a 10-person seed-stage biotech, the people doing this review are also running experiments, attending investor calls, and managing vendor relationships. Getting 90 minutes of focused review time from two key people can take a week of calendar coordination on its own.

We've found that teams with a dedicated medicinal chemistry lead move through this stage in 3 to 5 business days. Teams where the biology founder is doubling as the compound selection decision-maker often take 7 to 12 business days. Neither is wrong. It's just the reality of what a 10-person company looks like.

Stage 6: Synthesis Routing (1-3 Weeks, Parallel)

Final compound selection triggers synthesis routing, usually to an external CRO. This stage can begin in parallel with the short-list review if the team pre-qualifies a synthesis partner ahead of the campaign. Most seed-stage teams haven't done that pre-qualification. So synthesis routing adds another 1 to 3 weeks to the calendar before wet-lab confirmation can begin.

This stage is outside the computational campaign scope entirely, but it's part of the actual timeline to a validated hit. Ignoring it, along with the review and coordination time in the earlier stages, produces the 8-week estimate. Including all of it produces the 14-week reality.

Where Timelines Can Actually Compress

If you're planning your first structure-based hit identification campaign, here are the places where timeline compression is actually achievable:

  • Pre-qualify a synthesis CRO before the campaign starts. This alone can compress the tail end of the timeline by 1 to 2 weeks.
  • Deliver a clean, standardized compound library. Get a computational chemist to QC the library SMILES before submitting. One day of library prep on your side saves a week of back-and-forth.
  • Define binding-site residues before target prep begins. Have the biology lead annotate the PDB structure with binding-site residue numbers before handing it to a computational team. Even a rough annotation cuts target prep time significantly.
  • Block calendar time for short-list review before the results arrive. Schedule the 90-minute review meeting in week 3. Reschedule it if the results are delayed. Don't wait until results land to start finding time on the calendar.

A Realistic Planning Template

For a 2-million-compound library screened against a single target:

  • Weeks 1-3: Target prep (structure validation, binding site definition, receptor grid)
  • Weeks 2-3 (parallel): Library curation and SMILES standardization
  • Weeks 3-4: Virtual screening execution (pharmacophore filter, SP docking, XP docking, ML re-ranking)
  • Week 5: ADMET scoring and PAINS filtering
  • Weeks 5-6: Short-list review and synthesis feasibility assessment
  • Weeks 6-8: Synthesis routing and external CRO queue

Best-case outcome with clean inputs and a responsive team: 8 to 10 weeks from target submission to synthesis-ready short-list. Typical outcome with the coordination realities of a seed-stage team: 12 to 16 weeks. Neither is a failure. Both are just how this work actually runs.
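To turn the week numbers in the template into calendar dates for milestone planning, a small helper is enough. This is a sketch under simple assumptions: weeks are counted from a kickoff date of your choosing, each week is seven calendar days, and holidays and slippage are ignored. The names TEMPLATE and stage_dates are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# (stage, first week, last week), taken from the planning template above.
TEMPLATE: List[Tuple[str, int, int]] = [
    ("Target prep", 1, 3),
    ("Library curation (parallel)", 2, 3),
    ("Virtual screening execution", 3, 4),
    ("ADMET scoring and PAINS filtering", 5, 5),
    ("Short-list review and synthesis feasibility", 5, 6),
    ("Synthesis routing and external CRO queue", 6, 8),
]


def stage_dates(
    kickoff: date,
    template: List[Tuple[str, int, int]] = TEMPLATE,
) -> List[Tuple[str, date, date]]:
    """Convert week ranges into (stage, start date, end date) tuples.

    Week 1 starts on the kickoff date; each stage ends on the last day
    of its final week.
    """
    schedule = []
    for name, first_week, last_week in template:
        start = kickoff + timedelta(weeks=first_week - 1)
        end = kickoff + timedelta(weeks=last_week) - timedelta(days=1)
        schedule.append((name, start, end))
    return schedule


if __name__ == "__main__":
    # Placeholder kickoff date for illustration only.
    for name, start, end in stage_dates(date(2025, 1, 6)):
        print(f"{start.isoformat()} to {end.isoformat()}  {name}")
```

Running it once with the template as written and once with the later stages pushed out to match the 12-to-16-week range gives two end dates to plan against, instead of one optimistic one.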

Understanding the realistic timeline upfront is what lets you plan investor updates, set milestone dates for your next financing round, and avoid the credibility hit of reporting a miss that was predictable from week one. That's the goal of this kind of honest scoping — not to make the timeline look longer, but to make your planning accurate.

Planning a hit identification campaign? Talk to our team about realistic timelines for your target and library.