
CellsWave Benchmark Report

Public distribution · v2 · 2026-04 · Hybrid drug discovery · Protease-focused

Executive Summary

CellsWave v1.0 is a proprietary accelerated virtual screening platform. This report summarises external validation on the community-standard DUD-E benchmark (Mysinger et al., J. Med. Chem. 2012), real-drug recovery on co-crystal structures, and library-scale speed measurements.

Headline results (DUD-E, 10 targets): 7 of 10 targets are screening-ready with hybrid scoring (EF@1% ≥ 4), led by renin (EF@1% = 66.23×, AUROC 0.899). HIV-1 protease reaches EF@1% = 40.09×. Three coagulation-cascade serine proteases (factor Xa, factor VIIa, urokinase) fall outside the current envelope; several published ligand-based methods share this limitation on these targets.
Speed: 16.77 ms core search (shape channel); 3.3 s end-to-end (hybrid mode, includes structural similarity scoring). Query latency is independent of library size once the one-time library build is complete.
Validated antiviral case: Nirmatrelvir (Paxlovid active) recovered at rank #3 on SARS-CoV-2 Mpro (PDB 6LU7) out of 2.83M screened molecules.

CellsWave v1.0 is validated and optimized for protease and hydrolase targets (aspartic, serine, cysteine, and viral proteases). Contact us at [email protected] to discuss your specific target.

Methodology

Input: SMILES strings. Output: ranked similarity scores.

Dataset: DUD-E (Database of Useful Decoys: Enhanced), Mysinger et al. 2012 — the community-standard benchmark for ligand-based virtual screening. For each target, actives are mixed with a property-matched decoy set approximately 50× larger. Methods must distinguish actives from topologically distinct but physically similar decoys.

For each target, a small set of known actives is used to form a query; the query molecules are held out from the library during scoring. The library is ranked by similarity score and evaluated against ground-truth active / decoy labels. Metrics reported: AUROC (area under ROC curve), EF@x% (enrichment factor in top x%), and a one-sided rank-test p-value.
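The metrics named above can be computed from a ranked list in a few lines. The following is a minimal sketch, not the platform's evaluation code: the function names, the rounding rule for the size of the top slice, and the toy data are all assumptions made for illustration. AUROC is computed through the rank-sum (Mann-Whitney) identity, with tied scores sharing their mean rank.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """EF@x%: active rate in the top `fraction` of the ranking divided by the overall active rate."""
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Assumption: the top slice holds round(fraction * N) molecules, and at least one.
    top_k = max(1, round(fraction * n_total))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:top_k])
    return (hits / top_k) / (n_actives / n_total)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the rank-sum identity; tied scores share their mean rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one decoy")
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy library (hypothetical): 200 molecules, 4 actives, scores strictly decreasing.
toy_scores = [200.0 - i for i in range(200)]
toy_labels = [1 if i in (0, 5, 50, 150) else 0 for i in range(200)]
print(enrichment_factor(toy_scores, toy_labels, 0.01))
print(auroc(toy_scores, toy_labels))
```

On this toy ranking the top 1% holds two molecules, one of which is active, so EF@1% evaluates to 25: a 50% active rate in the slice against a 2% active rate overall.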

Architectural disclosure. The internal architecture of the screening engine is proprietary and not disclosed in this report.

Results — DUD-E 10-target panel

Benchmarked against the DUD-E public dataset. The operational metric is EF@1%, the enrichment factor in the top 1% of a ranked screen, where 1.0× corresponds to random selection. It is the operational metric because clients only ever look at the top slice of a screen. A target is screening-ready at EF@1% ≥ 4. AUROC is shown for reference.

Validated targets (EF@1% ≥ 4, hybrid scoring)

Target	Disease area	Class	EF@1%	EF@5%	AUROC
RENI	Hypertension	protease (asp)	66.23	—	0.899
BACE1	Alzheimer's	protease (asp)	56.24	—	0.775
HIVPR	HIV / AIDS	protease (asp)	40.09	—	0.858
THRB	Thrombosis	protease (ser)	38.61	—	0.900
TRY1	Broad protease	protease (ser)	35.02	—	0.820
ACES	Alzheimer's / neuro	hydrolase (esterase)	28.57	—	0.687
GLCM^*	Gaucher disease	hydrolase (glycosidase)	27.69	—	0.790

Results were measured with hybrid scoring (shape channel blended with structural similarity scoring). Shape-only results are available in the reproducibility pack.

Outside current envelope (EF@1% < 2)

Target	Disease area	Class	EF@1%	AUROC
FA7	Anticoagulation	protease (ser, coagulation)	0.90	0.685
FA10	Anticoagulation	protease (ser, coagulation)	0.75	0.533
UROK	Thrombolysis	protease (ser, coagulation)	0.00	0.542

The validated panel spans aspartic proteases, classical serine proteases, and two hydrolase sub-classes. The three unsupported targets are all coagulation-cascade serine proteases; other ligand-based industry methods also report reduced performance on this sub-class on DUD-E decoys. We report both the validated and the unsupported targets rather than cherry-picking.

^* GLCM benchmark: n = 553 molecules (small sample). EF@1% is stable but has a wider confidence interval than on the larger target sets.
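The two thresholds used in this section (screening-ready at EF@1% ≥ 4, outside the envelope at EF@1% < 2) can be applied mechanically to the tabulated values. The sketch below is illustrative only: the function and constant names are hypothetical, and the label for the band between 2 and 4 is an assumption, since the report does not name that band and no target in the panel falls inside it.

```python
from collections import Counter

SCREENING_READY_MIN = 4.0   # from the report: screening-ready at EF@1% >= 4
OUTSIDE_ENVELOPE_MAX = 2.0  # from the report: outside the envelope at EF@1% < 2

# EF@1% values copied from the two tables above.
EF1_BY_TARGET = {
    "RENI": 66.23, "BACE1": 56.24, "HIVPR": 40.09, "THRB": 38.61,
    "TRY1": 35.02, "ACES": 28.57, "GLCM": 27.69,
    "FA7": 0.90, "FA10": 0.75, "UROK": 0.00,
}


def classify(ef1: float) -> str:
    """Map an EF@1% value onto the report's two named bands."""
    if ef1 >= SCREENING_READY_MIN:
        return "screening-ready"
    if ef1 < OUTSIDE_ENVELOPE_MAX:
        return "outside envelope"
    # Assumption: the report does not label 2 <= EF@1% < 4.
    return "unclassified"


status_by_target = {target: classify(ef1) for target, ef1 in EF1_BY_TARGET.items()}
status_counts = Counter(status_by_target.values())
print(status_counts)
```

Applied to the panel, this reproduces the 7 / 3 split reported in the executive summary.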

Real-Drug Validation (approved drugs recovered)

Each approved drug was used as its own reference query against its known co-crystal structure, in a 2.83M-molecule library. Every drug in the panel ranked in the top five, which is direct evidence that the platform recovers known therapeutics from real drug-discovery programs.

Drug	Disease	Target PDB	Rank in 2.83M
Lopinavir	HIV/AIDS	1MUI	#1
Aliskiren	Hypertension	2V0Z	#2
Ritonavir	HIV/AIDS	1HXW	#3
Nirmatrelvir	COVID-19	6LU7	#3
Oseltamivir	Influenza	2HU0	#3
Telaprevir	HCV	3SU3	#5
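For readers who want to check a rank such as those in the table against their own score export, the rank of a molecule in a scored library can be derived as sketched below. This is an illustration under stated assumptions: the function name, the molecule identifiers, and the scores are hypothetical, and ties are resolved optimistically (a molecule's rank is one plus the number of molecules scoring strictly higher), which may differ from the convention used by the platform.

```python
from typing import Mapping


def rank_in_library(molecule_id: str, scores: Mapping[str, float]) -> int:
    """1-based rank of `molecule_id`; higher score means better rank.

    Assumption: ties are resolved optimistically (competition ranking).
    """
    target_score = scores[molecule_id]
    return 1 + sum(1 for s in scores.values() if s > target_score)


# Hypothetical scores for a five-molecule library.
toy_scores = {
    "mol_a": 0.97,
    "mol_b": 0.95,
    "known_drug": 0.91,
    "mol_c": 0.40,
    "mol_d": 0.12,
}
print(rank_in_library("known_drug", toy_scores))
```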


Speed Benchmark

Metric	Value
Core search	16.77 ms (shape channel, GPU)
Hybrid query	3.3 s end-to-end (shape + structural similarity scoring)
API latency	~3.3 s per hybrid query over HTTPS
Library scale	2.83M molecules
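Latency figures like those above are sensitive to how they are measured. The sketch below shows one common way to time a query: discard a few warm-up runs, then report the median of repeated wall-clock measurements. It is not the harness used for this report. The function name, the warm-up and repeat counts, and the stand-in workload are assumptions; in practice `run_query` would wrap a real query call.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock latency of `run_query` in milliseconds."""
    for _ in range(warmup):
        run_query()  # warm caches before timing; results discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def stand_in_query() -> int:
    # Placeholder workload (hypothetical); replace with the call under test.
    return sum(range(10_000))


print(f"{median_latency_ms(stand_in_query):.3f} ms")
```

The median is used rather than the mean so that a single slow run, for example one interrupted by the operating system, does not dominate the reported value.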

Comparison with Industry Baselines

Method	HIV-1 PR AUROC	Speed
CellsWave v1.0 hybrid	0.858	3.3 s per query on 2.83M molecules
Glide SP	~0.75	hours per 1K molecules
AutoDock Vina	~0.65	days per 1K molecules

CellsWave HIV-1 PR AUROC sits at the upper end of the published range for enterprise docking (Glide SP band 0.70–0.82) while delivering library-scale screening several orders of magnitude faster. Industry AUROC ranges are taken from the standard literature for DUD-E targets.
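The "several orders of magnitude" statement can be checked with simple throughput arithmetic on the numbers in the table. The sketch below assumes the shortest reading of the baseline figures, one hour per 1K molecules for Glide SP and one day per 1K molecules for AutoDock Vina, because the table gives only "hours" and "days"; longer baseline times would increase the ratio. All names are illustrative.

```python
import math

LIBRARY_SIZE = 2_830_000     # molecules, from the speed table
HYBRID_QUERY_SECONDS = 3.3   # end-to-end hybrid query, from the speed table

# Assumptions: the comparison table says only "hours" and "days" per 1K
# molecules, so one hour and one day are used here as lower bounds.
GLIDE_SECONDS_PER_1K = 3_600.0
VINA_SECONDS_PER_1K = 86_400.0


def molecules_per_second(n_molecules: float, seconds: float) -> float:
    return n_molecules / seconds


cellswave_rate = molecules_per_second(LIBRARY_SIZE, HYBRID_QUERY_SECONDS)


def orders_of_magnitude(baseline_seconds_per_1k: float) -> float:
    """log10 of the throughput ratio between the hybrid query and a baseline."""
    baseline_rate = molecules_per_second(1_000, baseline_seconds_per_1k)
    return math.log10(cellswave_rate / baseline_rate)


print(f"hybrid query: {cellswave_rate:,.0f} molecules/s")
print(f"vs 1 h per 1K molecules:   10^{orders_of_magnitude(GLIDE_SECONDS_PER_1K):.1f}")
print(f"vs 1 day per 1K molecules: 10^{orders_of_magnitude(VINA_SECONDS_PER_1K):.1f}")
```

Under these assumptions the ratio is at least six orders of magnitude against the Glide SP figure and at least seven against the AutoDock Vina figure.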

Scope of Validation

CellsWave v1.0 is validated by direct DUD-E benchmark on 10 targets, a 6-drug real-drug recovery panel on co-crystal structures, and a SARS-CoV-2 Mpro (PDB 6LU7) case study. The validated sub-envelope comprises:

Aspartic proteases — HIV-1 PR, renin, BACE1 (3/3 EF@1% ≥ 4)
Classical serine proteases (deep S1 Arg/Lys pocket) — thrombin, trypsin (2/2 EF@1% ≥ 4)
Serine esterase hydrolases — acetylcholinesterase (EF@1% = 28.57)
Glycoside hydrolases — glucocerebrosidase / GLCM (EF@1% = 27.69, small-n sample)

Known non-supported sub-envelope: coagulation-cascade serine proteases (factor Xa, factor VIIa, urokinase) — EF@1% < 2 on DUD-E decoys. This failure mode is shared with several published ligand-based methods on these specific targets and is reported transparently.

Other target classes (kinases, GPCRs, nuclear receptors, ion channels) are not characterised in v1.0. Contact [email protected] to discuss your specific target.

Scope of Use

Program area	Representative targets (validated + likely envelope)
Antiviral programs	HIV-1 PR (validated, EF@1% = 40.09); SARS-CoV-2 Mpro (case study, Nirmatrelvir recovered at rank #3 on PDB 6LU7)
Metabolic / cardiovascular proteases	renin (validated, hypertension, EF@1% = 66.23); BACE-1 (validated, Alzheimer's, EF@1% = 56.24)
General proteolysis	thrombin (validated, EF@1% = 38.61); trypsin (validated, EF@1% = 35.02)
Human hydrolases	acetylcholinesterase (validated, EF@1% = 28.57); glucocerebrosidase (validated, small-n, EF@1% = 27.69)
Not supported in v1.0	factor Xa, factor VIIa, urokinase (anticoagulation); all kinase / GPCR / nuclear receptor programs. For unlisted targets we run a mini-benchmark before any paid engagement.

Reproducibility

DUD-E benchmark results in this report can be reproduced by independent external reviewers. Method version tags are fixed, and identical inputs yield identical rankings.
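One simple way for a reviewer to confirm that two runs produced identical rankings is to compare a digest of the ordered molecule identifiers. The sketch below is an illustration, not part of the reproducibility pack; the function name and the identifiers are hypothetical.

```python
import hashlib
from typing import Iterable


def ranking_digest(ranked_ids: Iterable[str]) -> str:
    """SHA-256 over the ranked identifiers, one per line, in rank order."""
    digest = hashlib.sha256()
    for molecule_id in ranked_ids:
        digest.update(molecule_id.encode("utf-8"))
        digest.update(b"\n")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


# Hypothetical rankings from two runs on identical inputs.
run_1 = ["mol_a", "mol_b", "mol_c"]
run_2 = ["mol_a", "mol_b", "mol_c"]
print(ranking_digest(run_1) == ranking_digest(run_2))
```

Because the digest depends on order, any swap of two positions changes it, which makes it a compact check for determinism across hardware or software versions.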

Independent validation is available on request to qualified enquirers. Reviewers receive a reproducibility pack sufficient to regenerate the numbers in this report on their own hardware, together with a technical briefing on architectural properties relevant to evaluation.

Need the full technical report?

Request the reproducibility pack and per-target breakdown. We share it with qualified enquirers, along with API access credentials.


This document is for public distribution.