rfab-harness
Campaign orchestration for de novo antibody design against cancer and rare disease targets
February 2026
Why this exists
In January 2025, the Baker Lab published RFAntibody—the first open-source pipeline for de novo antibody design using diffusion models. It demonstrated atomically accurate design of single-domain antibodies against five therapeutic targets.
The pipeline works. But running it against a new target requires stitching together three separate tools, converting file formats, manually selecting hotspot residues, tuning CDR loop lengths, and parsing raw Quiver score files. Each campaign is an ad hoc scripts-and-prayer affair.
This harness turns a campaign into a single YAML file. Define your target, pick your antibody format, set your thresholds, and run one command. We provide 21 pre-built configs (6 reproducing the original paper, 10 for cancer immunotherapy targets, 4 for rare diseases, and a smoke test) so you can start designing antibodies on day one.
The Pipeline
RFAntibody designs antibodies in three stages, each a separate ML model:
Campaign Config → Fetch • Truncate • Validate → Backbone generation → Sequence design → Structure prediction → Filter • Rank • Export
Stage 1 (RFdiffusion) generates antibody backbone structures using SE(3)-equivariant denoising diffusion. You specify which target residues to contact (hotspots) and how long each CDR loop should be. It produces thousands of diverse backbone conformations.
Stage 2 (ProteinMPNN) fills in amino acid sequences for each backbone, designing multiple sequence variants per structure. The CDR-specific masking ensures the framework regions stay fixed while loop sequences are optimized.
Stage 3 (RF2) predicts the structure of each designed antibody-antigen complex and scores it. Three metrics determine whether a design is worth pursuing:
| Metric | Threshold | What it measures |
|---|---|---|
| pAE | < 10 Å | Predicted aligned error—confidence that the binding interface is real |
| RMSD | < 2 Å | How well the predicted structure matches the designed backbone |
| ddG | < −20 REU | Binding free energy—lower means tighter predicted binding |
Designs passing all three filters are ranked by a composite score (0.4×pAE + 0.3×RMSD + 0.3×ddG, min-max normalized) and exported as individual PDB files ready for experimental validation.
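As a rough illustration of the three-threshold filter described above, the sketch below keeps only designs that beat every cutoff. The `DesignScores` class, the `passes_filters` function, and the example scores are hypothetical and are not taken from the harness; only the threshold values come from the table.

```python
from dataclasses import dataclass


@dataclass
class DesignScores:
    """Hypothetical container for one design's RF2 scores."""
    name: str
    pae: float   # predicted aligned error, in Å
    rmsd: float  # RMSD to the designed backbone, in Å
    ddg: float   # predicted binding energy, in Rosetta energy units


def passes_filters(d, pae_max=10.0, rmsd_max=2.0, ddg_max=-20.0):
    """A design passes only if all three metrics are below their thresholds."""
    return d.pae < pae_max and d.rmsd < rmsd_max and d.ddg < ddg_max


# Made-up scores for illustration only.
designs = [
    DesignScores("design_a", pae=6.2, rmsd=1.1, ddg=-31.0),
    DesignScores("design_b", pae=12.5, rmsd=0.9, ddg=-35.0),  # fails pAE
    DesignScores("design_c", pae=8.0, rmsd=1.8, ddg=-15.0),   # fails ddG
]

passing = [d.name for d in designs if passes_filters(d)]
print(passing)
```

A design that is excellent on two metrics but misses the third is still rejected, which is why `design_b` drops out despite its strong ddG.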
What the Harness Adds
- Campaign-as-config — One YAML file defines target, antibody format, CDR lengths, pipeline parameters, and filtering thresholds
- Target preparation — Automatic PDB fetching from RCSB, epitope-based truncation (10Å buffer), hotspot validation (hydrophobicity, contiguity), framework conversion to HLT format
- 15 validation rules — Catches misconfigured campaigns before burning GPU hours (hotspot ⊆ epitope, CDR ranges within biological limits, VHH vs scFv chain requirements)
- Multi-GPU parallelization — Automatic splitting via Quiver `qvsplit` across available GPUs
- Checkpoint/resume — Pipeline stages write checkpoint files; interrupted runs resume from the last completed stage
- Analysis pipeline — Score extraction from Quiver files, configurable filtering, composite ranking, HTML/CSV reports with score distributions, PDB + FASTA export
- Experimental planning — Auto-generated protocols for gene synthesis orders (yeast codon-optimized), yeast surface display screening, SPR kinetics, and OrthoRep affinity maturation
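To make the epitope-based truncation in the target preparation step more concrete, here is a minimal sketch that keeps every residue with at least one atom within the buffer distance of any epitope atom. The function name, the input layout (residue number mapped to a list of atom coordinates), and the any-atom distance criterion are assumptions for illustration; the harness may use a different rule.

```python
import math


def truncate_to_epitope(residue_atoms, epitope_residues, buffer_angstroms=10.0):
    """Return the sorted residue numbers to keep after truncation.

    residue_atoms: dict mapping residue number -> list of (x, y, z) tuples.
    A residue is kept if any of its atoms lies within buffer_angstroms
    of any atom belonging to an epitope residue (assumed criterion).
    """
    epitope_atoms = [xyz for r in epitope_residues for xyz in residue_atoms[r]]
    keep = set()
    for resnum, atoms in residue_atoms.items():
        if any(math.dist(a, e) <= buffer_angstroms
               for a in atoms for e in epitope_atoms):
            keep.add(resnum)
    return sorted(keep)


# Toy target: five single-atom residues spaced 6 Å apart along the x axis.
toy_target = {i + 1: [(6.0 * i, 0.0, 0.0)] for i in range(5)}

kept = truncate_to_epitope(toy_target, epitope_residues=[1], buffer_angstroms=10.0)
print(kept)
```

With the epitope at residue 1, residue 2 (6 Å away) is kept and residue 3 (12 Å away) is dropped.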
Quick Start
```bash
# Install
git clone https://github.com/inventcures/repro_rfantibody_for-cancer-targets.git
cd repro_rfantibody_for-cancer-targets
pip install -e .

# Validate a campaign config (no GPU needed)
rfab validate campaigns/smoke_test.yaml

# Dry run: prepare inputs, check everything works
rfab run campaigns/smoke_test.yaml --dry-run --rfantibody-root ./RFAntibody

# Full campaign
rfab run campaigns/cancer/pdl1_vhh.yaml --rfantibody-root ./RFAntibody

# Re-analyze with different thresholds
rfab analyze campaigns/cancer/pdl1_vhh.yaml
```
Paper Reproductions
Six configs reproduce the targets from Bennett et al. (2025) with exact parameters from the paper:
| Target | Format | PDB | Designs |
|---|---|---|---|
| Influenza HA stem | VHH | 4BGW | 9,000 |
| C. difficile TcdB | VHH | 7UMN | 10,000 |
| C. difficile TcdB | scFv | 7UMN | 10,000 |
| RSV Site III | VHH | 4JHW | 10,000 |
| PHOX2B-HLA neoantigen | scFv | modeled | 10,000 |
| SARS-CoV-2 RBD | VHH | 6M0J | 10,000 |
Cancer Targets
Ten campaigns targeting validated cancer antigens, prioritized by structural data quality and therapeutic precedent:
Immune Checkpoints
| Target | Indication | PDB | Strategy |
|---|---|---|---|
| PD-L1 | Broad solid tumors | 5N2C | VHH targeting BC/DE loop interface |
| CTLA-4 | Melanoma, renal | 1I8L | VHH blocking B7 ligand binding |
| TIGIT | Emerging checkpoint | 6V33 | VHH blocking PVR interaction |
Receptor Tyrosine Kinases & Surface Antigens
| Target | Indication | PDB | Strategy |
|---|---|---|---|
| HER2 | Breast, gastric | 1N8Z | VHH domain IV (trastuzumab-like) |
| EGFR | NSCLC, colorectal | 1NQL | VHH domain III blocking EGF |
| TROP-2 | Solid tumors (ADC) | 7E5M | VHH for ADC conjugation |
| GPC3 | Hepatocellular carcinoma | 7YIO | VHH targeting heparan sulfate site |
| Claudin-18.2 | Gastric, pancreatic | 7RFB | VHH extracellular loop (exploratory) |
B-cell Antigens
| Target | Indication | PDB | Strategy |
|---|---|---|---|
| CD20 | B-cell lymphoma | 6Y4A | VHH extracellular loop |
| CD19 | B-cell malignancies | 6AL5 | scFv (bispecific potential) |
Rare Disease Targets
Four campaigns for rare disease targets:
| Target | Indication | PDB | Strategy |
|---|---|---|---|
| Complement C5 | PNH / aHUS | 3CU7 | VHH blocking C5 convertase cleavage |
| PCSK9 | Familial hypercholesterolemia | 3BPS | VHH blocking LDLR interaction |
| IL-6R | Systemic JIA | 1N26 | VHH blocking IL-6 binding |
| GNE | GNE myopathy | 4WMN | VHH enzyme stabilizer (unconventional) |
Technical Details
Campaign Config Schema
Each campaign is a YAML file with six sections:
```yaml
# campaigns/cancer/pdl1_vhh.yaml
campaign:
  name: "pdl1_vhh"

target:
  pdb_id: "5N2C"
  chain_id: "A"
  epitope_residues: [54, 56, 58, 60, 62, ...]
  hotspot_residues: [56, 60, 115]
  truncation:
    enabled: true
    buffer_angstroms: 10.0

antibody:
  format: "vhh"
  framework: "builtin:NbBCII10"
  cdr_loops:
    H1: "7"
    H2: "6"
    H3: "5-13"  # variable length range

pipeline:
  rfdiffusion:
    num_designs: 10000
  proteinmpnn:
    sequences_per_backbone: 5
    temperature: 0.2

filtering:
  pae_threshold: 10.0
  rmsd_threshold: 2.0
  ddg_threshold: -20.0
```
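The `cdr_loops` values in the config are strings that hold either a fixed length ("7") or a range ("5-13"). One way such a spec could be turned into a (min, max) pair is sketched below; `parse_cdr_length` is a hypothetical helper, not a function from the harness.

```python
def parse_cdr_length(spec):
    """Parse a CDR length spec: '7' -> (7, 7), '5-13' -> (5, 13)."""
    lo_text, sep, hi_text = spec.partition("-")
    lo = int(lo_text)
    hi = int(hi_text) if sep else lo  # a bare number is a fixed length
    if lo > hi:
        raise ValueError(f"invalid CDR range {spec!r}: min exceeds max")
    return lo, hi


# Values copied from the example config above.
cdr_loops = {"H1": "7", "H2": "6", "H3": "5-13"}
parsed = {loop: parse_cdr_length(spec) for loop, spec in cdr_loops.items()}
print(parsed)
```

Representing a fixed length as a degenerate range keeps downstream code uniform: every loop has a minimum and a maximum, and they happen to be equal for H1 and H2.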
Validation Rules
The harness validates 15 rules before any GPU computation, including:
- Exactly one target source (PDB ID xor local file)
- Hotspot residues must be a subset of epitope residues
- Minimum 3 epitope residues and 3 hotspot residues
- VHH format cannot specify light chain CDRs (L1/L2/L3)
- scFv format must specify both heavy and light chain CDRs
- Builtin framework name must exist in the framework registry
- CDR loop lengths must fall within biological limits (max 20 residues)
- Range specs must be valid (min ≤ max)
- Minimum 50 designs (below this, too few for meaningful filtering)
- ProteinMPNN temperature between 0.01 and 1.0
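A few of these rules can be sketched as plain checks over a parsed config dictionary. This is an illustrative sketch only: `validate_campaign` and the `pdb_file` key for a local structure file are assumptions, and the real harness may structure its checks differently.

```python
def validate_campaign(cfg):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []

    target = cfg.get("target", {})
    # Exactly one target source ("pdb_file" is a hypothetical key name).
    if ("pdb_id" in target) == ("pdb_file" in target):
        errors.append("specify exactly one of target.pdb_id or target.pdb_file")

    epitope = set(target.get("epitope_residues", []))
    hotspots = set(target.get("hotspot_residues", []))
    if not hotspots <= epitope:
        errors.append(f"hotspots not in epitope: {sorted(hotspots - epitope)}")
    if len(epitope) < 3 or len(hotspots) < 3:
        errors.append("need at least 3 epitope and 3 hotspot residues")

    antibody = cfg.get("antibody", {})
    loops = antibody.get("cdr_loops", {})
    if antibody.get("format") == "vhh" and any(k.startswith("L") for k in loops):
        errors.append("VHH format cannot specify light chain CDRs")

    pipeline = cfg.get("pipeline", {})
    if pipeline.get("rfdiffusion", {}).get("num_designs", 0) < 50:
        errors.append("num_designs must be at least 50")
    temp = pipeline.get("proteinmpnn", {}).get("temperature")
    if temp is not None and not 0.01 <= temp <= 1.0:
        errors.append("ProteinMPNN temperature must be between 0.01 and 1.0")

    return errors


# A deliberately broken config: hotspot 115 is outside the epitope,
# a light chain loop is set on a VHH, and num_designs is too low.
bad = {
    "target": {"pdb_id": "5N2C",
               "epitope_residues": [54, 56, 58, 60],
               "hotspot_residues": [56, 60, 115]},
    "antibody": {"format": "vhh", "cdr_loops": {"H1": "7", "L1": "10"}},
    "pipeline": {"rfdiffusion": {"num_designs": 10},
                 "proteinmpnn": {"temperature": 0.2}},
}
errors = validate_campaign(bad)
for message in errors:
    print(message)
```

Collecting all errors rather than stopping at the first one lets a user fix a config in a single pass.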
Antibody Formats
| Format | Chains | CDR Loops | Framework | Use Case |
|---|---|---|---|---|
| VHH (nanobody) | H only | H1, H2, H3 | NbBCII10 | Single-domain, small (~15 kDa), stable |
| scFv | H + L | H1-H3, L1-L3 | hu4D5-8 | Full variable region (~27 kDa), bispecific building block |
Composite Scoring
Designs passing all filters are ranked by:
score = 0.4 × norm(pAE) + 0.3 × norm(RMSD) + 0.3 × norm(ddG)
where norm() is min-max normalization across all passing designs (lower composite score = better candidate). Weights reflect that binding confidence (pAE) is the most informative single metric per the original paper.
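The formula can be worked through on a small example. The sketch below is a minimal version under stated assumptions: the function names and the three sample designs are made up, and mapping a metric to 0.0 when every design has the same value is a choice made here to avoid division by zero, not something the source specifies.

```python
def min_max(values):
    """Min-max normalize to [0, 1]; all-equal inputs map to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_designs(designs, weights=(0.4, 0.3, 0.3)):
    """Rank (name, pae, rmsd, ddg) tuples by composite score, best first."""
    names = [d[0] for d in designs]
    # Normalize each metric column across all passing designs.
    columns = [min_max([d[i] for d in designs]) for i in (1, 2, 3)]
    scores = [
        sum(w * col[j] for w, col in zip(weights, columns))
        for j in range(len(designs))
    ]
    return sorted(zip(names, scores), key=lambda pair: pair[1])


# Made-up designs that already passed all three filters.
passing = [
    ("a", 4.0, 1.0, -40.0),
    ("b", 8.0, 1.5, -30.0),
    ("c", 6.0, 0.5, -25.0),
]
ranking = rank_designs(passing)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because all three metrics are lower-is-better, no sign flip is needed before normalizing. Note that the scores are relative to the passing set: adding or removing a design changes the normalization and can reorder the ranking.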
From Computation to Experiment
The harness includes experimental planning modules that generate protocols for the complete design-to-validation cycle:
- Gene synthesis — Yeast codon-optimized sequences with restriction site removal, ready for vendor upload (Twist, IDT, GenScript)
- Yeast surface display (YSD) — Expression induction, FACS staining (anti-HA-FITC + target-PE), sorting gates, timeline
- SPR characterization — Immobilization chemistry, concentration series (1.5–500 nM), kinetic fitting (1:1 Langmuir), regeneration conditions
- Affinity maturation — OrthoRep continuous evolution plan: 4 selection rounds with decreasing antigen concentration, expected ~100× affinity improvement over 8 weeks
This mirrors the validation workflow from the RFAntibody paper, where YSD screening followed by SPR confirmation identified binders from 1–2% of computationally passing designs.
Source code & all campaign configs:
github.com/inventcures/repro_rfantibody_for-cancer-targets
Related: Antibodies from Thin Air — comparison of five de novo antibody design platforms