rfab-harness

Campaign orchestration for de novo antibody design against cancer and rare disease targets

February 2026

Why this exists

In January 2025, the Baker Lab published RFAntibody—the first open-source pipeline for de novo antibody design using diffusion models. It demonstrated atomically accurate design of single-domain antibodies against five therapeutic targets.

The pipeline works. But running it against a new target requires stitching together three separate tools, converting file formats, manually selecting hotspot residues, tuning CDR loop lengths, and parsing raw Quiver score files. Each campaign is an ad hoc scripts-and-prayer affair.

This harness turns a campaign into a single YAML file. Define your target, pick your antibody format, set your thresholds, and run one command. We provide 21 pre-built configs (6 reproducing the original paper, 10 for cancer immunotherapy targets, 4 for rare diseases, and the smoke test used in the Quick Start), so you can start designing antibodies on day one.

The Pipeline

RFAntibody designs antibodies in three stages, each a separate ML model:

YAML Campaign Config → Target Prep (Fetch • Truncate • Validate) → RFdiffusion (Backbone generation) → ProteinMPNN (Sequence design) → RF2 (Structure prediction) → Analysis (Filter • Rank • Export)

Stage 1 (RFdiffusion) generates antibody backbone structures using SE(3)-equivariant denoising diffusion. You specify which target residues to contact (hotspots) and how long each CDR loop should be. It produces thousands of diverse backbone conformations.

Stage 2 (ProteinMPNN) fills in amino acid sequences for each backbone, designing multiple sequence variants per structure. The CDR-specific masking ensures the framework regions stay fixed while loop sequences are optimized.

Stage 3 (RF2) predicts the structure of each designed antibody-antigen complex and scores it. Three metrics determine whether a design is worth pursuing:

Metric	Threshold	What it measures
pAE	< 10 Å	Predicted aligned error—confidence that the binding interface is real
RMSD	< 2 Å	How well the predicted structure matches the designed backbone
ddG	< −20 REU	Binding free energy—lower means tighter predicted binding

Designs passing all three filters are ranked by a composite score (0.4×pAE + 0.3×RMSD + 0.3×ddG, min-max normalized) and exported as individual PDB files ready for experimental validation.
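
The three-threshold filter can be sketched in a few lines of Python. This is a minimal illustration, not the harness's own code: the `DesignScores` class, the `passes_filters` function, and the score values are all hypothetical, and only the default thresholds come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignScores:
    """Hypothetical container for the three RF2-derived metrics."""
    name: str
    pae: float   # predicted aligned error, in Å
    rmsd: float  # deviation from the designed backbone, in Å
    ddg: float   # predicted binding energy, in REU


def passes_filters(d, pae_threshold=10.0, rmsd_threshold=2.0, ddg_threshold=-20.0):
    """A design passes only if it is strictly below all three thresholds."""
    return (d.pae < pae_threshold
            and d.rmsd < rmsd_threshold
            and d.ddg < ddg_threshold)


# Made-up scores, chosen so that each failure mode appears once.
designs = [
    DesignScores("design_a", pae=6.2, rmsd=1.1, ddg=-31.5),
    DesignScores("design_b", pae=12.4, rmsd=0.9, ddg=-40.0),  # fails pAE
    DesignScores("design_c", pae=8.0, rmsd=2.6, ddg=-25.0),   # fails RMSD
    DesignScores("design_d", pae=9.1, rmsd=1.8, ddg=-12.0),   # fails ddG
]

passing = [d.name for d in designs if passes_filters(d)]
print(passing)
```

Because the filters are combined with a logical AND, a design with an excellent ddG still drops out if its pAE or RMSD misses the cutoff.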

What the Harness Adds

Pre-built campaigns • Unit tests • 1 command: config → candidates

Campaign-as-config — One YAML file defines target, antibody format, CDR lengths, pipeline parameters, and filtering thresholds
Target preparation — Automatic PDB fetching from RCSB, epitope-based truncation (10Å buffer), hotspot validation (hydrophobicity, contiguity), framework conversion to HLT format
15 validation rules — Catches misconfigured campaigns before burning GPU hours (hotspot ⊆ epitope, CDR ranges within biological limits, VHH vs scFv chain requirements)
Multi-GPU parallelization — Automatic splitting via Quiver qvsplit across available GPUs
Checkpoint/resume — Pipeline stages write checkpoint files; interrupted runs resume from the last completed stage
Analysis pipeline — Score extraction from Quiver files, configurable filtering, composite ranking, HTML/CSV reports with score distributions, PDB + FASTA export
Experimental planning — Auto-generated protocols for gene synthesis orders (yeast codon-optimized), yeast surface display screening, SPR kinetics, and OrthoRep affinity maturation
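
To make the checkpoint/resume idea from the list above concrete, here is a minimal sketch. It assumes each stage leaves an empty `<stage>.done` marker file in the working directory. The marker naming, the stage identifiers, and the `run_pipeline` function are illustrative assumptions, not the harness's actual layout.

```python
import tempfile
from pathlib import Path

# Hypothetical stage identifiers, in pipeline order.
STAGES = ["target_prep", "rfdiffusion", "proteinmpnn", "rf2", "analysis"]


def run_pipeline(workdir, stage_fn, stages=STAGES):
    """Run stages in order, skipping any stage whose marker file exists.

    The marker is written only after stage_fn returns, so a stage that
    crashes midway is re-run on the next invocation.
    """
    executed = []
    for stage in stages:
        marker = workdir / f"{stage}.done"
        if marker.exists():
            continue
        stage_fn(stage)
        marker.touch()
        executed.append(stage)
    return executed


def flaky(stage):
    """Stand-in stage runner that simulates an interruption."""
    if stage == "proteinmpnn":
        raise RuntimeError("simulated interruption")


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    try:
        run_pipeline(workdir, flaky)
    except RuntimeError:
        pass
    first_run = sorted(p.stem for p in workdir.glob("*.done"))
    # The second invocation picks up where the first one stopped.
    resumed = run_pipeline(workdir, lambda stage: None)

print(first_run)
print(resumed)
```

Writing the marker after the stage finishes, rather than before, is the design choice that matters here: a partially written output never counts as complete.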

Quick Start

# Install
git clone https://github.com/inventcures/repro_rfantibody_for-cancer-targets.git
cd repro_rfantibody_for-cancer-targets
pip install -e .

# Validate a campaign config (no GPU needed)
rfab validate campaigns/smoke_test.yaml

# Dry run — prepare inputs, check everything works
rfab run campaigns/smoke_test.yaml --dry-run --rfantibody-root ./RFAntibody

# Full campaign
rfab run campaigns/cancer/pdl1_vhh.yaml --rfantibody-root ./RFAntibody

# Re-analyze with different thresholds
rfab analyze campaigns/cancer/pdl1_vhh.yaml

Paper Reproductions

Six configs reproduce the targets from Bennett et al. (2025) with exact parameters from the paper:

Target	Format	PDB	Designs
Influenza HA stem	VHH	4BGW	9,000
C. difficile TcdB	VHH	7UMN	10,000
C. difficile TcdB	scFv	7UMN	10,000
RSV Site III	VHH	4JHW	10,000
PHOX2B-HLA neoantigen	scFv	modeled	10,000
SARS-CoV-2 RBD	VHH	6M0J	10,000

Cancer Targets

Ten campaigns targeting validated cancer antigens, prioritized by structural data quality and therapeutic precedent:

Immune Checkpoints

Target	Indication	PDB	Strategy
PD-L1	Broad solid tumors	5N2C	VHH targeting BC/DE loop interface
CTLA-4	Melanoma, renal	1I8L	VHH blocking B7 ligand binding
TIGIT	Emerging checkpoint	6V33	VHH blocking PVR interaction

Receptor Tyrosine Kinases & Surface Antigens

Target	Indication	PDB	Strategy
HER2	Breast, gastric	1N8Z	VHH domain IV (trastuzumab-like)
EGFR	NSCLC, colorectal	1NQL	VHH domain III blocking EGF
TROP-2	Solid tumors (ADC)	7E5M	VHH for ADC conjugation
GPC3	Hepatocellular carcinoma	7YIO	VHH targeting heparan sulfate site
Claudin-18.2	Gastric, pancreatic	7RFB	VHH extracellular loop (exploratory)

B-cell Antigens

Target	Indication	PDB	Strategy
CD20	B-cell lymphoma	6Y4A	VHH extracellular loop
CD19	B-cell malignancies	6AL5	scFv (bispecific potential)

Rare Disease Targets

Four campaigns targeting rare disease indications:

Target	Indication	PDB	Strategy
Complement C5	PNH / aHUS	3CU7	VHH blocking C5 convertase cleavage
PCSK9	Familial hypercholesterolemia	3BPS	VHH blocking LDLR interaction
IL-6R	Systemic JIA	1N26	VHH blocking IL-6 binding
GNE	GNE myopathy	4WMN	VHH enzyme stabilizer (unconventional)

Technical Details

Campaign Config Schema

Each campaign is a YAML file with six sections. The abridged example below shows five of them:

# campaigns/cancer/pdl1_vhh.yaml
campaign:
  name: "pdl1_vhh"

target:
  pdb_id: "5N2C"
  chain_id: "A"
  epitope_residues: [54, 56, 58, 60, 62, ...]
  hotspot_residues: [56, 60, 115]
  truncation:
    enabled: true
    buffer_angstroms: 10.0

antibody:
  format: "vhh"
  framework: "builtin:NbBCII10"
  cdr_loops:
    H1: "7"
    H2: "6"
    H3: "5-13"    # variable length range

pipeline:
  rfdiffusion:
    num_designs: 10000
  proteinmpnn:
    sequences_per_backbone: 5
    temperature: 0.2

filtering:
  pae_threshold: 10.0
  rmsd_threshold: 2.0
  ddg_threshold: -20.0
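
The `truncation` block in the config above asks for a 10 Å buffer around the epitope. One plausible reading is sketched below, under a loudly stated assumption: distances are measured between Cα atoms only. How the harness actually measures distance is not specified here, and the `truncate_by_epitope` function and the toy coordinates are hypothetical.

```python
import math


def truncate_by_epitope(ca_coords, epitope_residues, buffer_angstroms=10.0):
    """Return residue numbers whose C-alpha lies within the buffer of any epitope C-alpha.

    ca_coords maps residue number -> (x, y, z) in Å. Assumption: a single
    C-alpha point per residue stands in for the whole residue.
    """
    epitope_points = [ca_coords[r] for r in epitope_residues]
    keep = [
        res
        for res, xyz in ca_coords.items()
        if any(math.dist(xyz, e) <= buffer_angstroms for e in epitope_points)
    ]
    return sorted(keep)


# Toy target: 20 residues on a straight line, 3.8 Å apart (not a real structure).
ca = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 21)}

kept = truncate_by_epitope(ca, epitope_residues=[10, 11], buffer_angstroms=10.0)
print(kept)
```

In this toy chain, residues two positions away (7.6 Å) are kept and residues three positions away (11.4 Å) are dropped, so the truncated target shrinks from 20 residues to 6.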

Validation Rules

The harness validates 15 rules before any GPU computation. Ten of them are listed here:

Exactly one target source (PDB ID xor local file)
Hotspot residues must be a subset of epitope residues
Minimum 3 epitope residues and 3 hotspot residues
VHH format cannot specify light chain CDRs (L1/L2/L3)
scFv format must specify both heavy and light chain CDRs
Builtin framework name must exist in the framework registry
CDR loop lengths must fall within biological limits (max 20 residues)
Range specs must be valid (min ≤ max)
Minimum 50 designs (below this, too few for meaningful filtering)
ProteinMPNN temperature between 0.01 and 1.0
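
Several of the rules above can be expressed as plain checks over a parsed config. The sketch below assumes the config has already been loaded into a dict that mirrors the YAML schema. The `validate` and `parse_cdr_length` functions, the error messages, and the residue numbers are hypothetical and cover only a subset of the rules.

```python
def parse_cdr_length(spec):
    """Parse "7" or "5-13" into an inclusive (min, max) tuple."""
    parts = spec.split("-")
    if len(parts) == 1:
        return int(parts[0]), int(parts[0])
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError(f"unrecognized CDR length spec: {spec!r}")


def validate(config, max_cdr_length=20, min_designs=50):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    epitope = set(config["target"]["epitope_residues"])
    hotspots = set(config["target"]["hotspot_residues"])
    if not hotspots <= epitope:
        errors.append("hotspot residues must be a subset of epitope residues")
    if len(epitope) < 3 or len(hotspots) < 3:
        errors.append("need at least 3 epitope and 3 hotspot residues")

    fmt = config["antibody"]["format"]
    loops = config["antibody"]["cdr_loops"]
    heavy = {"H1", "H2", "H3"} & set(loops)
    light = {"L1", "L2", "L3"} & set(loops)
    if fmt == "vhh" and light:
        errors.append("VHH format cannot specify light chain CDRs")
    if fmt == "scfv" and not (heavy and light):
        errors.append("scFv format must specify heavy and light chain CDRs")

    for name, spec in loops.items():
        lo, hi = parse_cdr_length(spec)
        if lo > hi:
            errors.append(f"{name}: range min exceeds max")
        if hi > max_cdr_length:
            errors.append(f"{name}: longer than {max_cdr_length} residues")

    if config["pipeline"]["rfdiffusion"]["num_designs"] < min_designs:
        errors.append(f"fewer than {min_designs} designs")
    if not 0.01 <= config["pipeline"]["proteinmpnn"]["temperature"] <= 1.0:
        errors.append("ProteinMPNN temperature outside [0.01, 1.0]")
    return errors


# Illustrative configs with made-up residue numbers.
good = {
    "target": {"epitope_residues": [10, 11, 12, 13, 14], "hotspot_residues": [11, 12, 14]},
    "antibody": {"format": "vhh", "cdr_loops": {"H1": "7", "H2": "6", "H3": "5-13"}},
    "pipeline": {"rfdiffusion": {"num_designs": 10000}, "proteinmpnn": {"temperature": 0.2}},
}
bad = {
    "target": {"epitope_residues": [10, 11, 12, 13, 14], "hotspot_residues": [11, 12, 99]},
    "antibody": {"format": "vhh", "cdr_loops": {"H1": "7", "H3": "13-5", "L1": "11"}},
    "pipeline": {"rfdiffusion": {"num_designs": 10}, "proteinmpnn": {"temperature": 1.5}},
}

print(validate(good))
for message in validate(bad):
    print(message)
```

Collecting every error instead of stopping at the first one means a misconfigured campaign can be fixed in a single pass.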

Antibody Formats

Format	Chains	CDR Loops	Framework	Use Case
VHH (nanobody)	H only	H1, H2, H3	NbBCII10	Single-domain, small (~15 kDa), stable
scFv	H + L	H1-H3, L1-L3	hu4D5-8	Full variable region (~27 kDa), bispecific building block

Composite Scoring

Designs passing all filters are ranked by:

score = 0.4 × norm(pAE) + 0.3 × norm(RMSD) + 0.3 × norm(ddG)

where norm() is min-max normalization across all passing designs (lower composite score = better candidate). Weights reflect that binding confidence (pAE) is the most informative single metric per the original paper.
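
The formula above can be worked through on a small example. This is a sketch under stated assumptions: the `min_max` and `composite_scores` function names and the score values are invented, and mapping a metric with no spread to 0.0 is a choice made here, not something the source specifies.

```python
def min_max(values):
    """Min-max normalize to [0, 1]; a metric with no spread maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(pae, rmsd, ddg, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of normalized metrics; lower is better for all three."""
    w_pae, w_rmsd, w_ddg = weights
    return [
        w_pae * a + w_rmsd * b + w_ddg * c
        for a, b, c in zip(min_max(pae), min_max(rmsd), min_max(ddg))
    ]


# Three made-up designs that already passed the filters.
names = ["design_a", "design_b", "design_c"]
pae = [4.0, 8.0, 6.0]
rmsd = [1.5, 0.5, 1.0]
ddg = [-25.0, -45.0, -35.0]

scores = composite_scores(pae, rmsd, ddg)
ranked = [name for _, name in sorted(zip(scores, names))]
print(ranked)
```

Since ddG is more favorable when more negative, min-max normalization maps the most negative value to 0, which keeps "lower is better" consistent across all three terms. In this example design_b ranks first despite having the worst pAE, because it has the best RMSD and ddG.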

From Computation to Experiment

The harness includes experimental planning modules that generate protocols for the complete design-to-validation cycle:

Gene synthesis — Yeast codon-optimized sequences with restriction site removal, ready for vendor upload (Twist, IDT, GenScript)
Yeast surface display (YSD) — Expression induction, FACS staining (anti-HA-FITC + target-PE), sorting gates, timeline
SPR characterization — Immobilization chemistry, concentration series (1.5–500 nM), kinetic fitting (1:1 Langmuir), regeneration conditions
Affinity maturation — OrthoRep continuous evolution plan: 4 selection rounds with decreasing antigen concentration, expected ~100× affinity improvement over 8 weeks

This mirrors the validation workflow from the RFAntibody paper, where YSD screening followed by SPR confirmation identified binders from 1–2% of computationally passing designs.

Source code & all campaign configs:
github.com/inventcures/repro_rfantibody_for-cancer-targets

Related: Antibodies from Thin Air — comparison of five de novo antibody design platforms