Onco-TTT

AI-Powered Cancer Hypothesis Generation

An open-source platform integrating biomedical NER, knowledge graphs, and literature search for oncology research

Standing on the Shoulders of Giants

Acknowledgments & Prior Art

Onco-TTT is an integration platform. It combines established methods and public data sources into a single workflow for cancer researchers. We are transparent about what we built vs. what we built upon.

ARK (Adaptive Reasoning over Knowledge) — Internal Module

"ARK" is our internal codename for the knowledge graph construction pipeline. It combines GLiNER2 entity extraction with OpenTargets GraphQL enrichment and NetworkX graph assembly. There is no external paper called "ARK" that this is based on — the name was coined for this project.

Validation Module — Inspired by Medea (Zitnik Lab)

The multi-source validation module was inspired by Medea, an omics AI agent for therapeutic discovery from Marinka Zitnik's lab at Harvard (bioRxiv 2026). Medea's core insight — that verification-aware agents improve performance by producing transparent, multi-evidence analyses — directly shaped our design. Our module runs parallel checks against DepMap (gene dependency), cBioPortal (genomic alterations), GTEx (tissue expression), ClinicalTrials.gov (active trials), and OpenTargets (drug-target tractability).
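
One way to picture the parallel checks is the sketch below, which runs one lookup per source concurrently with `asyncio.gather`. The `check_source` coroutine is a hypothetical stand-in that makes no network calls, and the shape of the returned report is an assumption, not the module's actual output format.

```python
import asyncio


async def check_source(name, gene):
    """Hypothetical stand-in for one evidence lookup (no network access)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    return {"source": name, "gene": gene, "status": "ok"}


async def validate(gene, sources):
    """Run every check concurrently; one failure does not cancel the rest."""
    tasks = [check_source(name, gene) for name in sources]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    report = {}
    for name, result in zip(sources, results):
        if isinstance(result, Exception):
            report[name] = {"source": name, "gene": gene, "status": "error"}
        else:
            report[name] = result
    return report


SOURCES = ["DepMap", "cBioPortal", "GTEx", "ClinicalTrials.gov", "OpenTargets"]
report = asyncio.run(validate("TP53", SOURCES))
```

Passing `return_exceptions=True` keeps a single failing source from discarding the evidence gathered from the others.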

Test-Time Training (TTT)

The project name references the concept of Test-Time Training by Yu Sun et al. (arXiv:2407.04620), which adapts neural network parameters at inference time. Transparency note: our current implementation is a simplified graph-based activation propagation heuristic (keyword matching + neighbor spread over a NetworkX graph), not actual neural parameter adaptation. The name reflects our aspirational direction, not the current mechanism.
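
A minimal sketch of what "keyword matching + neighbor spread" could look like over a NetworkX graph follows. The `propagate` function, the `decay` and `steps` parameters, and the toy graph are all assumptions made for illustration; they are not taken from the project's code.

```python
import networkx as nx


def propagate(graph, query, decay=0.5, steps=2):
    """Seed nodes whose name appears in the query, then spread to neighbors."""
    tokens = set(query.lower().split())
    activation = {n: (1.0 if str(n).lower() in tokens else 0.0) for n in graph}
    for _ in range(steps):
        updated = dict(activation)
        for node, value in activation.items():
            if value == 0.0:
                continue
            for neighbor in graph.neighbors(node):
                # Keep the strongest signal reaching each neighbor.
                updated[neighbor] = max(updated[neighbor], value * decay)
        activation = updated
    return activation


# Toy graph for illustration only.
g = nx.Graph()
g.add_edges_from([("TP53", "MDM2"), ("MDM2", "nutlin-3"), ("KRAS", "MAPK")])
scores = propagate(g, "role of TP53 in cancer")
```

Here TP53 matches a query token and is seeded at 1.0, MDM2 receives 0.5 after one hop, nutlin-3 receives 0.25 after two, and the disconnected KRAS/MAPK pair stays at 0.0. No parameters are learned or updated, which is the distinction the transparency note draws.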

GLiNER

Entity extraction uses GLiNER, a generalist model for Named Entity Recognition using a bidirectional transformer encoder. GitHub · arXiv:2311.08526

Open Targets

Gene-disease associations come from the Open Targets Platform, a public-private partnership for systematic drug target identification. platform.opentargets.org
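
For readers unfamiliar with the Open Targets GraphQL interface, the sketch below builds a request body and flattens a response into gene-disease triples. The endpoint URL and field names are believed to match the public Open Targets schema but should be checked against the live schema before use. The helper names, the sample response, and its score are hypothetical, and no request is sent.

```python
import json

# Endpoint and field names: believed to match the public Open Targets
# GraphQL schema; verify against the live schema before relying on them.
OPEN_TARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"

QUERY = """
query targetAssociations($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    associatedDiseases {
      rows { disease { id name } score }
    }
  }
}
"""


def build_payload(ensembl_id):
    """Return the JSON body to POST to OPEN_TARGETS_URL."""
    return json.dumps({"query": QUERY, "variables": {"ensemblId": ensembl_id}})


def parse_associations(response):
    """Flatten a decoded response into (gene, disease, score) triples."""
    target = response["data"]["target"]
    return [
        (target["approvedSymbol"], row["disease"]["name"], row["score"])
        for row in target["associatedDiseases"]["rows"]
    ]


payload = build_payload("ENSG00000141510")  # Ensembl gene ID for TP53

# Hand-written response in the assumed shape; the ID is a placeholder and
# the score is made up.
sample = {"data": {"target": {
    "approvedSymbol": "TP53",
    "associatedDiseases": {"rows": [
        {"disease": {"id": "placeholder-id", "name": "cancer"}, "score": 0.8},
    ]},
}}}
triples = parse_associations(sample)
```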

Semantic Scholar

Literature search is powered by the Semantic Scholar Academic Graph API from the Allen Institute for AI. semanticscholar.org
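
As an illustration of how a literature query might be composed, the sketch below builds a URL for the Academic Graph paper search endpoint. The `build_search_url` helper and the chosen fields are assumptions; the request is only constructed, not sent, and the project's actual client code may differ.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query, limit=10, fields=("title", "year", "citationCount")):
    """Compose a paper-search URL; the request itself is not sent."""
    params = {"query": query, "limit": limit, "fields": ",".join(fields)}
    return f"{S2_SEARCH_URL}?{urlencode(params)}"


url = build_search_url("role of TP53 in cancer", limit=5)

# Decode the query string again to confirm what would be transmitted.
params_sent = parse_qs(urlsplit(url).query)
```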

CELLxGENE Census

Single-cell atlas data comes from the Chan Zuckerberg Initiative's CELLxGENE Census. cellxgene.cziscience.com

Medea — Omics AI Agent (Zitnik Lab, Harvard)

The verification-aware, multi-evidence analysis approach in Medea directly inspired our validation dashboard design. Medea uses 20 tools spanning single-cell and bulk transcriptomic datasets, cancer vulnerability maps, and pathway knowledge bases. By Sui, Li, Gao, Shen, Giunchiglia, Shen, Huang, Kong & Zitnik. bioRxiv 2026

Why LLMs Aren't Scientists Yet (LossFunk)

Honest failure mode analysis of autonomous AI research attempts. The six documented failure modes — bias toward training data defaults, implementation drift, memory degradation, overexcitement, insufficient domain intelligence, and weak scientific taste — informed our design decisions about what to automate vs. leave to the researcher. By Dhruv Trehan & Paras Chopra. arXiv:2601.03315

METIS — AI Research Mentor (LossFunk)

A tool-augmented, stage-aware AI assistant for guiding students from idea to paper. METIS's approach of combining literature search with methodology checks and curated guidelines influenced our pipeline design. By Kumar, Trehan & Chopra. arXiv:2601.13075

This project does not claim novelty in any of the underlying methods. Our contribution is the integration of these tools into a single, accessible platform for cancer researchers.

Pipeline Architecture

A natural-language query flows through entity extraction, knowledge graph construction, enrichment, and adaptive hypothesis generation. External APIs provide evidence grounding at each stage.

[Figure: pipeline diagram. Nodes: User Query ("role of TP53 in cancer"); GLiNER2 NER (Entity Extraction); KG Builder (Knowledge Graph); OpenTargets (Gene-Disease Links); Activation Prop. (Graph Traversal); Hypothesis Generation; Results Dashboard (Hypotheses + Evidence); Semantic Scholar (Literature Search); CELLxGENE Census (Atlas Projection). Legend: Compute Steps, External APIs, Output, User Input.]
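
The knowledge graph construction stage can be pictured with the sketch below, which merges extracted entities and gene-disease association rows into a NetworkX graph. The `build_graph` name, the input shapes, and the node and edge attribute names are assumptions for illustration, not the project's actual schema.

```python
import networkx as nx


def build_graph(entities, associations):
    """Assemble a small knowledge graph.

    entities: iterable of (name, entity_type) pairs, e.g. from an NER step.
    associations: iterable of (gene, disease, score) triples, e.g. from an
        enrichment step. Both shapes are hypothetical.
    """
    graph = nx.Graph()
    for name, entity_type in entities:
        graph.add_node(name, type=entity_type)
    for gene, disease, score in associations:
        # Enrichment may introduce nodes the NER step never saw.
        if gene not in graph:
            graph.add_node(gene, type="gene")
        if disease not in graph:
            graph.add_node(disease, type="disease")
        graph.add_edge(gene, disease, relation="associated_with", score=score)
    return graph


kg = build_graph(
    entities=[("TP53", "gene"), ("cancer", "disease")],
    associations=[("TP53", "Li-Fraumeni syndrome", 0.9)],  # made-up score
)
```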

Biological Entity Types Extracted by GLiNER2

The system uses GLiNER, a generalist NER model built on a bidirectional transformer encoder, fine-tuned for biomedical text. It extracts 10 entity types from free-text oncology queries and abstracts.

[Figure: Entity Types Recognized. Bar lengths are illustrative only and do not represent empirical extraction frequencies.]

Entity types: Gene, Disease, Drug, Pathway, Mutation, Cell Type, Biomarker, Mechanism, Anatomical Site, Clinical Outcome.

Screenshots

Landing page with suggested queries
Knowledge graph for "role of TP53 in cancer"
Literature search results from Semantic Scholar
Entity evidence table with confidence scores

Validation & Deep Research Modules

Beyond hypothesis generation, the platform provides modules for investigating generated hypotheses in more depth.

Validation Dashboard

Inspired by Medea's verification-aware approach, the dashboard cross-references hypotheses against DepMap (gene dependency), cBioPortal (genomic alterations), GTEx (tissue expression), ClinicalTrials.gov (active trials), and OpenTargets (drug-target tractability). It uses curated fallback data when live APIs are unavailable.
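
The fallback behavior could be structured along the lines of the sketch below, which tries a live lookup first and records where the answer came from. The `with_fallback` helper, the `CURATED` table, and the simulated outage are hypothetical; the project's real error handling is not documented here.

```python
def with_fallback(fetch_live, fallback, key):
    """Return (value, origin); origin records whether live data was used."""
    try:
        return fetch_live(key), "live"
    except (OSError, TimeoutError, KeyError):
        # Network failures and missing records both fall through to the
        # curated table; a key absent from both yields None.
        return fallback.get(key), "fallback"


CURATED = {"TP53": {"note": "placeholder curated record"}}  # hypothetical


def unavailable_api(gene):
    raise ConnectionError("service unreachable")  # simulated outage


value, origin = with_fallback(unavailable_api, CURATED, "TP53")
```

Returning the origin alongside the value lets a dashboard label evidence as live or curated, so a reader can tell which results reflect current data.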

🧬 Protein Structure Analysis

Retrieves predicted structures from AlphaFold and runs pocket detection to identify druggable binding sites on proteins implicated in generated hypotheses.

📜 Patent Landscape

Queries the USPTO PatentsView API to surface existing intellectual property around gene targets and drug compounds mentioned in hypotheses.

🧪 Cell Line Recommendations

Recommends cell lines for experimental validation using Cellosaurus metadata and DepMap dependency scores matched to hypothesis gene targets.
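
The dependency-matching half of this recommendation can be sketched as a simple ranking. The example assumes DepMap-style gene-effect scores, where a more negative value indicates stronger dependency. The cell line names and scores are placeholders, and the Cellosaurus metadata step is omitted.

```python
def rank_cell_lines(scores, gene, top_n=3):
    """Rank cell lines by dependency on `gene`, most dependent first.

    scores: {cell_line: {gene: effect}}; a more negative effect is taken
    to mean stronger dependency.
    """
    candidates = [
        (line, effects[gene]) for line, effects in scores.items() if gene in effects
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:top_n]


# Placeholder names and made-up scores, for illustration only.
toy_scores = {
    "LINE_A": {"KRAS": -1.2},
    "LINE_B": {"KRAS": -0.1},
    "LINE_C": {"KRAS": -0.7},
    "LINE_D": {"TP53": 0.2},
}
ranked = rank_cell_lines(toy_scores, "KRAS", top_n=2)
```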

🔬 Experiment Protocol Generator

Generates starter experimental protocols including CRISPR guide RNA (gRNA) design suggestions for functional validation of candidate gene targets.
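
As a simplified picture of a gRNA suggestion step, the sketch below scans a DNA string for 20-nt protospacers followed by an NGG PAM, the motif recognized by SpCas9. It is an assumption that the generator targets SpCas9. The function name and the synthetic sequence are hypothetical, and the sketch covers only the given strand, with no off-target scoring.

```python
def find_protospacers(sequence, spacer_len=20):
    """Return (start, spacer, pam) for each spacer followed by an NGG PAM.

    Scans the given strand only; a fuller design step would also scan the
    reverse complement and score off-target risk.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - spacer_len - 2):
        pam = seq[start + spacer_len : start + spacer_len + 3]
        if pam[1:] == "GG":
            hits.append((start, seq[start : start + spacer_len], pam))
    return hits


# Synthetic sequence, not a real gene fragment.
toy = "ATGC" * 5 + "AGG" + "TTTT"
guides = find_protospacers(toy)
```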

Technical Stack

  • FastAPI: Backend (Python)
  • Next.js 14: Frontend (TypeScript)
  • GLiNER2: Named Entity Recognition
  • NetworkX: KG + OpenTargets GraphQL
  • S2 API: Semantic Scholar
  • Census: CELLxGENE Atlas
  • Railway: Deployment

Limitations & Future Work

Known Limitations (we believe in transparency)

  • Heuristic hypothesis generation — hypotheses are assembled from graph traversal heuristics, not from a fine-tuned LLM. Quality varies with query specificity.
  • No automated benchmarking — there is currently no systematic evaluation against known biological ground truths or curated hypothesis corpora.
  • Synthetic UMAP coordinates — single-cell atlas projections use synthetically computed UMAP embeddings, not projections from a reference atlas model.
  • No user accounts or saved sessions — all queries are stateless; users cannot save, compare, or revisit prior analyses.
  • Oncology-only scope — the entity types, knowledge graph schema, and data sources are tailored to cancer biology. Generalization to other disease areas is not supported.

Future Directions

Planned improvements include LLM-powered hypothesis refinement with citation grounding, automated evaluation against curated benchmarks (e.g., COSMIC, CIViC), persistent user sessions, and expansion to additional disease domains.

Preprint & Citation

A scientific preprint describing Onco-TTT's architecture, methods, and limitations is available:

Preprint

Onco-TTT: An Open-Source Platform for Automated Cancer Hypothesis Generation via Entity Extraction, Knowledge Graphs, and Multi-Source Validation
Ashish Makani. February 2026. Not peer-reviewed.
Download PDF · LaTeX source

If you use Onco-TTT in your research, please cite:

```bibtex
@software{makani2026oncottt,
  title={Onco-TTT: An Open-Source Platform for Automated Cancer Hypothesis Generation via Entity Extraction, Knowledge Graphs, and Multi-Source Validation},
  author={Makani, Ashish},
  url={https://github.com/inventcures/oncology_hypothesis_generation},
  year={2026}
}
```

Contact: Ashish Makani — ashish.makani@ashoka.edu.in
Last updated: February 2026