Neuro-Symbolic AI
Open Problems
Differentiable symbolic reasoning. Categorical deep learning. Verified clinical AI.
4 projects spanning Intermediate to Advanced.
2026 is shaping up as the year neuro-symbolic AI moves from academic curiosity to serious infrastructure. The central challenge, how to backpropagate through a discrete symbolic system, has resisted clean solutions for decades. Three approaches are now converging: Scallop (a differentiable Datalog that evaluates logic programs under provenance semirings, so gradients can flow through symbolic rules), Logical Neural Networks (IBM's framework in which first-order logic rules carry learnable weights, reported at AMIA 2025 to outperform traditional ML for clinical diagnosis), and Categorical Deep Learning (Gavranovic et al., whose treatment of category theory as a unifying mathematical language for neural and symbolic computation underpins Symbolica, which raised $30M).
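To make the Scallop idea concrete, here is a minimal pure-Python sketch of how a gradient can pass through a logic rule. It does not use Scallop's API. It assumes the sum-product semiring over nonnegative reals (a conjunction multiplies weights, alternative derivations add), tracks derivatives with forward-mode dual numbers, and uses a hypothetical rule `interacts(X, Y) :- inhibits(X, E), metabolized_by(Y, E)` with made-up fact weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A fact weight plus its derivative with respect to one chosen input fact."""
    val: float
    grad: float = 0.0


def conj(a: Tagged, b: Tagged) -> Tagged:
    # Semiring "times": a derivation needs both facts, so weights multiply
    # (the gradient follows the product rule).
    return Tagged(a.val * b.val, a.grad * b.val + a.val * b.grad)


def disj(a: Tagged, b: Tagged) -> Tagged:
    # Semiring "plus": alternative derivations of the same fact add up.
    return Tagged(a.val + b.val, a.grad + b.grad)


def interacts(inhibits: dict, metabolized_by: dict) -> Tagged:
    """Score of interacts(X, Y) :- inhibits(X, E), metabolized_by(Y, E)."""
    total = Tagged(0.0)  # semiring zero: no derivation found yet
    for enzyme, inh in inhibits.items():
        if enzyme in metabolized_by:
            total = disj(total, conj(inh, metabolized_by[enzyme]))
    return total


# Illustrative weights only; grad=1.0 marks the input we differentiate against.
inhibits = {"CYP3A4": Tagged(0.9, grad=1.0), "CYP2D6": Tagged(0.2)}
metabolized_by = {"CYP3A4": Tagged(0.8), "CYP2D6": Tagged(0.5)}

score = interacts(inhibits, metabolized_by)
print(score.val, score.grad)
```

The score comes out at about 0.82, and its derivative with respect to the CYP3A4 `inhibits` weight is 0.8, which is the weight of the matching `metabolized_by` fact. An unclamped sum can exceed 1, so a real system would need clamping or a different provenance to keep the result a probability.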
Why does this matter for science? Two application domains make the case sharply. In clinical reasoning, Adam Rodman's research at Harvard/BIDMC shows that LLMs already score 10/10 on the r-IDEA clinical reasoning measure (versus 9 for attending physicians), yet clinicians augmented with AI sometimes underperform AI alone, because automation bias causes them to defer without engaging their own judgment. A neuro-symbolic architecture with formal safety guarantees, in which certain rules cannot be overridden by the neural component, addresses this directly. In drug discovery, knowledge graphs like the one underlying Onco-TTT generate hypotheses via heuristic graph diffusion; replacing that heuristic with differentiable symbolic reasoning (Scallop over pharmacological rules, categorical composition over drug-interaction functors) makes the inference chain verifiable and the learning end-to-end.
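The "cannot be overridden" property can be sketched as a symbolic filter that sits after the neural proposer. The following is an illustration under stated assumptions, not the design of any project here: the `Patient` and `Proposal` types, the `decide` function, and the `review_below` threshold are hypothetical, and the single rule encodes the example "never give immunotherapy to a patient with active autoimmune disease".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    active_autoimmune_disease: bool


@dataclass(frozen=True)
class Proposal:
    treatment: str
    confidence: float  # produced by the neural component (System 1)


def hard_rule_violations(patient: Patient, treatment: str) -> list[str]:
    """System 2: symbolic checks that never look at the confidence score."""
    violations = []
    if treatment == "immunotherapy" and patient.active_autoimmune_disease:
        violations.append("no immunotherapy with active autoimmune disease")
    return violations


def decide(patient: Patient, proposals: list[Proposal], review_below: float = 0.7):
    """Return (treatment or None, needs_human_review, vetoed rule messages)."""
    vetoed, safe = [], []
    for p in proposals:
        broken = hard_rule_violations(patient, p.treatment)
        if broken:
            vetoed.extend(broken)  # dropped regardless of confidence
        else:
            safe.append(p)
    if not safe:
        return None, True, vetoed
    best = max(safe, key=lambda p: p.confidence)
    # Hypothetical triggering condition: low confidence escalates to a human.
    return best.treatment, best.confidence < review_below, vetoed


patient = Patient(active_autoimmune_disease=True)
proposals = [Proposal("immunotherapy", 0.99), Proposal("chemotherapy", 0.60)]
choice, needs_review, vetoed = decide(patient, proposals)
print(choice, needs_review, vetoed)
```

Here the 0.99-confidence immunotherapy proposal is vetoed, chemotherapy is returned, and the case is flagged for review because 0.60 falls below the assumed threshold. Because the rule check never reads `confidence`, no neural output can change its verdict.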
The four projects here derive directly from Ishaan Varior's five research directions, developed jointly with Ashish Makani. Direction D3 (dual-process clinical reasoning) maps to I5 and A10: a prototype and a formally verified full system, respectively. Direction D4 (differentiable symbolic KGs) maps to I6 and A5: a Scallop-based implementation and a full categorical framework, respectively. Direction D5 (AI math discovery) connects to A10 through formal verification methods and to the Formal Math domain page. Together they form a coherent research program: neural networks for flexible pattern recognition, symbolic systems for provable guarantees, and a mathematically principled interface between them.
The deepest problem remains open: there is no universally satisfactory way to backpropagate through a symbolic system. REINFORCE-style gradient estimators have high variance. Fuzzy/soft logic relaxations lose exactness. Straight-through estimators lack a principled theoretical justification. Scallop's provenance semiring approach is the most promising current answer, and all four projects give students hands-on experience with exactly this frontier.
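The variance problem with REINFORCE-style estimators can be seen in the smallest possible case. The sketch below is a toy illustration: it assumes a single Bernoulli fact with probability `p` and a hypothetical downstream function `rule_fires`, and it enumerates both outcomes so the mean and variance are exact, not sampled.

```python
def score_function_stats(p: float, f) -> tuple[float, float]:
    """Exact mean and variance of the REINFORCE estimate of d/dp E[f(b)],
    where b ~ Bernoulli(p). Each sample contributes f(b) * d/dp log P(b)."""
    outcomes = [
        (1, p, 1.0 / p),                  # b = 1: d/dp log p
        (0, 1.0 - p, -1.0 / (1.0 - p)),   # b = 0: d/dp log(1 - p)
    ]
    mean = sum(prob * f(b) * dlog for b, prob, dlog in outcomes)
    second = sum(prob * (f(b) * dlog) ** 2 for b, prob, dlog in outcomes)
    return mean, second - mean ** 2


def rule_fires(b: int) -> float:
    # Discrete symbolic outcome: the rule succeeds only if the fact is true.
    return float(b)


for p in (0.5, 0.1, 0.01):
    mean, var = score_function_stats(p, rule_fires)
    print(f"p={p}: mean gradient={mean:.3f}, variance={var:.3f}")
```

In this case E[f(b)] = p, so the true gradient is 1. The estimator is unbiased, but its variance is 1/p - 1, which grows without bound as the fact becomes rare. Differentiating the expectation analytically, which is what a sum-product style evaluation amounts to in this tiny case, gives exactly 1 with no sampling noise.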
Connection to Ishaan Varior's Research Directions
Ashoka Coursework Connections
| Course | Concept Used | Projects |
|---|---|---|
| CS 5310 / MAT 3216 Symbolic Logic | First-order logic, proof theory, model theory | I5, A10 |
| CS 3410 / MAT 3211 Intro to ML | Neural networks, gradient-based optimization, confidence calibration | I5, I6 |
| CS 2201 Data Structures & Algorithms | Graph construction, traversal, knowledge graph manipulation | I6 |
| MAT 3102 Algebra II / Category Theory | Functors, natural transformations, compositionality proofs | A5 |
| CS 4101 Theory of Computation | Formal verification, decidability, proof systems | A10 |
The 4 Projects
| ID | Project | Tier | Prereqs | Description |
|---|---|---|---|---|
| I5 | Neuro-Symbolic Clinical Reasoning Prototype | Intermediate | Intro ML, Discrete Math / Logic | Build a dual-process clinical reasoning module: LLM as System 1 (fast pattern-based diagnosis), rule-based NCCN guideline checker as System 2, with formally specified triggering conditions. Evaluated on 50 oncology vignettes across 3 cancer types, comparing neural-only, symbolic-only, and hybrid performance. |
| I6 | Drug Discovery KG with Differentiable Reasoning | Intermediate | Intro ML, DSA, Algebra I | Construct a CYP450 drug-interaction knowledge graph from DrugBank (50 oncology drugs), encode pharmacological rules in Scallop, and train a GNN link predictor end-to-end through the differentiable symbolic layer. Demonstrates the core neuro-symbolic insight: symbolic constraints improve neural predictions while keeping hard rules non-negotiable. |
| A5 | Categorical Deep Learning for Drug Discovery KGs | Advanced | Algebra II / Category Theory, Intro ML | Model the drug discovery knowledge graph as a category (biomedical entities as objects, relationships as morphisms), drug-drug interactions as functors, and pharmacological constraints as natural transformations. Prove compositionality: under what categorical conditions is transitivity of drug safety guaranteed? Integrate the Scallop differentiable layer and replace Onco-TTT's heuristic graph diffusion. Lean 4 formalization is a stretch goal. Thesis-level; targets NeurIPS, ICML, or ACT 2026. |
| A10 | Formally Verified Dual-Process Clinical Reasoning | Advanced | Theory of Computation, Symbolic Logic, Intro ML | Extend I5 to a production-grade system: Logical Neural Networks (learnable rule weights) layered over Lean 4-verified hard safety rules (e.g., "never give immunotherapy to a patient with active autoimmune disease"). Formally prove that hard rules cannot be overridden by any neural input. Integrate into Onco-Shikshak and evaluate with oncology residents on 100 vignettes. Targets JAMIA, AAAI, or Nature Medicine. |
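As a flavor of what the A10 proof obligation might look like, here is a toy Lean 4 sketch. Everything in it is an assumption made for illustration: the `Patient` and `Treatment` types, the `allowed` predicate, and the `vetted` pipeline are hypothetical and far simpler than a real clinical model. The neural component is modeled as an arbitrary function, so the theorem holds for any neural output.

```lean
-- Hypothetical, deliberately tiny model; not the A10 specification.
structure Patient where
  activeAutoimmune : Bool

inductive Treatment where
  | immunotherapy
  | chemotherapy

-- Hard safety rule: immunotherapy is disallowed with active autoimmune disease.
def allowed (p : Patient) : Treatment → Bool
  | .immunotherapy => !p.activeAutoimmune
  | .chemotherapy => true

-- The neural component is an arbitrary function; its proposals are filtered.
def vetted (neural : Patient → List Treatment) (p : Patient) : List Treatment :=
  (neural p).filter (allowed p)

-- For every possible neural component, the hard rule holds on the output.
theorem immunotherapy_never_vetted
    (neural : Patient → List Treatment) (p : Patient)
    (h : p.activeAutoimmune = true) :
    Treatment.immunotherapy ∉ vetted neural p := by
  intro hmem
  simp [vetted, allowed, h] at hmem
```

The key design choice is that `neural` is universally quantified: the proof never inspects how proposals are produced, which is what "cannot be overridden by any neural input" means formally in this toy setting. The proof relies on the core library's `List.mem_filter` simp lemma.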
Key References
- Li, Z., et al. (2023). Scallop: A Language for Neurosymbolic Programming. PLDI. github.com/scallop-lang/scallop
- Gavranovic, B., et al. (2024). Categorical Deep Learning: An Algebraic Theory of Architectures. arXiv:2402.15332. (Foundation for Symbolica, $30M funding.)
- Logical Neural Networks (2025). First-order logic rules with learnable weights for clinical diagnosis. AMIA Annual Symposium. PMC12150699.
- Rodman, A. (2025). Towards a Medical Superintelligence. Massachusetts Medical Society. (LLMs score 10/10 vs 9 for attendings on r-IDEA; automation bias paradox.)
- Fong, B. & Spivak, D. I. (2019). An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press. arXiv:1803.05316 (free PDF)
- Wishart, D. S., et al. (2018). DrugBank 5.0. Nucleic Acids Research, 46(D1), D1074–D1082. Primary data source for I6 and A5.
- Croskerry, P. (2009). A Universal Model of Diagnostic Reasoning. Academic Medicine, 84(8), 1022–1028. Foundational dual-process theory paper.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. Canonical System 1 / System 2 framework underlying I5 and A10.
Interested in a neuro-symbolic project?
Email me at my Ashoka address with subject "IML: [I5 / I6 / A5 / A10]".
I5 and I6 are good entry points; A5 and A10 suit thesis students.
Last updated March 2026. Full project specs available on request.