Domain · 4 Projects · Spring 2026

Neuro-Symbolic AI
Open Problems

Differentiable symbolic reasoning. Categorical deep learning. Verified clinical AI.
4 projects spanning Intermediate to Advanced.

2026 is shaping up as the year neuro-symbolic AI moves from academic curiosity to serious infrastructure. The central challenge, how to backpropagate through a discrete symbolic system, has resisted clean solutions for decades. Three approaches are now converging. Scallop is a differentiable Datalog that evaluates logic programs over provenance semirings, so gradients can flow through symbolic rules. Logical Neural Networks are IBM's framework in which first-order logic rules carry learnable weights; work presented at AMIA 2025 showed them outperforming traditional ML for clinical diagnosis. Categorical Deep Learning (Gavranovic et al., 2024) treats category theory as a unifying mathematical language for both neural and symbolic computation, and is the theoretical foundation for Symbolica, which raised $30M.
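To make "first-order logic rules carry learnable weights" concrete, here is a minimal sketch of the weighted Łukasiewicz conjunction that LNNs use as their AND neuron: the truth value is clamp(beta - sum_i w_i * (1 - x_i)) into [0, 1]. This is not IBM's library API. It is simplified to a single truth value (real LNNs propagate lower and upper bounds), and the rule, weights, and the `WeightedAnd` class are hypothetical. The gradient with respect to each weight is written out by hand to show that the rule's output is differentiable in its parameters wherever the clamp is inactive.

```python
from dataclasses import dataclass


def clamp01(v: float) -> float:
    """Clamp a real value into the truth interval [0, 1]."""
    return max(0.0, min(1.0, v))


@dataclass
class WeightedAnd:
    """Weighted Lukasiewicz AND: clamp01(beta - sum_i w_i * (1 - x_i)).

    Simplified to a single truth value; real LNNs track lower/upper bounds.
    """
    weights: list[float]
    beta: float = 1.0

    def pre_activation(self, truths: list[float]) -> float:
        return self.beta - sum(w * (1.0 - x) for w, x in zip(self.weights, truths))

    def __call__(self, truths: list[float]) -> float:
        return clamp01(self.pre_activation(truths))

    def grad_weights(self, truths: list[float]) -> list[float]:
        """d(output)/d(w_i): -(1 - x_i) inside the unclamped region, 0 where clamped."""
        s = self.pre_activation(truths)
        if 0.0 < s < 1.0:
            return [-(1.0 - x) for x in truths]
        return [0.0] * len(truths)


# Hypothetical rule: flag <- finding_a AND finding_b, with finding_b down-weighted.
rule = WeightedAnd(weights=[1.0, 0.5])
print(rule([0.75, 0.5]))               # 0.5
print(rule.grad_weights([0.75, 0.5]))  # [-0.25, -0.5]
```

In this simplified form, a weight learned close to zero effectively drops that antecedent from the rule. That is one reason weighted rules remain readable as logic after training.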

Why does this matter for science? Two application domains make the case sharply. In clinical reasoning, Adam Rodman's research at Harvard/BIDMC shows that LLMs already score 10/10 on r-IDEA, a structured clinical reasoning rubric (versus 9 for attending physicians). Yet clinicians working with AI sometimes underperform the AI alone, because automation bias leads them to defer to it without engaging their own judgment. A neuro-symbolic architecture with formal safety guarantees, in which certain rules cannot be overridden by the neural component, addresses this directly. In drug discovery, knowledge graphs such as the one underlying Onco-TTT generate hypotheses via heuristic graph diffusion. Replacing that heuristic with differentiable symbolic reasoning (Scallop over pharmacological rules, categorical composition over drug-interaction functors) makes the inference chain verifiable and the learning end-to-end.
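Below is a minimal sketch of the dual-process pattern, built on several assumptions. The neural "System 1" is stubbed as a ranked list of (treatment, confidence) pairs. "System 2" is a list of hard-rule predicates, and the single rule used here is the immunotherapy/autoimmune example from the A10 project description. The escalation logic, the 0.8 threshold, and all names (`Patient`, `dual_process`, `Decision`) are hypothetical. The property it illustrates is structural: forbidden options are removed before selection, so no neural confidence can select them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Patient:
    active_autoimmune: bool


# A hard rule returns True when the treatment is FORBIDDEN for this patient.
HardRule = Callable[[Patient, str], bool]


def no_immunotherapy_with_autoimmune(p: Patient, treatment: str) -> bool:
    return p.active_autoimmune and treatment == "immunotherapy"


HARD_RULES: List[HardRule] = [no_immunotherapy_with_autoimmune]


@dataclass
class Decision:
    treatment: Optional[str]
    escalate: bool
    vetoed: List[str] = field(default_factory=list)


def dual_process(
    patient: Patient,
    neural_ranking: List[Tuple[str, float]],
    rules: List[HardRule] = HARD_RULES,
    confidence_threshold: float = 0.8,
) -> Decision:
    """System 1 proposes (treatment, confidence) pairs; System 2 filters them.

    Forbidden options are removed BEFORE selection, so no confidence value,
    however high, can make this function return a vetoed treatment.
    """
    vetoed = [t for t, _ in neural_ranking if any(r(patient, t) for r in rules)]
    allowed = [(t, c) for t, c in neural_ranking if t not in vetoed]
    if not allowed:
        return Decision(treatment=None, escalate=True, vetoed=vetoed)
    best, confidence = max(allowed, key=lambda tc: tc[1])
    # Assumed triggering condition: hand off to the clinician when the best
    # allowed option is low-confidence or any neural proposal was vetoed.
    escalate = confidence < confidence_threshold or bool(vetoed)
    return Decision(treatment=best, escalate=escalate, vetoed=vetoed)


patient = Patient(active_autoimmune=True)
print(dual_process(patient, [("immunotherapy", 0.95), ("chemotherapy", 0.6)]))
# Decision(treatment='chemotherapy', escalate=True, vetoed=['immunotherapy'])
```

Filtering before selection is the key design choice. A penalty added to the neural score could still be outweighed by a large enough confidence, but a filter cannot.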

The four projects here derive directly from three of the five research directions that Ishaan Varior developed jointly with Ashish Makani. Direction D3 (dual-process clinical reasoning) maps to I5 and A10: a prototype and a formally verified full system, respectively. Direction D4 (differentiable symbolic KGs) maps to I6 and A5: a Scallop-based implementation and a full categorical framework, respectively. Direction D5 (AI math discovery) connects to A10 through formal verification methods and to the Formal Math domain page. Together they form a coherent research program: neural networks for flexible pattern recognition, symbolic systems for provable guarantees, and a mathematically principled interface between them.

The deepest open problem remains unsolved: there is no universally satisfying way to backpropagate through a symbolic system. REINFORCE-style gradient estimators are unbiased but have high variance. Fuzzy/soft logic relaxations lose exactness. Straight-through estimators are biased and theoretically unprincipled. Scallop's provenance-semiring approach is the most promising current answer, and all four projects give students hands-on experience with this frontier.
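The core idea behind provenance-semiring reasoning can be shown without Scallop itself. The toy below is not Scallop's implementation or API. It evaluates one hypothetical CYP450-style rule, `interacts(A, B) :- inhibits(A, E), metabolized_by(B, E)`, under the max-times (Viterbi) semiring. In that semiring, conjunction within a proof is multiplication, and alternative proofs of the same fact are combined by max. Input facts carry probabilities that stand in for neural outputs. A minimal forward-mode dual-number type tracks the derivative of each derived fact with respect to one seeded input, which shows a gradient flowing back through the symbolic rule. The drug names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Dual:
    """A value and its derivative w.r.t. ONE seeded input (forward-mode autodiff)."""
    val: float
    der: float = 0.0

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)


def semiring_plus(a: Dual, b: Dual) -> Dual:
    # Max-times semiring "plus" is max; the derivative follows the winning proof.
    return a if a.val >= b.val else b


SEMIRING_ZERO = Dual(0.0)  # identity for max over probabilities in [0, 1]

Fact = Tuple[str, str]


def derive_interactions(
    inhibits: Dict[Fact, Dual], metabolized_by: Dict[Fact, Dual]
) -> Dict[Fact, Dual]:
    """interacts(A, B) :- inhibits(A, E), metabolized_by(B, E).

    Conjunction within one proof is semiring "times" (*); alternative proofs of
    the same fact (different enzymes E) are combined with semiring "plus" (max).
    """
    derived: Dict[Fact, Dual] = {}
    for (a, e1), p_inh in inhibits.items():
        for (b, e2), p_met in metabolized_by.items():
            if e1 == e2 and a != b:
                proof = p_inh * p_met
                derived[(a, b)] = semiring_plus(derived.get((a, b), SEMIRING_ZERO), proof)
    return derived


# Input facts with probabilities (e.g., from a neural extractor). We seed
# der=1.0 on inhibits(drug_a, CYP3A4) to get d(output)/d(that fact).
inhibits = {("drug_a", "CYP3A4"): Dual(0.5, 1.0), ("drug_a", "CYP2D6"): Dual(0.25)}
metabolized_by = {("drug_b", "CYP3A4"): Dual(0.5), ("drug_b", "CYP2D6"): Dual(0.75)}
print(derive_interactions(inhibits, metabolized_by))
# {('drug_a', 'drug_b'): Dual(val=0.25, der=0.5)}
```

As the product rule predicts, the derivative 0.5 equals the probability of `metabolized_by(drug_b, CYP3A4)`. The losing CYP2D6 proof receives no gradient at all. That sparsity is one reason Scallop also provides differentiable provenances that keep several proofs rather than only the best one (for example, its top-k-proofs variants).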

Connection to Ishaan Varior's Research Directions

D3 · Dual-Process Clinical Reasoning · Projects: I5, A10
D4 · Differentiable Symbolic KGs · Projects: I6, A5
D5 · AI Math Discovery · Methods feed into A10 (Lean4 verification)

Ashoka Coursework Connections

Course | Concepts used | Projects
CS 5310 / MAT 3216 Symbolic Logic | First-order logic, proof theory, model theory | I5, A10
CS 3410 / MAT 3211 Intro to ML | Neural networks, gradient-based optimization, confidence calibration | I5, I6
CS 2201 Data Structures & Algorithms | Graph construction, traversal, knowledge graph manipulation | I6
MAT 3102 Algebra II / Category Theory | Functors, natural transformations, compositionality proofs | A5
CS 4101 Theory of Computation | Formal verification, decidability, proof systems | A10

The 4 Projects

I5 · Neuro-Symbolic Clinical Reasoning Prototype · Intermediate
Prereqs: Intro ML, Discrete Math / Logic
Build a dual-process clinical reasoning module: an LLM as System 1 (fast, pattern-based diagnosis) and a rule-based NCCN guideline checker as System 2, with formally specified triggering conditions. Evaluate on 50 oncology vignettes across 3 cancer types, comparing neural-only, symbolic-only, and hybrid performance.

I6 · Drug Discovery KG with Differentiable Reasoning · Intermediate
Prereqs: Intro ML, DSA, Algebra I
Construct a CYP450 drug-interaction knowledge graph from DrugBank (50 oncology drugs), encode pharmacological rules in Scallop, and train a GNN link predictor end-to-end through the differentiable symbolic layer. Demonstrates the core neuro-symbolic insight: symbolic constraints improve neural predictions while hard rules remain non-negotiable.

A5 · Categorical Deep Learning for Drug Discovery KGs · Advanced
Prereqs: Algebra II / Category Theory, Intro ML
Model the drug discovery knowledge graph as a category (biomedical entities as objects, relationships as morphisms), with drug-drug interactions as functors and pharmacological constraints as natural transformations. Establish compositionality results: under what categorical conditions is transitivity of drug safety guaranteed? Integrate the Scallop differentiable layer and replace Onco-TTT's heuristic graph diffusion. Lean4 formalization is a stretch goal. Thesis-level; targets NeurIPS, ICML, or ACT 2026.

A10 · Formally Verified Dual-Process Clinical Reasoning · Advanced
Prereqs: Theory of Computation, Symbolic Logic, Intro ML
Extend I5 to a production-grade system: Logical Neural Networks (learnable rule weights) layered over Lean4-verified hard safety rules (e.g., "never give immunotherapy to a patient with active autoimmune disease"). Formally prove that no neural input can override the hard rules. Integrate into Onco-Shikshak and evaluate with oncology residents on 100 vignettes. Targets JAMIA, AAAI, or Nature Medicine.
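The A10 guarantee can be stated very compactly. The Lean4 sketch below illustrates it under strong simplifying assumptions and is not the project's specification. The patient record has a single Boolean field, the only hard rule is the immunotherapy example, and the neural component is abstracted as an arbitrary `Treatment` proposal. All names (`Treatment`, `Patient`, `forbidden`, `guard`, `guard_safe`) are hypothetical. Because the theorem is universally quantified over `neural`, it states that no neural output can yield a decision that violates the rule.

```lean
inductive Treatment where
  | immunotherapy
  | chemotherapy
  | observation
  deriving DecidableEq, Repr

structure Patient where
  activeAutoimmune : Bool

/-- Hard rule: immunotherapy is forbidden with active autoimmune disease. -/
def forbidden (p : Patient) (t : Treatment) : Bool :=
  p.activeAutoimmune && decide (t = Treatment.immunotherapy)

/-- Symbolic guard: a forbidden neural proposal is replaced by a safe fallback. -/
def guard (p : Patient) (neural : Treatment) : Treatment :=
  if forbidden p neural then Treatment.observation else neural

/-- For every possible neural output, the guarded decision never violates the rule. -/
theorem guard_safe (p : Patient) (neural : Treatment) :
    forbidden p (guard p neural) = false := by
  cases p with
  | mk autoimmune =>
    cases autoimmune <;> cases neural <;> decide
```

Here the proof is a finite case split closed by evaluation. A realistic rule set over richer patient data would need structured proofs rather than exhaustive enumeration.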

Key References

  1. Li, Z., et al. (2023). Scallop: A Language for Neurosymbolic Programming. PLDI. github.com/scallop-lang/scallop
  2. Gavranovic, B., et al. (2024). Categorical Deep Learning: An Algebraic Theory of Architectures. arXiv:2402.15332. (Foundation for Symbolica, $30M funding.)
  3. Logical Neural Networks (2025). First-order logic rules with learnable weights for clinical diagnosis. AMIA Annual Symposium. PMC12150699.
  4. Rodman, A. (2025). Towards a Medical Superintelligence. Massachusetts Medical Society. (LLMs score 10/10 vs 9 for attendings on r-IDEA; automation bias paradox.)
  5. Fong, B. & Spivak, D. I. (2019). An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press. arXiv:1803.05316 (free PDF)
  6. Wishart, D. S., et al. (2018). DrugBank 5.0. Nucleic Acids Research, 46(D1), D1074–D1082. Primary data source for I6 and A5.
  7. Croskerry, P. (2009). A Universal Model of Diagnostic Reasoning. Academic Medicine, 84(8), 1022–1028. Foundational dual-process theory paper.
  8. Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. Canonical System 1 / System 2 framework underlying I5 and A10.

Interested in a neuro-symbolic project?

Email me at my Ashoka address with subject "IML: [I5 / I6 / A5 / A10]".
I5 and I6 are good entry points; A5 and A10 suit thesis students.


Last updated March 2026. Full project specs available on request.