Open Problems · Spring 2026

40 Open Research Problems
at the Frontier of ML + Science

Protein design. Drug discovery. Clinical AI. Formal mathematics.
All unsolved. All within reach.

Interested? I can supervise ML/computational aspects. For domain expertise, I'll connect you with the right co-mentor — Prof. Sandeep Juneja (CDLDS) for probability & optimization, biology faculty for wet-lab work, or other math/CS faculty. Email me at my Ashoka address with subject "IML: [Project ID]". All levels welcome — undergrad, master's, PhD.

Seven Research Domains

B · Beginner: Y2 · 1 sem · 12 projects
I · Intermediate: Y3 · 1-2 sem · 15 projects
A · Advanced: Y4/Grad · 2 sem · 13 projects

Every project assumes basic Python and use of an AI coding assistant. Projects tagged Juneja align with the Intro to ML course.

Beginner Projects

Y2 students · 1 semester · many Juneja-compatible
ID | Project | Domain | Key Prereqs | Tags
B1 | Drug Repurposing for NAFLD/NASH | Drug Discovery | P&S | Juneja
B2 | Cancer Drug Interaction Checker with Knowledge Graphs | Drug Discovery | Discrete Math | -
B3 | Antibody Sequence Similarity Search with Edit Distance | Protein Design | DSA, P&S | Juneja
B4 | Visualizing Protein Structure Prediction Confidence | Protein Design | Linear Algebra | -
B5 | Monte Carlo Estimation of Protein Design Hit Rates | Protein Design | P&S | Juneja
B6 | Classifying FDA Drug Labels with NLP | Clinical AI | Intro CS | -
B7 | Medical Image Segmentation with Pre-trained Models | Medical Imaging | Linear Algebra | -
B8 | Sequential Experimental Design for Antibody Campaigns | Protein Design | P&S | Juneja
B9 | Molecular Property Prediction with TorchDrug | Drug Discovery | P&S | Juneja
B10 | Simple Clinical Decision Support Rule Engine | Clinical AI | Intro CS | -
B11 | Biomarker Identification for CVD Using Random Forests | Clinical AI | P&S | Juneja
B12 | EDA of Antibody Design Campaign Results | Protein Design | P&S | Juneja
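
To give a sense of the scale of a beginner project, here is a minimal sketch of the kind of primitive B3 names in its title: Levenshtein edit distance plus a brute-force nearest-neighbor lookup. The function names (edit_distance, nearest) and the toy sequences are illustrative assumptions, not part of the project specification, and the actual project scope may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b (row-by-row dynamic programming).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def nearest(query: str, library: list[str], k: int = 3) -> list[tuple[int, str]]:
    """Return the k library sequences closest to query, as (distance, sequence)."""
    return sorted((edit_distance(query, s), s) for s in library)[:k]


if __name__ == "__main__":
    # Toy strings for illustration only; not real antibody data.
    toy_library = ["ARDYYGSSYFDV", "ARDGYSSYFDY", "AKDYYGSGYFDY", "ARGGTFDY"]
    for dist, seq in nearest("ARDYYGSSYFDY", toy_library, k=2):
        print(dist, seq)
```

The brute-force scan costs one quadratic-time comparison per library sequence, which is fine for a few thousand sequences; scaling beyond that (indexing, banded alignment, or domain-specific substitution costs) is where the project would presumably begin.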

Intermediate Projects

Y3 students · 1-2 semesters · workshop paper potential
ID | Project | Domain | Key Prereqs | Origin
I1 | Multi-Objective Optimization for Antibody Specificity | Protein Design | Real Analysis, P&S | Abhimanyu D1
I2 | Metric Geometry of CDR Sequence Spaces | Protein Design | Metric Spaces | Abhimanyu D2
I3 | Bayesian Optimization for Campaign Resource Allocation | Protein Design | P&S, Intro ML | Abhimanyu D5
I4 | Persistent Homology of Protein Binding Interfaces | Protein Design / TDA | Algebra I | Abhimanyu D6
I5 | Neuro-Symbolic Clinical Reasoning Prototype | Clinical AI | Intro ML, Logic | Ishaan D3
I6 | Drug Discovery KG with Differentiable Reasoning | Drug Discovery | Intro ML, DSA | Ishaan D4
I7 | Wasserstein Distances Between Antibody Repertoires | Protein Design / OT | Real Analysis, P&S | New
I8 | GNN-Based Inverse Folding Analysis | Protein Design | Intro ML, Linear Algebra | New
I9 | Information-Theoretic Scoring for Protein Design | Protein Design | P&S, Intro ML | New
I10 | Monte Carlo Tree Search for Binder Design | Protein Design | P&S, DSA | Juneja
I11 | ML-Driven Biomarker Discovery for NASH/NAFLD | Drug Discovery | P&S, Intro ML | New
I12 | Convergence Analysis of Simplex Optimization | Optimization | Real Analysis, Linear Algebra | New
I13 | AI-Augmented Tumor Board Extension | Medical Education | Intro ML | Onco-Shikshak
I14 | Graph Diffusion for Oncology Hypothesis Generation | Drug Discovery | Intro ML, DSA | Onco-TTT
I15 | Protein Design Competition Entry Pipeline | Protein Design | Intro ML | rfab-harness

Advanced Projects

Y4 / thesis / grad students · 2 semesters · publication-track
ID | Project | Domain | Key Prereqs | Origin
A1 | GRPO Theory for Protein Diffusion Models | Protein Design | Measure Theory, Stat. Inference | Abhimanyu D3
A2 | Exchangeable Arrays for PPI Prediction | Protein Design | Measure Theory, Random Graphs | Abhimanyu D4
A3 | Formal Verification of Protein Designs via Lean4 | Formal Math | Theory of Computation, Algebra II | Ishaan D1
A4 | Proteins-as-Programs: Kolmogorov Complexity | Formal Math | Theory of Computation | Ishaan D2
A5 | Categorical Deep Learning for Drug Discovery KGs | Neuro-Symbolic | Algebra II, Category Theory | Ishaan D4
A6 | AI-Augmented Math Discovery for Biomedical Conjectures | Formal Math | Algebra II, Lean4 | Ishaan D5
A7 | Diffusion Models on SE(3) for Backbone Generation | Protein Design | Differential Geometry, DiffEq | New
A8 | Geometric DL Expressivity for Protein Graphs | Protein Design | Algebra I, Linear Algebra | New
A9 | Optimal Transport for Generative Model Evaluation | Math Foundations | Measure Theory, Functional Analysis | New
A10 | Formally Verified Dual-Process Clinical Reasoning | Clinical AI | Logic, Theory of Computation | Ishaan D3
A11 | Multi-Parameter Persistence for Protein Design Spaces | Math Foundations | Algebra II, Algebraic Topology | New
A12 | Evolutionary Game Theory of Drug Resistance | Math Foundations | DiffEq, Game Theory | New
A13 | RL with Experimental Feedback for Protein Design | Protein Design | Stat. Inference, Intro ML | New

Suggested Paths by Major

CS Major: B6 or B10 → I5 or I8 → A3 or A10
Math Major: B5 or B3 → I2 or I7 → A1 or A9
Bio + CS: B1 or B9 → I11 or I14 → A12 or A13
Juneja Course: B5 or B8 → I3 or I10 → A1 or A13

Getting Started

  • Tools: Protenix, Boltz-2, RFAntibody, DeepPurpose, TorchDrug, GUDHI, LeanDojo — all open-source
  • Textbooks: Boyd & Vandenberghe (Convex Optimization), Garnett (Bayesian Optimization), Bronstein (Geometric DL) — all free online
  • Competitions: Adaptyv Bio monthly challenges, BioML benchmark, ICLR GEM workshop
  • Communities: Lean4 Zulip, Adaptyv Discord, Ashoka KCDHA reading groups

Don't see your problem? Propose one.

If you have your own research question at the intersection of ML and science,
reach out — we'll find the right mentor combination.

Last updated March 2026. Full project details available on request.