40 Open Research Problems
at the Frontier of ML + Science
Protein design. Drug discovery. Clinical AI. Formal mathematics.
All unsolved. All within reach.
Interested? I can supervise ML/computational aspects. For domain expertise, I'll connect you with the right co-mentor — Prof. Sandeep Juneja (CDLDS) for probability & optimization, biology faculty for wet-lab work, or other math/CS faculty. Email me at my Ashoka address with subject "IML: [Project ID]". All levels welcome — undergrad, master's, PhD.
Seven Research Domains
- Antibody Design
- Drug Discovery
- Med Education
- Formal Math & Lean4
- Neuro-Symbolic AI
- Medical Imaging
- Math Foundations
- Your Idea Here
Y2 · 1 sem · 12 projects
Y3 · 1-2 sem · 15 projects
Y4/Grad · 2 sem · 13 projects
Every project assumes basic Python and use of an AI coding assistant. Projects tagged Juneja align with the Intro to ML course.
Beginner Projects
Y2 students · 1 semester · many Juneja-compatible

| ID | Project | Domain | Key Prereqs | Tags |
|---|---|---|---|---|
| B1 | Drug Repurposing for NAFLD/NASH | Drug Discovery | P&S | Juneja |
| B2 | Cancer Drug Interaction Checker with Knowledge Graphs | Drug Discovery | Discrete Math | |
| B3 | Antibody Sequence Similarity Search with Edit Distance | Protein Design | DSA, P&S | Juneja |
| B4 | Visualizing Protein Structure Prediction Confidence | Protein Design | Linear Algebra | |
| B5 | Monte Carlo Estimation of Protein Design Hit Rates | Protein Design | P&S | Juneja |
| B6 | Classifying FDA Drug Labels with NLP | Clinical AI | Intro CS | |
| B7 | Medical Image Segmentation with Pre-trained Models | Medical Imaging | Linear Algebra | |
| B8 | Sequential Experimental Design for Antibody Campaigns | Protein Design | P&S | Juneja |
| B9 | Molecular Property Prediction with TorchDrug | Drug Discovery | P&S | Juneja |
| B10 | Simple Clinical Decision Support Rule Engine | Clinical AI | Intro CS | |
| B11 | Biomarker Identification for CVD Using Random Forests | Clinical AI | P&S | Juneja |
| B12 | EDA of Antibody Design Campaign Results | Protein Design | P&S | Juneja |
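To give a sense of the scale of a beginner project, here is a minimal sketch of the idea behind B3: ranking a small library of sequences by Levenshtein edit distance to a query. The function names (`edit_distance`, `nearest_sequences`) and the example strings are hypothetical, made up for illustration; an actual project would likely work with real antibody datasets and faster indexing.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_sequences(query, library, k=3):
    """Return the k closest library entries as (distance, sequence) pairs."""
    scored = sorted((edit_distance(query, s), s) for s in library)
    return scored[:k]


# Hypothetical CDR-like strings, invented for illustration only.
library = ["ARDYYGSSYFDY", "ARDYYGSSWFDY", "AKDGYSSGWYFDV", "ARGGYYFDY"]
print(nearest_sequences("ARDYYGSTYFDY", library, k=2))
```

A brute-force scan like this costs time proportional to the library size times the product of sequence lengths, which is fine for a few thousand sequences and is a natural baseline to compare faster methods against.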
Intermediate Projects
Y3 students · 1-2 semesters · workshop paper potential

| ID | Project | Domain | Key Prereqs | Origin |
|---|---|---|---|---|
| I1 | Multi-Objective Optimization for Antibody Specificity | Protein Design | Real Analysis, P&S | Abhimanyu D1 |
| I2 | Metric Geometry of CDR Sequence Spaces | Protein Design | Metric Spaces | Abhimanyu D2 |
| I3 | Bayesian Optimization for Campaign Resource Allocation | Protein Design | P&S, Intro ML | Abhimanyu D5 |
| I4 | Persistent Homology of Protein Binding Interfaces | Protein Design / TDA | Algebra I | Abhimanyu D6 |
| I5 | Neuro-Symbolic Clinical Reasoning Prototype | Clinical AI | Intro ML, Logic | Ishaan D3 |
| I6 | Drug Discovery KG with Differentiable Reasoning | Drug Discovery | Intro ML, DSA | Ishaan D4 |
| I7 | Wasserstein Distances Between Antibody Repertoires | Protein Design / OT | Real Analysis, P&S | New |
| I8 | GNN-Based Inverse Folding Analysis | Protein Design | Intro ML, Linear Algebra | New |
| I9 | Information-Theoretic Scoring for Protein Design | Protein Design | P&S, Intro ML | New |
| I10 | Monte Carlo Tree Search for Binder Design | Protein Design | P&S, DSA | Juneja |
| I11 | ML-Driven Biomarker Discovery for NASH/NAFLD | Drug Discovery | P&S, Intro ML | New |
| I12 | Convergence Analysis of Simplex Optimization | Optimization | Real Analysis, Linear Algebra | New |
| I13 | AI-Augmented Tumor Board Extension | Medical Education | Intro ML | Onco-Shikshak |
| I14 | Graph Diffusion for Oncology Hypothesis Generation | Drug Discovery | Intro ML, DSA | Onco-TTT |
| I15 | Protein Design Competition Entry Pipeline | Protein Design | Intro ML | rfab-harness |
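As a taste of the mathematics behind I7, the sketch below computes the 1-Wasserstein distance between two equal-size samples on the real line, where the optimal transport plan simply matches sorted values. Summarizing a repertoire by a single scalar feature (here, made-up CDR lengths) is an assumption for illustration, as is the function name `wasserstein_1d`; the project itself may use richer sequence spaces and general-purpose solvers.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical samples on the real line.

    In one dimension the optimal coupling pairs the i-th smallest value of
    one sample with the i-th smallest value of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Hypothetical CDR lengths for two repertoires, invented for illustration.
rep_a = [10, 12, 12, 14, 15]
rep_b = [11, 13, 13, 16, 17]
print(wasserstein_1d(rep_a, rep_b))
```

The sorting shortcut only works in one dimension; for distances between sequences or embeddings, the problem becomes a general optimal transport problem that needs a linear-programming or entropic solver.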
Advanced Projects
Y4 / thesis / grad students · 2 semesters · publication-track

| ID | Project | Domain | Key Prereqs | Origin |
|---|---|---|---|---|
| A1 | GRPO Theory for Protein Diffusion Models | Protein Design | Measure Theory, Stat. Inference | Abhimanyu D3 |
| A2 | Exchangeable Arrays for PPI Prediction | Protein Design | Measure Theory, Random Graphs | Abhimanyu D4 |
| A3 | Formal Verification of Protein Designs via Lean4 | Formal Math | Theory of Computation, Algebra II | Ishaan D1 |
| A4 | Proteins-as-Programs: Kolmogorov Complexity | Formal Math | Theory of Computation | Ishaan D2 |
| A5 | Categorical Deep Learning for Drug Discovery KGs | Neuro-Symbolic | Algebra II, Category Theory | Ishaan D4 |
| A6 | AI-Augmented Math Discovery for Biomedical Conjectures | Formal Math | Algebra II, Lean4 | Ishaan D5 |
| A7 | Diffusion Models on SE(3) for Backbone Generation | Protein Design | Differential Geometry, DiffEq | New |
| A8 | Geometric DL Expressivity for Protein Graphs | Protein Design | Algebra I, Linear Algebra | New |
| A9 | Optimal Transport for Generative Model Evaluation | Math Foundations | Measure Theory, Functional Analysis | New |
| A10 | Formally Verified Dual-Process Clinical Reasoning | Clinical AI | Logic, Theory of Computation | Ishaan D3 |
| A11 | Multi-Parameter Persistence for Protein Design Spaces | Math Foundations | Algebra II, Algebraic Topology | New |
| A12 | Evolutionary Game Theory of Drug Resistance | Math Foundations | DiffEq, Game Theory | New |
| A13 | RL with Experimental Feedback for Protein Design | Protein Design | Stat. Inference, Intro ML | New |
Suggested Paths by Major
Getting Started
- Tools: Protenix, Boltz-2, RFAntibody, DeepPurpose, TorchDrug, GUDHI, LeanDojo — all open-source
- Textbooks: Boyd & Vandenberghe (Convex Optimization), Garnett (Bayesian Optimization), Bronstein et al. (Geometric Deep Learning), all free online
- Competitions: Adaptyv Bio monthly challenges, BioML benchmark, ICLR GEM workshop
- Communities: Lean4 Zulip, Adaptyv Discord, Ashoka KCDHA reading groups
Don't see your problem? Propose one.
If you have your own research question at the intersection of ML and science,
reach out — we'll find the right mentor combination.
Last updated March 2026. Full project details available on request.