I build learning systems that don't assume perfect information. My core work is on cooperative multi-agent RL under partial observability: implicit coordination via learned belief representations, decentralized policies, and the optimization pathologies that show up when you train them at scale. I care about methods that are principled, reproducible, and useful in real pipelines.
The application I care most about is autonomous driving. Self-driving fleets are Dec-POMDPs in the wild: every car has a partial view, agents can't share weights at execution time, and the cost of miscoordination is real. I want the methods I build to hold up in that setting.
Currently: a study under review at NeurIPS 2026 of why auxiliary losses with non-stationary targets destabilize cooperative MARL training, and of the architectural fixes that restore stable training (290+ runs across MARL and CIFAR-100). In parallel, I am developing AwareGate, a learned communication-gating policy in which agents decide when (not just what) to communicate, targeting ICLR 2027.
Earlier work: spatio-temporal learning for environmental retrievals with NSF AI2ES and NASA GeoCARB (transformer-based retrievals, methane hotspot detection from satellite observations).
Current Affiliation
- University of Oklahoma, PhD Candidate, Computer Science
- Advisor: Dr. Mohammed Atiquzzaman