Computer Science · PhD Researcher

Vishnu Kadiyala

I'm working on multi-agent reinforcement learning for autonomous driving and cooperative systems.

I'm a PhD candidate in Computer Science at the University of Oklahoma. My research is on multi-agent reinforcement learning (MARL) under partial observability: implicit coordination via learned belief representations, decentralized policies, and the learning dynamics that make cooperative MARL stable at scale. I work on these methods because I think they're the right framing for L4 (SAE Level 4) autonomy and V2X (vehicle-to-everything) communication: agents acting on incomplete information, with no shared brain to fall back on.


About

I build learning systems that don't assume perfect information. My core work is on cooperative multi-agent RL under partial observability: implicit coordination via learned belief representations, decentralized policies, and the optimization pathologies that show up when you train them at scale. I care about methods that are principled, reproducible, and useful in real pipelines.

The application I care most about is autonomous driving. Self-driving fleets are Dec-POMDPs (decentralized partially observable Markov decision processes) in the wild: every car has a partial view, agents can't share weights at execution time, and the cost of miscoordination is real. I want the methods I build to hold up in that setting.

Currently: a NeurIPS 2026 submission (under review) on why auxiliary losses with non-stationary targets destabilize cooperative MARL training, and which architectural fixes restore stable training (290+ runs across MARL and CIFAR-100). In parallel, I'm developing AwareGate, a learned communication-gating policy in which agents decide when (not just what) to communicate, targeting ICLR 2027.

Earlier work: spatio-temporal learning for environmental retrievals with NSF AI2ES and NASA GeoCARB (transformer-based retrievals, methane hotspot detection from satellite observations).

Current Affiliation

  • University of Oklahoma, PhD Candidate, Computer Science
  • Advisor: Dr. Mohammed Atiquzzaman

Education

  • Ph.D. in Computer Science, University of Oklahoma (expected May 2027)
  • M.S. in Electrical and Computer Engineering, University of Oklahoma (May 2022)
  • B.E. in Electronics and Communication Engineering, KLE Technological University (May 2019)

Research

Three threads run through my work: coordination without a shared brain, what makes that training stable, and whether it survives contact with the road.

01

Implicit Coordination via Latent Belief Representations

Cooperative multi-agent systems often can't rely on explicit communication: bandwidth is limited, channels are noisy, and at execution time agents typically can't share weights. I work on learning compact latent belief representations that let agents coordinate anyway, using attention-based belief updates over local observation histories and decentralized policies that condition on those beliefs. The goal is coordination that survives the gap between training and deployment.

02

Learning Dynamics in Cooperative MARL

Methods that look principled on paper can still fail to train. My NeurIPS 2026 submission characterizes one such failure mode: auxiliary losses with non-stationary targets inject directional gradient noise into the shared trunk, destabilizing training across both MARL and supervised settings. I'm interested in the broader question of which architectural and optimization choices make cooperative MARL train stably at scale, and which silently break it.

03

Multi-Agent Decision-Making for Autonomous Driving

Self-driving fleets are Dec-POMDPs in the wild: every vehicle has a partial view, agents can't share weights at execution time, and miscoordination has real-world cost. I'm extending the belief-representation and learning-dynamics work to multi-agent driving prediction (the Waymax simulator and the Waymo Open Motion Dataset) and to bandwidth-constrained V2X communication. The question is whether methods that work in benchmark environments like MPE and SMAX hold up behind the wheel.

Publications

Preprints & Working Papers

Peer-Reviewed Conference Papers

Conference Abstracts

Theses

Selected Projects

Transformer-based architecture for environmental data (stations + remote sensing), emphasizing spatial/temporal embeddings and attention-based fusion. Achieved a 13× improvement over the classical Marshall–Palmer baseline.

Transformers · Spatio-Temporal · Environmental AI · AI2ES

U-Net–based deep learning model achieving 95% accuracy for methane hotspot and leak detection. Improved anomaly detection from 80% to 90.2% using diffusion-based generative models.

U-Net · Diffusion Models · Remote Sensing · GeoCARB

Developed a vision-based system that uses outdoor camera imagery to infer atmospheric visibility statewide, extending coverage beyond the sparsely deployed ASOS stations.

Computer Vision · Environmental AI · Sensor Fusion

Live tool

A single-page RSVP (rapid serial visual presentation) reader for research papers. Drop in a PDF, focus on the red anchor, and skip the bibliography. Two-column reflow via pdf.js; optional per-section AI summaries (Claude Haiku, BYOK: bring your own API key). No backend.

RSVP · PDF Parsing · Claude API · Web Tool

Technical Expertise

Machine Learning

Multi-Agent Reinforcement Learning · Deep Learning · Transformers · Attention Mechanisms · CNNs · Diffusion Models

Programming Languages

Python · C · C++

Frameworks

PyTorch · JAX / Flax · TensorFlow

Simulation Environments

MPE · SMAX · Highway-Env · MetaDrive · Waymax

Data & Systems

Pandas · Xarray · NetCDF · High-Performance Computing · SLURM · Distributed Training · Remote Compute Orchestration

Tools

Git / GitHub · Linux · ROS

Experience

Graduate Research Assistant · NSF AI2ES, University of Oklahoma

2023 – 2025
  • Built a transformer-based architecture for irregular spatio-temporal environmental data retrievals, achieving a 13× improvement over the classical Marshall–Palmer baseline.
  • Developed a vision-based atmospheric visibility estimation system for statewide inference beyond sparse ASOS sensor coverage.
  • Processed and integrated multi-modal datasets (satellite, radar, ground station) across HPC infrastructure with reproducible experiment pipelines.

Graduate Research Assistant · NASA GeoCARB, University of Oklahoma

2021 – 2023
  • Built a U-Net deep learning model achieving 95% accuracy for methane hotspot detection from satellite observations.
  • Improved anomaly detection from 80% to 90.2% using diffusion-based generative models.
  • Collaborated with Dr. Sean Crowell on methane leak detection and atmospheric science applications.

Test Automation Intern · Robert Bosch Engineering & Business Solutions

Jan – May 2019
  • Developed hardware-in-the-loop (HIL) test automation pipelines for Engine Control Units (ECUs).
  • Automated ECU software validation using ETAS LABCAR across hardware and digital fault layers.

Teaching

Teaching Assistant · CS 2614 Computer Organization

University of Oklahoma

  • Led weekly labs on digital logic, assembly-level reasoning, and hardware/software interface fundamentals.
  • Developed lab documentation and troubleshooting/debugging workflows for novice builders.

Service & Leadership

Past Collaborations

Contact

If you're interested in collaborating on multi-agent RL, implicit coordination, or spatio-temporal modeling, feel free to reach out.

Say Hello →

Norman, OK