Vishnu Kadiyala

About

I build learning systems that don't assume perfect information. My core work is on cooperative multi-agent RL under partial observability: implicit coordination via learned belief representations, decentralized policies, and the optimization pathologies that show up when you train them at scale. I care about methods that are principled, reproducible, and useful in real pipelines.

The application I care most about is autonomous driving. Self-driving fleets are Dec-POMDPs in the wild: every car has a partial view, agents can't share weights at execution time, and the cost of miscoordination is real. I want the methods I build to hold up in that setting.

Currently: I have a paper under review at NeurIPS 2026, a study of why auxiliary losses with non-stationary targets destabilize cooperative MARL training and of the architectural fixes that recover it (290+ runs across MARL and CIFAR-100). In parallel, I am developing AwareGate, a learned communication-gating policy where agents decide when (not just what) to communicate, targeting ICLR 2027.

Earlier work: spatio-temporal learning for environmental retrievals with NSF AI2ES and NASA GeoCARB (transformer-based retrievals, methane hotspot detection from satellite observations).

Current Affiliation

University of Oklahoma, PhD Candidate, Computer Science
Advisor: Dr. Mohammed Atiquzzaman

Education

University of Oklahoma

Ph.D. in Computer Science

Expected May 2027

University of Oklahoma

M.S. in Electrical and Computer Engineering

May 2022

KLE Technological University

B.E. in Electronics and Communication Engineering

May 2019

Research

Three threads run through my work: coordination without a shared brain, what makes that training stable, and whether it survives contact with the road.

01

Implicit Coordination via Latent Belief Representations

Cooperative multi-agent systems often can't rely on explicit communication: bandwidth is limited, channels are noisy, and at execution time agents typically can't share weights. I work on learning compact latent belief representations that let agents coordinate anyway, using attention-based belief updates over local observation histories and decentralized policies that condition on those beliefs. The goal is coordination that survives the gap between training and deployment.
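To make "attention-based belief updates over local observation histories" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not my actual architecture: the observations double as keys and values (no learned projections), and the function names and the fixed `mix` coefficient are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, history):
    """Scaled dot-product attention of one query over a history of vectors.

    Assumption: observations serve as both keys and values; a real
    encoder would apply learned projections first.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * o for q, o in zip(query, obs)) / scale for obs in history]
    weights = softmax(scores)
    dim = len(history[0])
    context = [sum(w * obs[i] for w, obs in zip(weights, history)) for i in range(dim)]
    return context, weights

def update_belief(belief, history, mix=0.5):
    """Blend the previous belief with the attended summary of the history."""
    context, weights = attend(belief, history)
    new_belief = [(1 - mix) * b + mix * c for b, c in zip(belief, context)]
    return new_belief, weights

# One agent's local observation history (toy 2-D observations).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
belief, weights = update_belief([1.0, 0.0], history)
```

Each agent runs this update on its own history only, so nothing here requires sharing weights or messages at execution time.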

02

Learning Dynamics in Cooperative MARL

Methods that look principled on paper can still fail to train. My NeurIPS 2026 submission characterizes one such failure mode: auxiliary losses with non-stationary targets inject directional gradient noise into the shared trunk, destabilizing training across both MARL and supervised settings. I'm interested in the broader question of which architectural and optimization choices make cooperative MARL train stably at scale, and which silently break it.
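One generic way to look for this kind of interference is to compare the policy gradient and the weighted auxiliary gradient at the shared trunk. The sketch below is a simple diagnostic under stated assumptions, not the analysis from the submission: the gradients are hypothetical flattened vectors, and `interference_report` is an illustrative name.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero norm."""
    denom = norm(a) * norm(b)
    return dot(a, b) / denom if denom > 0.0 else 0.0

def interference_report(policy_grad, aux_grad, aux_weight):
    """Compare the two gradient signals arriving at the shared trunk."""
    weighted_aux = [aux_weight * g for g in aux_grad]
    policy_norm = norm(policy_grad)
    ratio = norm(weighted_aux) / policy_norm if policy_norm > 0.0 else float("inf")
    return {"cosine": cosine(policy_grad, weighted_aux), "norm_ratio": ratio}

# Hypothetical trunk gradients late in training: the policy gradient
# has shrunk, while the auxiliary gradient has not.
policy_grad = [0.01, -0.02]
aux_grad = [0.5, 0.5]
report = interference_report(policy_grad, aux_grad, aux_weight=0.1)
```

A norm ratio above 1 means the auxiliary term dominates the trunk update, and a negative cosine means it pushes against the policy gradient.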

03

Multi-Agent Decision-Making for Autonomous Driving

Self-driving fleets are Dec-POMDPs in the wild: every vehicle has a partial view, agents can't share weights at execution time, and miscoordination has real-world cost. I'm extending the belief-representation and learning-dynamics work to multi-agent driving prediction (Waymax, Waymo Open Motion Dataset) and bandwidth-constrained V2X communication. The question is whether methods that work in MPE and SMAX hold up behind the wheel.

Publications

Preprints & Working Papers

V. P. Kadiyala. “When Auxiliary Losses Fail: Non-Stationary Targets Induce Directional Gradient Noise.” Under review at NeurIPS 2026.
V. P. Kadiyala, M. Atiquzzaman. “AwareGate: Counterfactual-Baseline Selective Communication in Cooperative Multi-Agent Reinforcement Learning.” In preparation.
V. P. Kadiyala, M. Atiquzzaman. “Learning When to Communicate: Bandwidth-Efficient Cooperative Driving via Gated Learned V2X Messages.” In preparation, IEEE Vehicular Technology Conference (VTC) 2026.

Peer-Reviewed Conference Papers

M. X. Sasser, M. Wilson Reyes, V. P. Kadiyala, A. Kurbanovas, K. J. Sulia, et al. “Estimating Statewide Atmospheric Visibility From Camera Images.” Proceedings of the 105th Annual American Meteorological Society (AMS) Meeting, 2025.
E. Spicer, S. Crowell, F. Xu, V. P. Kadiyala, P. M. Klein, et al. “Exploring the Influence of Local Urban and Industrial Carbon-Based Pollutant Sources on Total Column Concentration Enhancements in Houston, Texas during TRACER.” Proceedings of the 104th AMS Annual Meeting, 2024.
V. Kadiyala, V. Hulyalkar. “Wireless Video Transmission over a Frequency of 2.4 GHz.” International Conference on New Trends in Engineering & Technology (ICNTET), 2018.
V. P. Kadiyala, et al. “Design and Implementation of Plant Growth Monitoring System Using Infrared Radiation.” 3rd International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), 2018.

Conference Abstracts

E. Spicer, S. Crowell, F. Xu, N. Krishnakutty, V. P. Kadiyala, et al. “Urban and Industrial Carbon-Based Pollutant Monitoring Using EM27/SUNs in Houston, Texas During the Summer 2022 GeoCarb-TRACER Campaign.” American Meteorological Society Meeting Abstracts, 2023.

Theses

V. P. Kadiyala. “Localization of Tables and Plots in Documents Using Deep Neural Networks.” Master’s Thesis, University of Oklahoma, 2022.

Selected Projects

NeurIPS 2026 (under review)

When Auxiliary Losses Fail: Non-Stationary Targets Induce Directional Gradient Noise

A learning-dynamics account of why auxiliary losses sometimes corrupt late training

Auxiliary prediction losses usually stabilize representation learning, but sometimes quietly degrade late-training performance in a way that looks like ordinary RL noise. This work identifies the mechanism: when an auxiliary target is both structured (coupled to the task state) and non-stationary (drifting as other agents learn), it injects directional gradient noise whose contribution to parameter variance dominates the vanishing policy gradient near convergence.

d = +1.40: co-learning aux vs. none (CI above zero)
290+ runs · 5 seeds · bootstrap CIs
8× aux-capacity sweep, no effect
3 fixes recover the no-aux regime

Multi-Agent RL · Learning Dynamics · Auxiliary Losses · PPO · NeurIPS 2026
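For readers unfamiliar with the statistics quoted above, the sketch below shows one common way to compute an effect size (Cohen's d) with a percentile bootstrap confidence interval over per-seed results. It is a minimal illustration, not the paper's evaluation code: the per-seed numbers are made up, and the function names and defaults are assumptions.

```python
import random
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def bootstrap_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for d, resampling each group with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample_a = [rng.choice(group_a) for _ in group_a]
        sample_b = [rng.choice(group_b) for _ in group_b]
        try:
            estimates.append(cohens_d(sample_a, sample_b))
        except ZeroDivisionError:
            continue  # both resamples were constant, so d is undefined
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi

# Made-up per-seed scores for two conditions (5 seeds each).
group_a = [0.62, 0.58, 0.66, 0.60, 0.64]
group_b = [0.50, 0.55, 0.47, 0.53, 0.49]
d = cohens_d(group_a, group_b)
lo, hi = bootstrap_ci(group_a, group_b)
```

With only five seeds per condition, some resamples are degenerate, which is why constant resamples are skipped rather than counted.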

In progress · Waymax

Belief-Encoder Architectures for Multi-Agent Driving Prediction

Applying VABL's belief-encoder architecture to multi-agent prediction in the Waymo Open Motion Dataset via Waymax. Three-variant comparison (full belief encoder, ablated attention, and a baseline) designed to test whether the gradient-interference pathology characterized in the NeurIPS work generalizes from cooperative MARL benchmarks to real driving scenarios. Open question: does what fails in Overcooked also fail behind the wheel?

Multi-Agent RL · Waymax · Motion Prediction · Belief Modeling · JAX

In development · Targeting ICLR 2027

AwareGate: Learning When to Communicate

A learned communication-gating policy for cooperative multi-agent systems. Agents decide when (not just what) to communicate, using cross-attention over received messages and a recurrent belief state. The thesis: in bandwidth-constrained settings, including V2X, always-on communication isn't just wasteful, it can hurt coordination. Selective gating should outperform both silent and full-broadcast baselines.

Multi-Agent RL · Learned Communication · Attention + Recurrent Belief · V2X
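The "when to communicate" decision can be sketched as a gate over each agent's belief state. This is a toy illustration, not AwareGate itself: the gate weights are hand-set rather than learned, receivers simply collect messages instead of applying cross-attention, and all names (`gate_probability`, `communicate`, the `car_*` agents) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_probability(belief, gate_weights, gate_bias):
    """Probability of sending, as a logistic function of the belief state."""
    logit = sum(w * b for w, b in zip(gate_weights, belief)) + gate_bias
    return sigmoid(logit)

def communicate(beliefs, gate_weights, gate_bias, threshold=0.5):
    """One round: each agent broadcasts its belief only if its gate opens."""
    sent = {}
    for agent, belief in beliefs.items():
        if gate_probability(belief, gate_weights, gate_bias) > threshold:
            sent[agent] = belief
    # Every agent receives all messages except its own.
    inboxes = {
        agent: [msg for sender, msg in sent.items() if sender != agent]
        for agent in beliefs
    }
    return sent, inboxes

# Toy beliefs; the hand-set gate opens when the first component is large.
beliefs = {"car_a": [0.9, 0.1], "car_b": [0.1, 0.2], "car_c": [0.8, 0.7]}
sent, inboxes = communicate(beliefs, gate_weights=[4.0, 0.0], gate_bias=-2.0)
bandwidth_used = len(sent) / len(beliefs)
```

Here two of the three agents transmit, so the round uses two thirds of the bandwidth that full broadcast would; the silent baseline corresponds to a gate that never opens.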

Transformer-Based Irregular Spatio-Temporal Retrievals

Transformer-based architecture for environmental data (stations + remote sensing), emphasizing spatial/temporal embeddings and attention-based fusion. Achieved a 13× improvement over the classical Marshall–Palmer baseline.

Transformers · Spatio-Temporal · Environmental AI · AI2ES

Methane Hotspot Detection from Satellite Observations

U-Net–based deep learning model achieving 95% accuracy for methane hotspot and leak detection. Improved anomaly detection from 80% to 90.2% using diffusion-based generative models.

U-Net · Diffusion Models · Remote Sensing · GeoCARB

Vision-Based Atmospheric Visibility Estimation

Developed a vision-based system using outdoor camera imagery for statewide atmospheric visibility inference beyond sparsely deployed ASOS stations.

Computer Vision · Environmental AI · Sensor Fusion

Live tool

fastreading: RSVP Reader for Research Papers

A single-page RSVP (rapid serial visual presentation) reader for research papers. Drop a PDF, focus on the red anchor, skip the bibliography. Two-column reflow via pdf.js; optional per-section AI summaries (Claude Haiku, BYOK). No backend.

RSVP · PDF Parsing · Claude API · Web Tool

Technical Expertise

Machine Learning

Multi-Agent Reinforcement Learning · Deep Learning · Transformers · Attention Mechanisms · CNNs · Diffusion Models

Programming Languages

Python · C · C++

Frameworks

PyTorch · JAX / Flax · TensorFlow

Simulation Environments

MPE · SMAX · Highway-Env · MetaDrive · Waymax

Data & Systems

Pandas · Xarray · NetCDF · High-Performance Computing · SLURM · Distributed Training · Remote Compute Orchestration

Tools

Git / GitHub · Linux · ROS

Experience

Graduate Research Assistant · NSF AI2ES, University of Oklahoma

2023 – 2025

Built a transformer-based architecture for irregular spatio-temporal environmental data retrievals, achieving a 13× improvement over the classical Marshall–Palmer baseline.
Developed a vision-based atmospheric visibility estimation system for statewide inference beyond sparse ASOS sensor coverage.
Processed and integrated multi-modal datasets (satellite, radar, ground station) across HPC infrastructure with reproducible experiment pipelines.

Graduate Research Assistant · NASA GeoCARB, University of Oklahoma

2021 – 2023

Built a U-Net deep learning model achieving 95% accuracy for methane hotspot detection from satellite observations.
Improved anomaly detection from 80% to 90.2% using diffusion-based generative models.
Collaborated with Dr. Sean Crowell on methane leak detection and atmospheric science applications.

Test Automation Intern · Robert Bosch Engineering & Business Solutions

Jan – May 2019

Developed hardware-in-the-loop (HIL) test automation pipelines for Engine Control Units (ECUs).
Automated ECU software validation using ETAS LABCAR across hardware and digital fault layers.

Teaching

Teaching Assistant · CS 2614 Computer Organization

University of Oklahoma

Led weekly labs on digital logic, assembly-level reasoning, and hardware/software interface fundamentals.
Developed lab documentation and troubleshooting/debugging workflows for novice builders.

Service & Leadership

Graduate Research Mentor (Engineering Pathways): Mentoring and research guidance for undergraduate students.
Organizer/Manager: Norman Cricket Championship. Operations, scheduling, and streaming workflow coordination.

Past Collaborations

NSF AI2ES · with Dr. Andrew Fagg · 2023–2025
NASA GeoCARB · with Dr. Sean Crowell · 2021–2023

Contact

If you're interested in collaboration on multi-agent RL, implicit coordination, or spatio-temporal modeling, feel free to reach out.

Norman, OK
