Longtao Zheng 郑龙韬

Longtao Zheng is a Researcher at ByteDance, where he leads RL for LLM agents. He was a PhD student at Nanyang Technological University (NTU), Singapore, advised by Prof. Bo An. Previously, he obtained his Bachelor's degree in computer science from the University of Science and Technology of China (USTC) in 2022.

We're hiring! Our research team is growing rapidly. We offer competitive compensation, an excellent research environment, and massive compute, with positions based in Singapore, Beijing, or Shanghai. Contact: longtao dot zheng at bytedance dot com.
Topics: LLM/RL Agents, Multi-Agent RL, Video Generation (*/†: equal contribution)
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
Lang Feng, Longtao Zheng, Shuo He, Fuxiang Zhang, Bo An
Preprint | Paper Code
Stable training algorithm and open-source codebase for multi-agent LLM RL
OTB
The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
Yingru Li, Jiawei Xu, Ziniu Li, Jiacai Liu, Wei Liu, Yuxuan Tong, Longtao Zheng, Zhenghai Xue, Yaxiang Zhang, Tianle Cai, Ge Zhang, Qian Liu, Baoxiang Wang
Preprint | Paper Code (verl)
A token-level baseline prevents RL training collapse and reduces token consumption
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
Wei Liu, Jiawei Xu, Yingru Li, Longtao Zheng, Tianjian Li, Qian Liu, Junxian He
Preprint | Paper Code
Optimizing Triton kernel generation with multi-turn RL and test-time scaling
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Zhenghai Xue*, Longtao Zheng*, Qian Liu, Yingru Li, Zejun Ma, Bo An
ICLR 2026 top 1% score | Paper Code
Simple trajectory filtering stabilizes multi-turn RL and lets diverse reasoning emerge
CoSo
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng, Weihao Tan, Zhiyi Lyu, Longtao Zheng, Haiyang Xu, Ming Yan, Fei Huang, Bo An
Fine-tuning VLM agents with online RL
Cradle: Empowering Foundation Agents Towards General Computer Control
Cradle Team (Longtao Zheng as core contributor)
An agent that can play AAA video games
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Longtao Zheng*, Yifan Zhang*, Hanzhong Guo, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, Shuicheng Yan
TMLR J2C Certification | Paper Project Code Model
A state-of-the-art, open-weight model for audio-driven talking video generation
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng*, Zhiyuan Huang*, Zhenghai Xue, Xinrun Wang, Bo An, Shuicheng Yan
A trinity of environments, tools, and benchmarks for general virtual agents
FinAgent
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Wentao Zhang, Lingxuan Zhao, Haochong Xia, Shuo Sun, Jiaze Sun, Molei Qin, Xinyi Li, Yuqing Zhao, Yilei Zhao, Xinyu Cai, Longtao Zheng, Xinrun Wang, Bo An
The first multimodal agent for financial trading
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Longtao Zheng, Rundong Wang, Xinrun Wang, Bo An
One of the earliest web agents with state abstraction, trajectory prompting, and memory
True Knowledge Comes from Practice: Aligning Large Language Models with Embodied Environments via Reinforcement Learning
Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo An
Fine-tuning LLM agents with online RL
Causal AHT
Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
Dong Xing, Pengjie Gu, Qian Zheng, Xinrun Wang, Shanqi Liu, Longtao Zheng, Bo An, Gang Pan
A causality-based solution to type confounding in ad hoc teamwork
MAGENTA
Multi-Agent Multi-Game Entity Transformer: Towards Generalist Models in MARL
Rundong Wang, Weixuan Wang, Xianhan Zeng, Liang Wang, Zhengjie Lian, Yiming Gao, Feiyu Liu, Siqin Li, Xianliang Wang, Qiang Fu, Wei Yang, Lanxiao Huang, Longtao Zheng, Zinovi Rabinovich, Bo An
DAI 2024 Best Paper | Paper
A generalist transformer for Honor of Kings, StarCraft II, and Neural MMO
Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning
Rundong Wang*, Longtao Zheng*, Wei Qiu, Bowei He, Bo An, Zinovi Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan
Preprint | Paper Code
Autocurricula for MARL in complex sparse-reward environments like Google Football