Yao Luo (罗尧) is an LLM researcher at the ByteDance Seed Team. She received her Master’s degree from the National University of Singapore and her Bachelor’s degree from the Beijing University of Posts and Telecommunications.

She is dedicated to developing efficient and powerful architectures for foundation models, including large language models and large multimodal models. To this end, her research interests include sparse architectures, linear attention, long-context modeling, and model merging, among others.

We are recruiting LLM interns. If you’re interested in LLM efficiency or sparse/linear attention, feel free to contact me by email (luoyao0323@163.com) or WeChat (15600307011).

🔥 News

2025.05: 🎉 We offer some pre-training guidelines for effective model merging!
2025.04: 🎉 We release Seed1.5-Thinking, a powerful reasoning model!
2025.01: 🎉 Two papers on long-context modeling were accepted to ICLR 2025!

📝 Publications

ByteDance Seed

Model Merging in Pre-training of Large Language Models

ByteDance Seed

We present a comprehensive investigation of model merging techniques during the pre-training process.
Merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior.

ByteDance Seed

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

ByteDance Seed

Seed1.5-Thinking has achieved strong performance in both reasoning and non-reasoning tasks.
Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters.

ICLR 2025

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou

FlexPrefill is a dynamic and context-aware sparse attention mechanism that optimizes computational efficiency during long-sequence inference for LLMs.
FlexPrefill can dynamically adjust sparse attention patterns and computational budgets in real time based on input demands and attention head requirements.

ICLR 2025

Why Does the Effective Context Length of LLMs Fall Short?

Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong

We find that the position frequency distribution in LLM pretraining is left-skewed, and that these infrequent positions cannot effectively model long-range dependencies.
We propose STRING, a training-free method that brings significant improvements to popular RoPE-based LLMs.

📖 Experience

2021.10 - now: Researcher. ByteDance.
2020.09 - 2021.07: Master’s degree. National University of Singapore.
2019.08 - 2019.11: Research intern. National University of Singapore.
2016.09 - 2020.06: Bachelor’s degree. Beijing University of Posts and Telecommunications.