Yu Ze

LLM Post-Training Engineer · Alibaba Quark Search

Hangzhou / Beijing · China · Chinese version →

I work on LLM post-training: SFT and RLHF for both text-only and multimodal models, end to end, from data pipelines and training through eval and serving. I am currently at Alibaba Quark Search, applying post-training to a real-world search product. Before that, I worked on multimodal LLM pretraining at ByteDance.

This site is my notebook, project archive, and engineering judgement log. For a structured one-pager, see CV; for what I am shipping right now, see /now.

Experience

2024 — present
Alibaba · Quark Search
LLM Post-Training Engineer
- Own the post-training stack (SFT + RLHF) for text and multimodal models that ship into Quark Search.
- TODO: a concrete ownership line with numbers — e.g. "owner of the SFT data pipeline producing ~Xk high-quality samples / month, +Y pp on internal benchmark Z".
- TODO: a business-facing metric, anonymized as needed — e.g. "lifted answer-generation quality on user metric W by Z%".
2022 — 2024
ByteDance · Multimodal LLMs
Multimodal LLM Pretraining
- Worked on multimodal LLM pretraining: data pipeline, training strategy, scaling-behaviour analysis.
- TODO: a fact about scale — tokens, GPU·hours, parameter count.
- TODO: a sub-system you owned end-to-end.
2015 — 2019
Beihang University (BUAA) · Undergrad
TODO: school / major
- TODO: one line you want public — coursework, awards, papers.

Now

Building the multimodal post-training stack (SFT + RLHF) for Quark Search.
Writing about reward-model failure modes, SFT data quality, and vLLM serving tradeoffs.
TODO: third thing you are actively shipping or studying.

Stack

Python · PyTorch
Transformers · TRL
Megatron-LM / DeepSpeed
vLLM · SGLang
RLHF (PPO / DPO / GRPO)
SFT data pipelines
Eval pipelines · LLM-as-judge
Multimodal alignment

Contact

View CV → · Chinese version