Yu Ze

LLM Post-Training Engineer · Alibaba Quark Search

Hangzhou / Beijing · China · Chinese version →

I work on LLM post-training: SFT and RLHF for both text-only and multimodal models, end to end, from data pipelines and training through eval and serving. Currently at Alibaba Quark Search, applying post-training to a real-world search product. Previously at ByteDance on multimodal LLM pretraining.

This site is my notebook, project archive, and engineering judgement log. For a structured one-pager, see CV; for what I am shipping right now, see /now.

Experience

  1. 2024 — present
    Alibaba · Quark Search
    LLM Post-Training Engineer
    • Own the post-training stack (SFT + RLHF) for text and multimodal models that ship into Quark Search.
    • TODO: a concrete ownership line with numbers — e.g. "owner of the SFT data pipeline producing ~Xk high-quality samples / month, +Y pp on internal benchmark Z".
    • TODO: a business-facing metric, anonymized as needed — e.g. "lifted answer-generation quality on user metric W by Z%".
  2. 2022 — 2024
    ByteDance · Multimodal LLMs
    Multimodal LLM Pretraining
    • Worked on multimodal LLM pretraining: data pipeline, training strategy, scaling-behaviour analysis.
    • TODO: a fact about scale — tokens, GPU·hours, parameter count.
    • TODO: a sub-system you owned end-to-end.
  3. 2015 — 2019
    Beihang University (BUAA) · Undergrad
    TODO: school / major
    • TODO: one line you want public — coursework, awards, papers.

Now

  • Building the multimodal post-training stack (SFT + RLHF) for Quark Search.
  • Writing about reward-model failure modes, SFT data quality, and vLLM serving tradeoffs.
  • TODO: third thing you are actively shipping or studying.

Stack

  • Python · PyTorch
  • Transformers · TRL
  • Megatron-LM / DeepSpeed
  • vLLM · SGLang
  • RLHF (PPO / DPO / GRPO)
  • SFT data pipelines
  • Eval pipelines · LLM-as-judge
  • Multimodal alignment

Contact