YZ
I work on LLM post-training: SFT and RLHF for both text-only and multimodal models, end to end, from data pipelines and training to eval and serving. I am currently at Alibaba Quark Search, applying post-training to a real-world search product. Previously, I worked on multimodal LLM pretraining at ByteDance.
This site is my notebook, project archive, and engineering judgement log. For a structured one-pager, see CV; for what I am shipping right now, see /now.
Experience
- 2024–present · Alibaba · Quark Search · LLM Post-Training Engineer
- Own the post-training stack (SFT + RLHF) for text and multimodal models that ship into Quark Search.
- TODO: a concrete ownership line with numbers — e.g. "owner of the SFT data pipeline producing ~Xk high-quality samples / month, +Y pp on internal benchmark Z".
- TODO: a business-facing metric, anonymized as needed — e.g. "lifted answer-generation quality on user metric W by Z%".
- 2022–2024 · ByteDance · Multimodal LLMs · Multimodal LLM Pretraining
- Worked on multimodal LLM pretraining: data pipeline, training strategy, scaling-behaviour analysis.
- TODO: a fact about scale — tokens, GPU·hours, parameter count.
- TODO: a sub-system you owned end-to-end.
- 2015–2019 · Beihang University (BUAA) · Undergrad · TODO: school / major
- TODO: one line you want public — coursework, awards, papers.
Now
- Building the multimodal post-training stack (SFT + RLHF) for Quark Search.
- Writing about reward-model failure modes, SFT data quality, and vLLM serving tradeoffs.
- TODO: third thing you are actively shipping or studying.
Stack
- Python · PyTorch
- Transformers · TRL
- Megatron-LM / DeepSpeed
- vLLM · SGLang
- RLHF (PPO / DPO / GRPO)
- SFT data pipelines
- Eval pipelines · LLM-as-judge
- Multimodal alignment
Contact
- Email: TODO@example.com
- GitHub: github.com/houpanpan
- LinkedIn: linkedin.com/in/TODO