Ming Sun (孙明)
Kuaishou researcher, from 2021.3 to now
SenseTime researcher, from 2018.10 to 2021.3
BaiDu IDL intern and researcher, from 2016.11 to 2018.10
Bytedance (TouTiao) AI Lab intern, from 2016.5 to 2016.10
Email: m_sunming@163.com

Short Bio
Since 2021, I have been leading an outstanding group of about 30 researchers that evaluates and improves the quality of user-generated video on Kuaishou.
I worked at SenseTime with Junjie Yan from 2018.11 to 2021.3, focusing on object detection and AutoML.
I worked at BaiDu IDL with Feng Zhou from 2016.11 to 2018.11 and learned a lot beyond technology.
Fortunately, I was supervised by Professor JuFeng Yang and cooperated with MingMing Cheng.
I received my Master's degree from Nankai University in 2017.
At SenseTime, I was the tech leader of a group of about 20 researchers named Search and Decision (搜索与决策, inspired by exploration and exploitation), which focused on detection (face/human/traffic/structure/keypoint/video) and AutoML (automatic search of augmentation/sampling/loss/network).
I am now leading a group of about 20 researchers that focuses on UGC video quality assessment (KVQ, Kuaishou Video Quality, covering noise/blockiness/color/dirty lens) and processing algorithms (KEP & KRP, Kuaishou Enhancement & Restoration Processing, such as video deblurring/SR/denoising).
To further boost the performance of KVQ and KEP & KRP, we also develop detection and classification algorithms.
AI + Audio/Video:
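An internationally leading and widely used VQA algorithm: Over four years we accumulated a massive amount of high-quality annotated data and, building on a multimodal large-model framework, developed the KVQ (Kuaishou Video Quality) algorithm. We also collected a large number of user-feedback scenarios. KVQ surpasses Golden eye performance in more than a hundred scenarios on Kuaishou, is currently called over a hundred million times per day, has been sold to several major internet companies, and received the company's Lhotse Technical Breakthrough Award (洛子峰-技术突破奖). Academically, we were the first to propose pretraining frameworks such as QPT and developed several versions, including QPT-V1 (addressing the data problem), V2 (moving from sharpness scores to perceptual-quality scores), and V3 (connecting an LLM for intelligent white-box interpretability). This work has been accepted by CVPR and other venues. We have also organized several CVPR workshop competitions (with major internet companies such as Tencent Audio/Video participating), which have had considerable impact in academia and industry, and we have presented the related technology at NVIDIA GTC several times.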
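Large generative models that significantly improve picture quality: Processing algorithms can be roughly divided into three eras (traditional algorithms, GAN-based deep learning algorithms, and generative large-model algorithms). During my first three years at Kuaishou, we developed a series of traditional and DL algorithms, including denoising, deblurring, HDR, and codec-friendly preprocessing, which brought significant bandwidth and sharpness gains. Despite the recent progress in generative techniques, the processing field had not yet captured the benefits of this paradigm (the main obstacles being text/face fidelity and model speed). Kuaishou is the first company to propose and develop an LPM (Large Preprocessing Model), which resolves the balance between generation and fidelity in this field. It has been deployed widely and achieved significant A/B gains. This is thanks to Kuaishou's massive high-quality data (on the order of billions) and fully in-house development at both the model and system level (targeted design and training of the VAE, DiT framework, temporal modeling, reward model, etc.). In addition, we built the industry's first AR-based (autoregressive) processing model and verified that its subjective metrics surpass diffusion. This work has been accepted by ICML.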
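An end-to-end, practical transcoding system at extremely low bitrates: Cost has always been a core concern in this field. As encoders mature (H.265/H.266), further reducing bitrate while preserving more generated picture detail is a very important challenge. To address this, besides investing heavily in the encoder itself, we built an optimization pipeline that jointly combines the large model, the encoder, and on-device NPU processing and is optimized for the KVQ subjective metric, so that training is fully end-to-end while deployment remains separable. Compared with many earlier device-cloud hybrid strategies, this has clear technical advantages. We are especially grateful for the progress in NPU compute from phone manufacturers, which is highly cost-effective in both power consumption and compute. A prototype of this technology was also published at top conferences such as AAAI, and the system has saved the company substantial cost.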
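AI infrastructure innovation and joint software-hardware optimization: The audio/video domain is highly digitized and is an important scenario for deploying AI, which also means that AI algorithms must be deeply optimized before they can achieve economies of scale. Taking LPM as an example, it needs to be accelerated by more than 100x to cover enough videos and thereby lift user QoE, GMV, and related metrics. Kuaishou's audio/video AI infrastructure focuses on the following: (1) deep collaboration with NVIDIA to optimize quality assessment algorithms and large generative models, including but not limited to DiT attention quantization, pruning, operator-graph optimization, GPU memory sharing across tasks, and LLM throughput acceleration; (2) working with phone manufacturers to optimize NPU compute and power consumption, including support for new operators and I/O transfer optimization; (3) software-hardware co-optimization: Kuaishou's audio/video team has its own in-house chip, and we have done in-depth research on maximizing chip compute and on specialized model design.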
News:
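- End-to-end transcoding pipeline that substantially reduces bitrate, with AI empowering audio/video: End-to-end transcoding optimization.
- The industry's first large model for video picture quality (based on massive UGC video), bringing significant growth in user watch time, thanks to NVIDIA and an extremely large volume of data: Diffusion-based multimodal large processing model for UGC.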
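- One paper, named QPT-v2, was accepted by ACM MM 2024.
- Three papers about VQA and diffusion were accepted by ECCV 2024.
- Three papers were accepted by CVPR 2024.
- Kuaishou's quality assessment algorithm (KVQ) won the company's Technical Breakthrough Award (prize of 50,000). KVQ can accurately measure the change in sharpness at each stage of a UGC video.
- The team won the Winner Award in the NTIRE 2023 Stereo Image Super-Resolution challenge. Super-resolution algorithm.
- Through extensive expert data annotation and model improvements, our large model for audio/video quality surpasses Golden eye. KVQ algorithm.
- One paper, named QPT, was accepted by CVPR 2023. Kuaishou large quality model.
- Kuaishou launched an industry-leading on-device super-resolution algorithm for picture quality, with an AI solution for high-end phones and an ML solution for mid-range phones. Integrated software-hardware optimization.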
- Three papers about transformer/NAS were accepted by ICCV 2021.
- One paper about AutoSampling was accepted by ICML 2021.
- The best AutoAug code is available!
- Another paper about AutoAug was accepted by NeurIPS 2020.
- Two papers about NAS were accepted by ECCV 2020.
- Two papers about NAS and detection were accepted by CVPR 2020.
- The team I lead received the Dean Award (¥100,000), the highest research honor at SenseTime.
- Another paper about NAS-detection was accepted by ICLR 2020.
- One paper about NAS-detection was accepted by NIPS 2019.
- Another paper about detection was accepted by ICCV 2019.
- We won 1st place in Traffic Anomaly Detection and 3rd place in City-Scale Multi-Camera Vehicle Tracking at CVPR 2019.
- One paper on visual emotion recognition was accepted to TOMM.
- Code for our NIPS paper, CGNL, is available!
- Another paper was accepted by NIPS 2018.
- One paper was accepted by ECCV 2018 as an oral.
- A large-scale (10,000+ classes) flower classification service is available in the BaiDu app, built on a knowledge graph and hundreds of millions of data samples.
- We won 2nd place on flower and 3rd place on iNaturalist at the CVPR 2017 FGVC workshop.
- Received the Best New Artist and the Outstanding Project awards at BaiDu IDL.
- One paper was accepted by TMM 2018.
- Ming received the National Scholarship for Graduate Students (¥20,000).
- Ming was named an Excellent Graduate and received the Outstanding Dissertation Award from Nankai University.
Publication
Visual Autoregressive Modeling for Image Super-Resolution.
Yunpeng Qu, Kun Yuan, Jinhua Hao, Kai Zhao, Qizhi Xie, Ming Sun, Chao Zhou.
ICML 2025. The first SR paper based on AR modeling.
Accelerating Diffusion-based Super-Resolution with Dynamic Time-Spatial Sampling.
Rui Qin, Qijie Wang, Ming Sun, Haowei Zhu, Chao Zhou, Bin Wang.
IJCAI 2025. The traditional static time condition used in diffusion is not enough.
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior.
Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang.
ICLR 2025.
Boosting Video Quality Assessment via Saliency-guided Local Perception.
Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang.
CVPR 2025.
Plug-and-Play Tri-Branch Invertible Block for Image Rescaling.
Jingwei Bao, Jinhua Hao, Pengcheng Xu, Ming Sun, Chao Zhou, Shuyuan Zhu.
AAAI 2025. How to improve the quality of mobile video end to end.
QPT V2: Masked Image Modeling Advances Visual Scoring.
Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu.
ACM MM 2024.
A New Dataset and Framework for Real-World Blurred Images Super-Resolution.
Rui Qin, Ming Sun, Chao Zhou, Bin Wang.
ECCV, 2024. Dataset and Deblur Code!
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution.
Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou.
ECCV, 2024.
KVQ: Kwai Video Quality Assessment for Short-form Videos.
Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen.
CVPR, 2024. The proposed dataset is available through the NTIRE 2024 competition.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement.
Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu.
CVPR, 2024. Taking the codec into account in video restoration.
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild.
Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang.
CVPR, 2024. Inspired by VMAF, which is popular.
Blind Image Super-resolution with Rich Texture-Aware Codebooks.
Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang.
ACM MM, 2023, Oral. LR inputs alone are not enough for the BSR task.
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment.
Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li.
ACM MM, 2023. Domain knowledge for VQA.
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment.
Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen.
ACM MM, 2023, Oral. Sparse attention for VQA.
Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution.
Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang.
ICCV, 2023.
Quality-aware Pre-trained Models for Blind Image Quality Assessment.
Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen.
CVPR, 2023. QPT with more data.
AutoSampling: Search for Effective Data Sampling Schedules.
Ming Sun, Haoxuan Dou, Baopu Li, Junjie Yan, Wanli Ouyang, Lei Cui.
ICML, 2021. The first work on automatic sampling.
Evolving Search Space for Neural Architecture Search.
Yuanzheng Ci, Chen Lin, Ming Sun, Boyu Chen, Hongwen Zhang, Wanli Ouyang, Junjie Yan.
ICCV, 2021.
GLiT: Neural Architecture Search for Global and Local Image Transformer.
Boyu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang.
ICCV, 2021.
Inception Convolution with Efficient Dilation Search.
Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu.
CVPR, 2021, Oral. Simple conv for scale variance.
IC-conv Code!
Improving Auto-Augment via Augmentation-Wise Weight Sharing.
Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang
NeurIPS, 2020. Best AutoAug policy on the ImageNet dataset.
Autoaug Code!
Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight.
Ming Sun, Haoxuan Dou, Junjie Yan
ECCV, 2020. NAS for transfer learning for the first time!
Powering One-shot Topological NAS with Stabilized Share-parameter Proxy.
Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan
ECCV, 2020. NAS for topology.
Large-Scale Object Detection in the Wild from Imbalanced Multi-Labels.
Junran Peng, Xingyuan Bu, Ming Sun, Junjie Yan
CVPR, 2020, Oral. Simple trick for OD.
Improving One-shot NAS by Suppressing the Posterior Fading.
Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
CVPR, 2020.
Computation Reallocation for Object Detection.
Feng Liang, Ronghao Guo, Chen Lin, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
ICLR, 2020.
Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection.
Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan
NIPS, 2019.
POD: Practical Object Detection with Scale-Sensitive Network.
Junran Peng, Ming Sun, Zhaoxiang Zhang, Junjie Yan, Tieniu Tan
ICCV, 2019.
Learning discriminative sentiment representation from strongly- and weakly-supervised CNNs
Dongyu She, Ming Sun, Jufeng Yang
TOMM, 2019.
Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition
Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding
ECCV, 2018, Oral.
Compact Generalized Non-local Network
Kaiyu Yue, Ming Sun, Yuchen Yuan, Errui Ding, Fuxin Xu, Feng Zhou
NIPS, 2018.
CGNL Code!
Visual Sentiment Prediction based on Automatic Discovery of Affective Regions
Jufeng Yang, Dongyu She, Ming Sun, Ming-Ming Cheng, Paul L. Rosin, Liang Wang
IEEE Transactions on Multimedia (TMM), 2018.
Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network
Jufeng Yang, Ming Sun, Xiaoxiao Sun
AAAI Conference on Artificial Intelligence (AAAI), 2017.
Dataset is available!
Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network
Jufeng Yang, Dongyu She, Ming Sun
International Joint Conference on Artificial Intelligence (IJCAI), 2017, Oral.
A Benchmark for Automatic Visual Classification of Clinical Skin Disease Images
Xiaoxiao Sun, Jufeng Yang, Ming Sun, Kai Wang
European Conference on Computer Vision (ECCV), 2016.
Project Homepage
Shape-Guided Segmentation for Fine-Grained Visual Categorization
Ming Sun, Jufeng Yang, Bo Sun, Kai Wang
IEEE International Conference on Multimedia and Expo (ICME), 2016, Oral.
Discovering Affective Regions in Deep Convolutional Neural Networks for Visual Sentiment Prediction
Ming Sun, Jufeng Yang, Kai Wang, Hui Shen
IEEE International Conference on Multimedia and Expo (ICME), 2016.