Ming Sun (孙明)


Kuaishou researcher, from 2021.3 to present
SenseTime researcher, from 2018.10 to 2021.3
Baidu IDL intern and researcher, from 2016.11 to 2018.10
ByteDance (Toutiao) AI Lab intern, from 2016.5 to 2016.10
Email: m_sunming@163.com

Short Bio

Since 2021, I have been leading an outstanding group of about 30 researchers at Kuaishou that evaluates and improves the quality of user-generated videos. I worked at SenseTime with Junjie Yan from 2018.11 to 2021.3, focusing on object detection and AutoML. I worked at Baidu IDL with Feng Zhou from 2016.11 to 2018.11 and learned a lot beyond technology. I was fortunate to be supervised by Professor Jufeng Yang and to collaborate with Ming-Ming Cheng, and I received my Master's degree from Nankai University in 2017.

At SenseTime, I was the tech lead of a group of about 20 researchers named 搜索与决策 (Search and Decision, inspired by exploration and exploitation), which focused on detection (face/human/traffic/structure/keypoint/video) and AutoML (augmentation/sampling/loss/network auto search).

I lead a group of about 20 researchers focusing on UGC video quality assessment (KVQ, Kuaishou Video Quality, covering noise, blockiness, color, dirty lens, etc.) and processing algorithms (KEP & KRP, Kuaishou Enhancement & Restoration Processing, such as video deblurring, super-resolution, and denoising). To further boost the performance of KVQ and KEP & KRP, we also develop detection and classification algorithms.

AI + Audio/Video:

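  • An internationally leading and widely used VQA algorithm: Over four years we accumulated a massive amount of high-quality annotated data and, building on a multimodal large-model framework, developed the KVQ (Kuaishou Video Quality) algorithm. We also collected a large number of user-feedback scenarios. KVQ outperforms Golden Eye in hundreds of Kuaishou scenarios, is currently invoked hundreds of millions of times per day, has been sold to several major internet companies, and received the company's Lhotse (洛子峰) Technical Breakthrough Award. Academically, we were the first to propose pretraining frameworks such as QPT and developed a series of versions: QPT-V1 (addressing the data problem), V2 (moving from sharpness scores to perceptual quality scores), and V3 (integrating an LLM for intelligent, white-box interpretation), accepted at CVPR and other venues. We have also organized several CVPR workshop challenges (with participants from major internet companies such as Tencent's audio/video team), which have had considerable influence in both academia and industry, and have presented this work at NVIDIA GTC several times.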
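  • Generative large models that significantly improve visual quality: Processing algorithms can be roughly divided into three eras (traditional algorithms, GAN-based deep learning algorithms, and generative large-model algorithms). During our first three years at Kuaishou we developed a series of traditional and deep learning algorithms, including denoising, deblurring, HDR, and codec-friendly preprocessing, which brought significant bandwidth and clarity gains. Despite the recent progress of generative techniques, the processing field has not yet reaped the benefits of this paradigm, mainly because of text and face fidelity problems and model speed. Kuaishou is the first company to propose and develop an LPM (Large Preprocessing Model), which resolves the balance between generation and fidelity in this field; it has been deployed at scale and achieved significant A/B gains. We owe this to Kuaishou's massive high-quality data (billions of samples) and to fully in-house model and system development (targeted design and training of the VAE, DiT framework, temporal modeling, reward model, etc.). In addition, we built the industry's first autoregressive (AR) processing model and verified that its subjective metrics surpass diffusion; this work has been accepted at ICML.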
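  • A practical end-to-end transcoding system at extremely low bitrates: Cost has always been a core concern in this field. As codecs (H.265/H.266) mature, further reducing bitrate while preserving more generated picture detail is a major challenge. Beyond investing heavily in the encoder itself, we built a pipeline that jointly optimizes the large model, the encoder, and on-device NPU processing against the KVQ subjective metric, achieving end-to-end training with separable deployment. Compared with many earlier device-cloud collaborative strategies, this approach has clear technical advantages, helped especially by the growth of NPU compute from phone manufacturers, which offers very high cost-effectiveness in both power and compute. Prototypes of this technology have been published at top conferences such as AAAI and have saved the company substantial cost.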
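  • AI infrastructure innovation and hardware-software co-optimization: Audio/video is highly digitized and an important scenario for deploying AI, which also means AI algorithms must be deeply optimized to achieve scale. Taking LPM as an example, it must be accelerated by more than 100x to cover enough videos and thereby lift user QoE and GMV. Kuaishou's audio/video AI infrastructure focuses on the following: (1) deep collaboration with NVIDIA to optimize quality assessment algorithms and generative large models, including but not limited to DiT attention quantization and pruning, operator graph optimization, GPU memory sharing across tasks, and LLM throughput acceleration; (2) working with phone manufacturers to optimize NPU compute and power consumption, including support for new operators and I/O transfer optimization; (3) hardware-software co-optimization: Kuaishou's audio/video team has its own in-house chips, and we have studied in depth how to maximize chip compute together with specialized model designs.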
  • News:

    Publications

    Visual Autoregressive Modeling for Image Super-Resolution.
    Yunpeng Qu, Kun Yuan, Jinhua Hao, Kai Zhao, Qizhi Xie, Ming Sun, Chao Zhou.
    ICML 2025. The first SR paper based on AR modeling.

    Accelerating Diffusion-based Super-Resolution with Dynamic Time-Spatial Sampling.
    Rui Qin, Qijie Wang, Ming Sun, Haowei Zhu, Chao Zhou, Bin Wang.
    IJCAI 2025. The traditional static time condition used in diffusion is not enough.

    Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior.
    Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang.
    ICLR 2025.

    Boosting Video Quality Assessment via Saliency-guided Local Perception.
    Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang.
    CVPR 2025.

    Plug-and-Play Tri-Branch Invertible Block for Image Rescaling.
    Jingwei Bao, Jinhua Hao, Pengcheng Xu, Ming Sun, Chao Zhou, Shuyuan Zhu.
    AAAI 2025. How to improve mobile video quality end to end.

    QPT V2: Masked Image Modeling Advances Visual Scoring.
    Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu.
    ACM MM 2024.

    A New Dataset and Framework for Real-World Blurred Images Super-Resolution.
    Rui Qin, Ming Sun, Chao Zhou, Bin Wang.
    ECCV, 2024. Dataset and Deblur Code!

    XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution.
    Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou.
    ECCV, 2024.

    KVQ: Kwai Video Quality Assessment for Short-form Videos.
    Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen.
    CVPR, 2024. The proposed dataset is available through the NTIRE 2024 competition.

    CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement.
    Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu.
    CVPR, 2024. Taking the codec into account in video restoration.

    PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild.
    Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang.
    CVPR, 2024. Inspired by the popular VMAF.

    Blind Image Super-resolution with Rich Texture-Aware Codebooks.
    Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang.
    ACM MM, 2023, Oral. LR images alone are not enough for the BSR task.

    Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment.
    Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li.
    ACM MM, 2023. Domain knowledge for VQA.

    Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment.
    Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen.
    ACM MM, 2023, Oral. Sparse attention for VQA.

    Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution.
    Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang.
    ICCV, 2023.

    Quality-aware Pre-trained Models for Blind Image Quality Assessment.
    Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen.
    CVPR, 2023. QPT with more data.

    AutoSampling: Search for Effective Data Sampling Schedules.
    Ming Sun, Haoxuan Dou, Baopu Li, Junjie Yan, Wanli Ouyang, Lei Cui.
    ICML, 2021. The first work on automatically searching data sampling schedules.

    Evolving Search Space for Neural Architecture Search.
    Yuanzheng Ci, Chen Lin, Ming Sun, Boyu Chen, Hongwen Zhang, Wanli Ouyang, Junjie Yan.
    ICCV, 2021.

    GLiT: Neural Architecture Search for Global and Local Image Transformer.
    Boyu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang.
    ICCV, 2021.

    Inception Convolution with Efficient Dilation Search.
    Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu.
    CVPR, 2021, Oral. A simple convolution for scale variance.
    IC-conv Code!

    Improving Auto-Augment via Augmentation-Wise Weight Sharing.
    Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang
    NeurIPS, 2020. Best AutoAugment policy on the ImageNet dataset.
    Autoaug Code!

    Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight.
    Ming Sun, Haoxuan Dou, Junjie Yan
    ECCV, 2020. NAS applied to transfer learning for the first time!

    Powering One-shot Topological NAS with Stabilized Share-parameter Proxy.
    Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan
    ECCV, 2020. NAS for network topology.

    Large-Scale Object Detection in the Wild from Imbalanced Multi-Labels.
    Junran Peng, Xingyuan Bu, Ming Sun, Junjie Yan
    CVPR, 2020, Oral. A simple trick for object detection.

    Improving One-shot NAS by Suppressing the Posterior Fading.
    Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
    CVPR, 2020.

    Computation Reallocation for Object Detection.
    Feng Liang, Ronghao Guo, Chen Lin, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
    ICLR, 2020.

    Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection.
    Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan
    NIPS, 2019.

    POD: Practical Object Detection with Scale-Sensitive Network.
    Junran Peng, Ming Sun, Zhaoxiang Zhang, Junjie Yan, Tieniu Tan
    ICCV, 2019.

    Learning discriminative sentiment representation from strongly- and weakly-supervised CNNs
    Dongyu She, Ming Sun, Jufeng Yang
    TOMM, 2019.

    Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition
    Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding
    ECCV, 2018, Oral.

    Compact Generalized Non-local Network
    Kaiyu Yue, Ming Sun, Yuchen Yuan, Errui Ding, Fuxin Xu, Feng Zhou
    NIPS, 2018.
    CGNL Code!

    Visual Sentiment Prediction based on Automatic Discovery of Affective Regions
    Jufeng Yang, Dongyu She, Ming Sun, Ming-Ming Cheng, Paul L. Rosin, Liang Wang
    IEEE Transactions on Multimedia (TMM), 2018.

    Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network
    Jufeng Yang, Ming Sun, Xiaoxiao Sun
    AAAI Conference on Artificial Intelligence (AAAI), 2017.
    Dataset is available!

    Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network
    Jufeng Yang, Dongyu She, Ming Sun
    International Joint Conference on Artificial Intelligence (IJCAI), 2017, Oral.

    A Benchmark for Automatic Visual Classification of Clinical Skin Disease Images
    Xiaoxiao Sun, Jufeng Yang, Ming Sun, Kai Wang
    European Conference on Computer Vision (ECCV), 2016.
    Project Homepage

    Shape-Guided Segmentation for Fine-Grained Visual Categorization
    Ming Sun, Jufeng Yang, Bo Sun, Kai Wang
    IEEE International Conference on Multimedia and Expo (ICME), 2016, Oral.

    Discovering Affective Regions in Deep Convolutional Neural Networks for Visual Sentiment Prediction
    Ming Sun, Jufeng Yang, Kai Wang, Hui Shen
    IEEE International Conference on Multimedia and Expo (ICME), 2016.