About me

My name is Yu-Sheng (Ethan) Su. I am a research scientist on the AMD GenAI team (which is hiring 🔥), working on large-scale foundation models with a focus on data, model architecture, and training efficiency optimization. Before joining AMD, I was a Postdoctoral Researcher hosted by Eric Xing at CMU / MBZUAI. I completed my Ph.D. in the Department of Computer Science and Technology at Tsinghua University. Throughout my Ph.D. (2019 to 2023), I had the privilege of being advised by Zhiyuan Liu and being part of the THUNLP Lab led by Maosong Sun. I also worked closely with several LLM start-up teams, including ModelBest and llm360.

Hiring

AMD’s GenAI team focuses on building a series of foundation models and is hiring 🔥 for multiple roles, including Principal Research Scientist, Applied Research Scientist, and Research Intern. [Click here] to learn more, and feel free to reach out to me if you’re interested.

Research

My primary work and research focus on advancing models toward AGI. I therefore concentrate on three key areas: (1) scaling data volume and quality, (2) enhancing the robustness of model architectures, and (3) optimizing training efficiency, to push beyond the boundaries of current state-of-the-art model capabilities and enable their rapid iterative development [Google Scholar] [GitHub].

Talks

News

Publications

  • ChatDev: Communicative Agents for Software Development
    Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, Maosong Sun
    ACL 2024. [pdf] [code]

  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
    Yusheng Su*, Weize Chen*, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan, Yujia Qin, Yaxi Lu, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou ( * indicates equal contribution)
    ICLR 2024. [pdf] [code]

  • ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
    Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu
    ICLR 2024. [pdf] [code]

  • Exploring the Impact of Model Scaling on Parameter-efficient Tuning Methods
    Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun
    EMNLP 2023. [pdf] [code]

  • Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models
    Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun.
    Nature Machine Intelligence 2023 (Cover Article). [pdf] [code]

  • On Transferability of Prompt Tuning for Natural Language Processing
    Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou
    NAACL 2022 (Oral). [pdf] [code] [BibTex] [slide] [video]

  • Knowledge Inheritance for Pre-trained Language Models
    Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang, Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
    NAACL 2022 (Oral). [pdf] [code]

  • Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning
    Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, Jie Zhou
    ACL 2022 Findings. [pdf] [code]

  • CPM: A large-scale Generative Chinese Pre-trained Language Model
    Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
    AI OPEN 2021. [pdf] [code]

  • CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models
    Yusheng Su, Xu Han, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Peng Li, Maosong Sun
    WWW 2021 Workshop, IEEE/TASLP 2021. [pdf] [code] [slide]

  • CokeBERT: Contextual Knowledge Selection and Embedding Towards Enhanced Pre-Trained Language Models
    Yusheng Su, Xu Han, Zhengyan Zhang, Peng Li, Zhiyuan Liu, Yankai Lin, Jie Zhou, Maosong Sun
    EMNLP 2020 Findings, AI OPEN 2021. [pdf] [pdf] [code]

Under Review or Preprint Version

  • Human Emotion Knowledge Representation Emerges in Large Language Models and Supports Discrete Emotion Inference
    Yusheng Su*, Ming Li*, Hsiu-Yuan Huang, Jiali Cheng, Xin Hu, Xinmiao Zhang, Huadong Wang, Yujia Qin, Xiaozhi Wang, Zhiyuan Liu, Dan Zhang ( * indicates equal contribution)
    (Submitted to Nature Human Behaviour 2023). [pdf] [code] (Refactoring - User friendly toolkit coming soon)

  • Tool Learning with Foundation Models
    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun
    ArXiv 2023. [pdf] [code]

Efficient Training Projects

  • (Leader/Co-leader) Prompt Transferability. This system helps users build a prompt bank for saving well-trained prompts, and enables quick retrieval and reuse of these prompts on unseen tasks and heterogeneous models.

Agents Projects

  • (Leader/Co-leader) AgentVerse. AgentVerse provides a framework that streamlines the development of custom multi-agent systems using LLMs in user-defined environments. This facilitates the design of more efficient multi-agent systems that can be applied to real-world applications. [NVIDIA’s Official Blog], [Youtube1], [Youtube2]

  • (Member) XAgent. XAgent makes effective decisions and executes efficient actions to accomplish tasks with an unprecedented degree of autonomy. [Youtube1], [Youtube2]

  • (Member) Tool Learning. Tool learning for LLMs, providing open-source solutions for ChatGPT-style plugins.

LLM Pre-training Projects

  • (Member) CPM-X. The first large-scale Chinese pre-training project, which released a series of LLMs during 2020-2021.

Experiences

Sailing Lab - CMU (U.S.) & MBZUAI (U.A.E.), 2023 - 2024

THUNLP Lab - Tsinghua University (China), 2019 - 2023

MediaTek (Taiwan), 2018 - 2019

  • Deep/Machine Learning Engineer Intern
  • Advised by Jing-Han Wang.

Microsoft, 2015 - 2016

Professional Services

Reviewer (Since 2021): ACL, NAACL, AACL, ACL Rolling Review, EMNLP, COLING, ICLR, ICML, IJCAI, AAAI, NeurIPS

Pre-doctoral Student Mentoring

  • (2021-2023) Chi-Min Chan: Tsinghua University (BS) -> Hong Kong University of Science and Technology (HKUST) (MS)
  • (2022-2023) Jiali Cheng: University of North Carolina (MS -> PhD)
  • (2022-2023) Yu Xia: Peking University (MS) -> Tsinghua University (PhD)
  • (2022-2023) Xiuyuan Huang: University of Science and Technology Beijing (BS) -> Peking University (MS)

Visitors