Biography
I am currently a Senior Software Engineer of Natural Language Processing and Search Science at Baidu Inc. Prior to joining Baidu, I received my Ph.D. degree from the University of Chinese Academy of Sciences (UCAS) in 2022, under the supervision of Prof. Yanan Cao. Before that, I received my B.S. degree from China University of Petroleum (East China) in 2017.
My research interests broadly include Artificial General Intelligence, Natural Language Processing, Semantic Indexing, and Information Extraction. I am also currently working on frontier research and practical applications of LLM pretraining and supervised fine-tuning (SFT).
Education
- Sep. 2017 - Jul. 2022. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. Ph.D. in Computer Science.
- Jan. 2017 - Jul. 2017. STIC/OSM Laboratory, École Nationale Supérieure de Techniques Avancées de Bretagne, Brest, France. Study abroad funded by the China Scholarship Council (CSC).
- Jul. 2014 - Jul. 2017. Computer and Communication Engineering College, China University of Petroleum (East China), Qingdao, China. B.S. in Computer Science and Technology.
- Sep. 2012 - Jul. 2014. School of Geosciences, China University of Petroleum (East China), Qingdao, China. B.S. in Surveying Engineering.
Honors and Awards
- AIDU Talent Program. Baidu Inc. 2022.
- Outstanding Graduate of Beijing. 2022.
- Outstanding Graduate of UCAS. University of Chinese Academy of Sciences. 2022.
- CAS Presidential Special Award. Chinese Academy of Sciences. 2022.
- CAS Presidential Excellent Scholarship. Chinese Academy of Sciences. 2022.
- National Scholarship for Doctoral Students. Ministry of Education of P.R. China. 2021.
- IIE Presidential Special Award. Institute of Information Engineering, CAS. 2020.
- IIE Presidential Excellent Scholarship. Institute of Information Engineering, CAS. 2020.
- Merit Student. University of Chinese Academy of Sciences. 2018.
- National Encouragement Scholarship. China University of Petroleum. 2016.
- National Scholarship. China University of Petroleum. 2015.
Publications
12 papers in total, 7 of them as first author.
- Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin and Furu Wei. Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization. The 60th Annual Meeting of the Association for Computational Linguistics (ACL'2022, Long Paper, Main Conference). Pages 561-570. Dublin, Ireland. May, 2022. [PDF], [Code].
- Hanwei Wang, Piji Li, Yanan Cao, Ruipeng Jia, Wang Hai-long and Li Yang-chun. Decoupled Extractive Summarization as Graph Nodes Pruning. The 2022 International Joint Conference on Neural Networks (IJCNN'2022). Padua, Italy. July, 2022.
- Ruipeng Jia, Yanan Cao, Fang Fang, Yuchen Zhou, Zheng Fang, Yanbing Liu and Shi Wang. Deep Differential Amplifier for Extractive Summarization. The 59th Annual Meeting of the Association for Computational Linguistics (ACL'2021, Long Paper, Main Conference). Pages 366-376. Bangkok, Thailand. August, 2021. [PDF], [Code].
- Ruipeng Jia, Yanan Cao, Haichao Shi, Fang Fang, Pengfei Yin, and Shi Wang. Flexible Non-Autoregressive Extractive Summarization with Threshold: How to Extract a Non-Fixed Number of Summary Sentences. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI'2021). Pages 13134-13142. Vancouver, Canada. February, 2021. [PDF], [Code].
- Zheng Fang, Yanan Cao, Tai Li, Ruipeng Jia, Fang Fang, Yanmin Shang and Yuhai Lu. TEBNER: Domain Specific Named Entity Recognition with Type Expanded Boundary-aware Network. The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP'2021, Long Paper, Main Conference). Pages 198–207. Punta Cana, Dominican Republic. November, 2021.
- Hengzhu Tang, Yanan Cao, Zhenyu Zhang, Ruipeng Jia, Fang Fang, and Shi Wang. Multi-Granularity Heterogeneous Graph for Document-Level Relation Extraction. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'2021). Pages 7683-7687. Toronto, Canada. June, 2021.
- Ruipeng Jia, Yanan Cao, Hengzhu Tang, Fang Fang, Cong Cao and Shi Wang. Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network. The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'2020, Long Paper, Main Conference). Pages 3622–3631. Barcelona, Spain. November, 2020. [PDF], [Code].
- Ruipeng Jia, Yanan Cao, Haichao Shi, Fang Fang, Yanbing Liu and Jianlong Tan. DistilSum: Distilling the Knowledge for Extractive Summarization. 29th ACM International Conference on Information and Knowledge Management (CIKM'2020, Short Paper). Pages 2069–2072. Galway, Ireland. October, 2020. [PDF], [Code].
- Ruipeng Jia, Yanan Cao, Fang Fang, Jinpeng Li, Yanbing Liu and Pengfei Yin. Enhancing Pre-trained Language Representation for Multi-Task Learning of Scientific Summarization. The 2020 International Joint Conference on Neural Networks (IJCNN'2020). Pages 1-8. Glasgow, UK. July, 2020.
- Jinpeng Li, Chuang Zhang, Xiaojun Chen, Yanan Cao and Ruipeng Jia. Improving Abstractive Summarization with Modeling Iterative Representation. The 2020 International Joint Conference on Neural Networks (IJCNN'2020). Pages 1-8. Glasgow, UK. July, 2020.
- Ruipeng Jia, Yanan Cao, Fang Fang, Jinpeng Li, Yanbing Liu and Pengfei Yin. Enhancing Textual Representation for Abstractive Summarization: Leveraging Masked Decoder. The 2020 International Joint Conference on Neural Networks (IJCNN'2020). Pages 1-8. Glasgow, UK. July, 2020.
- Hao Xu, Yanan Cao, Ruipeng Jia, Yanbing Liu and Jianlong Tan. Sequence Generative Adversarial Network for Long Text Summarization. IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI'2018). Pages 242-248. Volos, Greece. November, 2018.
Working Experience
- Jul. 2022 - Now. Senior Software Engineer of NLP at Baidu Inc., Beijing, China.
- Jul. 2022 - Jun. 2023. Senior Software Engineer of Search Science at Baidu Inc., Beijing, China.
- Mar. 2021 - Sep. 2021. Research Intern in the Natural Language Computing Group at Microsoft Research, Beijing, China.
- Mar. 2018 - Jun. 2018. Research Intern in the Data Science Lab at JD.com, Beijing, China.
Skills
A hundred ways to waste life
Language
Python | Golang | C++
NodeJS | HTML | CSS | JavaScript | TypeScript
Lua | Lisp
Machine Learning
PyTorch | PaddlePaddle
Web
Vue | Django | Hugo
IDE
Spacemacs | NvChad | LunarVim | VSCode | PyCharm