About us



Overview of the Research Group


The SciAIEngine team at the National Science Library, Chinese Academy of Sciences, was established in 2017. The team has undertaken several projects related to deep learning, including the Chinese Academy of Sciences intelligent capacity building project "Construction of an AI Engine based on Scientific Literature Knowledge"; the National Science and Technology Library (NSTL) project "Next-generation Open Knowledge Service Platform Overall Design and Key Technology Research and Development Project: Development of Technology Tools for Annotating Move in Scientific Paper Based on Deep Learning"; the National Science Library, Chinese Academy of Sciences project "Demonstration of Enriched Semantic Retrieval Application in Scientific Literature"; the Major Program of the National Social Science Fund of China (National Level) "Big data-driven semantic evaluation system for scientific literature" (Grant No. 21&ZD329); and the National Key R&D Program of China (National Level) "Key technologies and software for deep mining and intelligent analysis of scientific literature content" (Grant No. 2022YFF0711900), among others.

Scientific literature contains abundant knowledge: definitions, concepts, research backgrounds, research questions, research foundations, research approaches, the theoretical tools and methods applied, the experiments conducted, the results obtained, and the conclusions drawn. Unveiling the knowledge within scientific literature is an important task for researchers and developers in the field of digital libraries.



Research topic




Scientific research updates


December 4, 2020: SciAIEngine was released.

August 28, 2022: Our team participated in the "2022 Annual Intelligence Conference in China & Intelligence and Intelligence Work Development Forum and the 12th National Intelligence Doctoral Student Academic Forum" and won several awards:

Dr. Huan Liu's dissertation "Research on Construction of Knowledge-infused Pre-trained Language Model for Abstract of Scientific Papers" won the "2022 National Excellent Doctoral Dissertation Award in Intelligence", and his mentor, Zhixiong Zhang, was honored with the Outstanding Mentor Award.

The paper "Research on SCCL Text Deep Clustering Methods with Increased Class Cluster-Level Comparison", authored by PhD Candidate Jie Li et al., won the Outstanding Paper Award at the 2022 Annual Intelligence Conference.

The paper "Research on Question Sentence Recognition of Scientific Literature", authored by PhD Candidate Xuesi Li et al., won the Second Prize of the National Intelligence Doctoral Student Academic Forum.

The paper "Design and Implementation of Automatic Title Generation System for Chinese Scientific Literature", authored by PhD Candidate Yufei Wang et al., won the Third Prize of the National Intelligence Doctoral Student Academic Forum.

December 19, 2023: Postdoctoral fellow TianShao, a member of the research group, was selected for the 2023 National Postdoctoral Research Program (Class C).



Research Team


Team Leader


Zhixiong Zhang

Deep Learning, Semantic Annotation, Information Extraction, Scientific Information Monitoring, Preprint Scholarly Communication

Deputy Director of the National Science Library, Chinese Academy of Sciences; Research Librarian; Doctoral Supervisor. He is a recipient of the Chinese Academy of Sciences Distinguished Research Fellowship Program and a winner of the Chinese Academy of Sciences Zhu Liyuehua Outstanding Teacher Award. He is also Deputy Director of the Chinese Committee on Scientific and Technical Information as well as the Digital Library Research and Construction Professional Committee of the China Society of Library Science; Co-Editor-in-Chief of the journal "Data Intelligence"; Deputy Editor-in-Chief of the journal "Data Analysis and Knowledge Discovery"; and Editorial Board Member of the journals "Journal of Data and Information Science (JDIS)", "Digital Library Forum", "Think Tank Theory and Practice", and "Intelligence Engineering". He has published one monograph, authored over 180 research papers, translated three works, and led or participated in over forty national and provincial-level projects.

The national-level key projects that he has led or participated in as a core member include: the National Key R&D Program of China (National Level) "Key technologies and software for deep mining and intelligent analysis of scientific literature content" (Grant No. 2022YFF0711900); the Major Program of the National Social Science Fund of China (National Level) "Big data-driven semantic evaluation system for scientific literature" (Grant No. 21&ZD329); the National Social Science Foundation project "Theoretical and Practical Research on Preprint Scholarly Communication" (Grant No. 19BTQ006); the National Natural Science Foundation project "Research on Text Topic Centrality Calculation Method Based on Language Network" (Grant No. 61075047); the Key Project of the 12th Five-Year National Science and Technology Support Program "Construction of Technology Knowledge Organization System Shared Service Platform" (Grant No. 2011BAH10B03); the National Social Science Foundation project "Theoretical and Methodological Research on Knowledge Extraction from Digital Information Resources" (Grant No. 05BTQ006); the National Social Science Foundation project "Research and Practice on Long-term Preservation Technology of Digital Resources" (Grant No. 09FTQ005); the Key Project of the 11th Five-Year National Science and Technology Support Program "Research and Application of Science and Technology Evaluation Methods and Techniques Based on Massive Information Analysis" (Grant No. 2006BAH03B05); the National Social Science Foundation project "Research on the Theory and Methodology of Internet Information Resource Preservation" (Grant No. 06BTQ025); and the National Social Science Foundation project "Research on Monitoring and Analyzing Methods of Bursting Topics in Internet Science and Technology Information" (Grant No. 09BTQ035), among others.

Team Members


Gaihong Yu (Research Librarian)

Automatic Annotation of Functional Discourse Elements in Research Papers, Scientific Information Monitoring


Min Zhang (Research Librarian)

Intelligent Semantic Index Construction, Subject Index, Scientific Information Monitoring


Yi Liu (Postdoc)

Question Answering over Scientific Literature, Automatic Synthesis, Text Clustering


Meng Wang (Postdoc)

Value Extraction of Scientific Questions


Huan Liu (PhD)

Scientific Literature Language Model Pre-training, SciAIEngine Construction


Liangping Ding (PhD)

Keyword Extraction, Named Entity Recognition


Jie Li (PhD Candidate)

Reviewer Recommendation, Scientific Literature Corpus Construction, Text Clustering


Yang Zhao (PhD Candidate)

Automatic Classification, Move Recognition for Funded Project Abstracts, Client Development


Xuesi Li (PhD Candidate)

Definition Recognition for Scientific Literature, Event Extraction


Yufei Wang (PhD Candidate)

Automatic Text Label Generation, Keyword Sorting


Mengting Zhang (PhD Candidate)

Title Generation for Scientific Text


Xin Lin (PhD Candidate)

Citation Sentence Recognition for Scientific Literature


Yang Li (PhD Candidate)

Knowledge Object Extraction


Yajiao Wang

Cluster Label Generation



Published Papers


1. Zhixiong Zhang, Huan Liu, Liangping Ding, Pengmin Wu, Gaihong Yu. Moves Recognition in Abstract of Research Paper Based on Deep Learning[C]. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). 2019.06

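2. Liangping Ding, Zhixiong Zhang, Huan Liu. Factors Affecting the Performance of Automatic Move Recognition with Support Vector Machine Models (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2019.12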

3. Gaihong Yu, Zhixiong Zhang, Huan Liu, Liangping Ding. Masked Sentence Model based on BERT for Move Recognition in Medical Scientific Abstracts[J]. Journal of Data and Information Science (JDIS). 2019.12

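4. Na Ma, Zhixiong Zhang, Gaihong Yu. A Review of Research on Cited Objects in Scientific Papers (in Chinese)[J]. Library and Information Service. 2019.12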

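5. Zhixiong Zhang, Huan Liu, Liangping Ding, Pengmin Wu, Gaihong Yu. A Comparative Study of Move Recognition in Scientific Paper Abstracts with Different Deep Learning Models (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2020.01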

6. Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu. Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling[C]. 2020 EEKE Workshop of ACM/IEEE Joint Conference on Digital Libraries. 2020.08

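7. Yang Zhao, Zhixiong Zhang, Huan Liu, Liangping Ding. Classification of Chinese Medical Literature Based on the BERT Model (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2020.09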

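8. Zhixiong Zhang, Huan Liu, Gaihong Yu. Building an Artificial Intelligence Engine Based on Scientific Literature Knowledge (in Chinese)[J]. Journal of Library and Information Science in Agriculture. 2021.01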

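9. Huan Liu, Zhixiong Zhang, Yufei Wang. A Review of the Main Optimization and Improvement Methods for the BERT Model (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2021.01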

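10. Min Zhang, Liangping Ding, Huan Liu. Design and Implementation of Multi-dimensional Semantic Indexing for Scientific Literature (in Chinese)[J]. Information Studies: Theory & Application. 2021.04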

11. Liangping Ding, Zhixiong Zhang, Huan Liu, Yang Zhao. Design and Implementation of Keyphrase Extraction Engine for Chinese Scientific Literature[C]. 2021 EEKE Workshop of ACM/IEEE Joint Conference on Digital Libraries. 2021.09

12. Liangping Ding, Zhixiong Zhang, Yang Zhao. Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features[C]. The 23rd International Conference on Asia-Pacific Digital Libraries. 2021.09

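13. Liangping Ding, Zhixiong Zhang, Huan Liu. Domain Named Entity Recognition in Physics Literature Using an Ontology Category System (in Chinese)[C]. 2021 Annual Intelligence Conference in China & National Intelligence Doctoral Student Academic Forum (Third Prize). 2021.09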

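14. Min Zhang, Huan Liu, Liangping Ding, Qing Fan. A Deep Learning-Based Method for Calculating the Intelligence Value of Online Science and Technology Information (in Chinese)[J]. Library and Information Service. 2021.10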

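15. Yang Zhao, Zhixiong Zhang, Huan Liu. Classification of Chinese Medical Literature Based on Hierarchical Classification (in Chinese)[J]. Research on Library Science. 2021.11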

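16. Liangping Ding, Zhixiong Zhang, Min Zhang, Huan Liu. Design and Implementation of a User Interface for a Semantic Retrieval System (in Chinese)[C]. 2021 National Library Science Doctoral Student Forum. 2021.11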

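17. Yang Zhao, Zhixiong Zhang, Huan Liu, Jie Li. Design and Implementation of a Move Recognition System for Funded Project Abstracts (in Chinese)[J]. Information Studies: Theory & Application. 2022.04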

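18. Xuesi Li, Zhixiong Zhang, Huan Liu. Automatic Recognition of Concept Definition Sentences Based on the BERT Model (in Chinese)[J]. Information Science. 2022.05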

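19. Xuesi Li, Zhixiong Zhang, Huan Liu. A Concept Phrase Extraction Method Based on Sequence Labeling (in Chinese)[J]. Library and Information Service. 2022.06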

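20. Zhixiong Zhang, Yang Zhao, Huan Liu. Building an Automatic Classification Engine for Scientific Literature Oriented to Practical Applications (in Chinese)[J]. Journal of Library Science in China. 2022.06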

21. Liangping Ding, Zhixiong Zhang, Huan Liu. A Bootstrapped Chinese Biomedical Named Entity Recognition Model Incorporating Lexicons[C]. 2022 EEKE Workshop of ACM/IEEE Joint Conference on Digital Libraries. 2022.09

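22. Yang Zhao, Zhixiong Zhang. A Study of the Main Innovative Functions of Current International Preprint Platforms (in Chinese)[J]. Chinese Journal of Scientific and Technical Periodicals. 2022.10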

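23. Xuesi Li, Zhixiong Zhang. Participants and Their Roles in the Preprint Scholarly Communication Ecosystem (in Chinese)[J]. Chinese Journal of Scientific and Technical Periodicals. 2022.10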

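24. Yufei Wang, Zhixiong Zhang, Yang Zhao, Mengting Zhang, Xuesi Li. Design and Implementation of Automatic Title Generation System for Chinese Scientific Literature (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2022.11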

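25. Yang Zhao, Zhixiong Zhang, Jie Li. Constructing a Move Recognition Corpus for Abstracts of Project Proposals (in Chinese)[J]. Library and Information Service. 2022.11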

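26. Jie Li, Zhixiong Zhang. A Deep Text Clustering Method Considering Local Features and Global Geometric Structure (in Chinese)[C]. The 15th International Academic Forum for Doctoral Students, Jilin University (Third Prize). 2022.11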

27. Liangping Ding, Tianyuan Huang, Huan Liu, Yufei Wang, Zhixiong Zhang. Distantly Supervised Named Entity Recognition with Category-Oriented Confidence Calibration[C]. International Conference on Asian Digital Libraries. 2022.12

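28. Li Qian, Yi Liu, Zhixiong Zhang, Xuesi Li, Jing Xie, Qinya Xu, Yang Li, Zhengyi Guan, Xiyu Li, Sen Wen. An Analysis of the Technical Foundations of ChatGPT (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.03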

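29. Zhixiong Zhang, Gaihong Yu, Yi Liu, Xin Lin, Mengting Zhang, Li Qian. The Impact of ChatGPT on Library and Information Work (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.04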

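30. Jie Li, Zhixiong Zhang, Yufei Wang. Research on SCCL Text Deep Clustering Methods with Increased Class Cluster-Level Comparison (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.04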

31. Jie Li, Gaihong Yu, Zhixiong Zhang. RCMR 280k: Refined Corpus for Move Recognition Based on PubMed Abstracts[J]. Data Intelligence. 2023.04

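32. Yi Liu, Zhixiong Zhang, Yufei Wang, Xuesi Li. Building a Structured Automatic Synthesis Tool for Scientific Literature Based on Move Recognition (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.05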

33. Liangping Ding, Giovanni Colavizza, Zhixiong Zhang. Partial Annotation Learning for Biomedical Entity Recognition[C]. International Society for Scientometrics and Informetrics. 2023.05

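34. Xuesi Li, Zhixiong Zhang, Yi Liu, Yufei Wang. Research on Recognition Methods for Research Question Sentences in Scientific Literature (in Chinese)[J]. Library and Information Service. 2023.05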

35. Xuesi Li, Liangping Ding, Zhixiong Zhang. Drug Target Extraction from Biomedical Articles Based on a Two-Stage Cascading Framework[C]. 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL). 2023.06

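36. Zhixiong Zhang. The Development of Artificial Intelligence Requires Attention to the "Compound Interest Effect" (in Chinese)[J]. Competitive Intelligence. 2023.06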

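37. Zhixiong Zhang, Mengting Zhang, Xin Lin, Kunhua Zhao, Yuan Li. Development Trends of Global Scientific Journals in the Open Science Environment (in Chinese)[J]. Bulletin of Chinese Academy of Sciences. 2023.06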

38. Yang Zhao, Zhixiong Zhang, Yufei Wang, Xin Lin. Identifying research contributions based on semantic analysis of citation sentences: A case study of the 2021 Physiology or Medicine Nobel Prize laureates[C]. International Society for Scientometrics and Informetrics (ISSI 2023). 2023.07

39. Yang Zhao, Xin Lin, Yufei Wang, Mengting Zhang, Zhixiong Zhang. RefSciRate: A Reference Rating Method for Single Scientific Papers[C]. International Society for Scientometrics and Informetrics (ISSI 2023). 2023.07

40. Yang Zhao, Zhixiong Zhang, Yue Xiao. Leveraging MRC Framework for Research Contribution Patterns Identification in Citation Sentences[C]. The 25th International Conference on Asia-Pacific Digital Libraries (ICADL 2023). 2023.11

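41. Xuesi Li, Zhixiong Zhang, Yufei Wang, Yi Liu. A Review of Methods for Analyzing Domain Knowledge Evolution (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.12 (forthcoming)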

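42. Yufei Wang, Zhixiong Zhang, Xuesi Li, Yi Liu. An IAR-Based Algorithm for Automatic Extraction of Phrase-Level Cluster Labels (in Chinese)[J]. Data Analysis and Knowledge Discovery. 2023.12 (forthcoming)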

43. Yang Zhao, Mengting Zhang, Xiaoli Chen, Liangping Ding, Zhixiong Zhang. Early identification of scientific breakthroughs based on outlier analysis of research entities[J]. Journal of Data and Information Science (JDIS). 2023.12 (forthcoming)



Software Copyrights


Our team has applied for 30 relevant software copyrights.