Status | Published |
Title | Fine-Grained Distillation for Long Document Retrieval |
Author | Zhou, Yucheng (1); Shen, Tao (2); Geng, Xiubo (3); Tao, Chongyang (3); Shen, Jianbing (1); Long, Guodong (2); Xu, Can (3); Jiang, Daxin (3) |
Date | 2024-03-24 |
Conference Name | 38th AAAI Conference on Artificial Intelligence, AAAI 2024 |
Source Publication | Proceedings of the AAAI Conference on Artificial Intelligence |
Volume | 38 |
Issue | 17 |
Pages | 19732-19740 |
Conference Date | 20-27 February 2024 |
Conference Place | Vancouver, Canada |
Country | Canada |
Abstract | Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis: a long document may cover multiple topics. This maximizes their structural heterogeneity and poses a granular-mismatch issue, leading to inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces globally consistent representations across different fine granularities and then applies multi-granular aligned distillation only during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, on which it shows state-of-the-art performance. |
Keyword | NLP: Sentence-level Semantics, Textual Inference, etc. ; NLP: Applications ; NLP: Other |
DOI | 10.1609/aaai.v38i17.29947 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science ; Education & Educational Research |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Computer Science, Theory & Methods ; Education, Scientific Disciplines |
WOS ID | WOS:001239407300137 |
Scopus ID | 2-s2.0-85189631204 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology ; The State Key Laboratory of Internet of Things for Smart City (University of Macau) ; Department of Computer and Information Science |
Corresponding Author | Jiang, Daxin |
Affiliation | 1. SKL-IOTSC, CIS, University of Macau, Macao ; 2. AAII, FEIT, University of Technology Sydney, Australia ; 3. Microsoft Corporation |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Zhou, Yucheng,Shen, Tao,Geng, Xiubo,et al. Fine-Grained Distillation for Long Document Retrieval[C], 2024, 19732-19740. |
APA | Zhou, Yucheng., Shen, Tao., Geng, Xiubo., Tao, Chongyang., Shen, Jianbing., Long, Guodong., Xu, Can., & Jiang, Daxin (2024). Fine-Grained Distillation for Long Document Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19732-19740. |