A Lebesgue Integral based Approximation for Language Modelling
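Basic information
- Grant number: EP/X019063/1
- Principal investigator:
- Amount: $257,700
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2023
- Funding country: United Kingdom
- Period: 2023 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract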
Deep learning (DL) based Natural Language Processing (NLP) technologies have attracted significant interest in recent years. The current state-of-the-art (SOTA) language models, namely transformer-based language models, typically assume that the representation of a given word can be captured by interpolating its related context within a convex hull. However, it has recently been shown that in high-dimensional spaces such interpolation almost surely never occurs, regardless of the underlying intrinsic dimension of the data manifold. The representations generated by transformer-based language models converge into a dense, cone-like hyperspace that is often discontinuous, with many nonadjacent clusters. To overcome this limitation of most DL-based NLP models, this project aims to deploy the Lebesgue integral, which can be defined as an ensemble of integrals over partitions (i.e., discontinuous feature clusters). The integral is used to approximate the posterior distributions of clusters given input word features in finite measurable sets, by automatically identifying the boundaries of such discontinuous sets; this in turn could help to generate better interpretations and quantify uncertainty. Under the proposed Lebesgue integral based approximation, the input text is characterised by two properties: an indicator vector encoding its membership in clusters (i.e., measurable sets), and a continuous feature representation that better captures its semantic meaning for downstream tasks. This not only allows a more faithful approximation of the countable discontinuities commonly observed in the distributions of input text in NLP, but also enables learning text representations that are easier for humans to understand.
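The two-part representation described in the abstract (a cluster-membership indicator plus a continuous feature) and the idea of summing contributions over a partition can be illustrated with a small sketch. This is not the project's method: the centroids, the softmax-over-distances weighting, and all function names below are hypothetical choices made only for illustration. It mimics how a Lebesgue-style sum weights the value taken on each measurable set by the measure of that set.

```python
import math

def soft_membership(x, centroids, temperature=1.0):
    """Posterior-like weights over clusters (assumed form: softmax of
    negative squared distances to hypothetical cluster centroids)."""
    scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / temperature
              for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def indicator_vector(weights):
    """One-hot vector marking the most probable cluster (measurable set)."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return [1 if k == best else 0 for k in range(len(weights))]

def partitioned_expectation(weights, per_cluster_values):
    """Sum over clusters of (cluster weight * value taken on that cluster)."""
    return sum(w * v for w, v in zip(weights, per_cluster_values))

# Toy data: two well-separated clusters and one input feature vector.
centroids = [[0.0, 0.0], [5.0, 5.0]]
x = [0.2, -0.1]

weights = soft_membership(x, centroids)      # cluster posterior for x
indicator = indicator_vector(weights)        # discrete membership part
estimate = partitioned_expectation(weights, [1.0, 3.0])
```

Here `indicator` plays the role of the membership vector and `x` itself stands in for the continuous feature; a real model would learn both, along with the cluster boundaries.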
Project outputs
Journal papers (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization
- DOI: 10.48550/arxiv.2305.18926
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Xinyu Wang; Lin Gui; Yulan He
- Corresponding authors: Xinyu Wang; Lin Gui; Yulan He
Distilling ChatGPT for Explainable Automated Student Answer Assessment
- DOI: 10.18653/v1/2023.findings-emnlp.399
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Li J
- Corresponding author: Li J
CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models
- DOI: 10.48550/arxiv.2306.03598
- Published: 2023-06
- Journal:
- Impact factor: 0
- Authors: Jiazheng Li; Zhaoyue Sun; Bin Liang; Lin Gui; Yulan He
- Corresponding authors: Jiazheng Li; Zhaoyue Sun; Bin Liang; Lin Gui; Yulan He
Event Temporal Relation Extraction with Bayesian Translational Model
- DOI: 10.48550/arxiv.2302.04985
- Published: 2023-02
- Journal:
- Impact factor: 0
- Authors: Xingwei Tan; Gabriele Pergola; Yulan He
- Corresponding authors: Xingwei Tan; Gabriele Pergola; Yulan He
Uncertainty Quantification for Text Classification
- DOI: 10.1145/3539618.3594243
- Published: 2023-07
- Journal:
- Impact factor: 0
- Authors: Dell Zhang; Murat Sensoy; M. Makrehchi; Bilyana Taneva-Popova; Lin Gui; Yulan He
- Corresponding authors: Dell Zhang; Murat Sensoy; M. Makrehchi; Bilyana Taneva-Popova; Lin Gui; Yulan He
Other publications by Lin Gui
Ensemble with estimation: seeking for optimization in class noisy data
- DOI: 10.1007/s13042-019-00969-8
- Published: 2019-06
- Journal:
- Impact factor: 5.6
- Authors: Ruifeng Xu; Zhiyuan Wen; Lin Gui; Qin Lu; Binyang Li; Xizhao Wang
- Corresponding author: Xizhao Wang
A Low-Complexity Compressive Sensing Algorithm for PAPR Reduction
- DOI: 10.1007/s11277-014-1753-8
- Published: 2014-04
- Journal:
- Impact factor: 2.2
- Authors: Si Liu; Yun Rui; Lin Gui; Yingguan Wang
- Corresponding author: Yingguan Wang
Compressive Sensing-Based Detector Design for SM-OFDM Massive MIMO High Speed Train Systems
- DOI: 10.1109/tbc.2017.2731039
- Published: 2017-08
- Journal:
- Impact factor: 4.5
- Authors: Bo Gong; Lin Gui; Qibo Qin; Xiang Ren
- Corresponding author: Xiang Ren
A Cross-Lingual Approach for Opinion Holder Extraction
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Lin Gui; Ruifeng Xu; Jun Xu; Chengxiang Liu
- Corresponding author: Chengxiang Liu
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games
- DOI: 10.48550/arxiv.2404.17662
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Qinglin Zhu; Runcong Zhao; Jinhua Du; Lin Gui; Yulan He
- Corresponding author: Yulan He
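Similar National Natural Science Foundation of China (NSFC) grants
Analysis of INTEGRAL data using the CLEAN and direct demodulation methods
- Grant number: 10603004
- Approval year: 2006
- Funding amount: CNY 350,000
- Project type: Young Scientists Fund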
Similar overseas grants
I-Corps: A Novel integral-proportional and proportional-integral controller for inverter-based microgrids
- Grant number: 2402495
- Fiscal year: 2024
- Funding amount: $257,700
- Project type: Standard Grant
Seeking an integral theory of blame and forgiveness: an inquiry to construct a relation-based theory of forgiveness
- Grant number: 23K00027
- Fiscal year: 2023
- Funding amount: $257,700
- Project type: Grant-in-Aid for Scientific Research (C)
Quantum simulators based on integral photonics
- Grant number: 2609145
- Fiscal year: 2021
- Funding amount: $257,700
- Project type: Studentship
HCC: Small: Neural Network-based Solvers for Integral Equations in Light Transport
- Grant number: 2126407
- Fiscal year: 2021
- Funding amount: $257,700
- Project type: Standard Grant
Creation of an Open Access Database for Linguistic Theories Based on an Integral Analysis of Elliptical Structures in English and Japanese
- Grant number: 21H00532
- Fiscal year: 2021
- Funding amount: $257,700
- Project type: Grant-in-Aid for Scientific Research (B)
Study on new accurate boundary integral equation based on consideration of the complex fictitious eigenfrequencies
- Grant number: 19K20285
- Fiscal year: 2019
- Funding amount: $257,700
- Project type: Grant-in-Aid for Early-Career Scientists
Defect inspection method based on Cauchy's integral theorem using a circular differential coherent illumination
- Grant number: 18K04172
- Fiscal year: 2018
- Funding amount: $257,700
- Project type: Grant-in-Aid for Scientific Research (C)
Structure based drug design of novel anti-fungal agents imported into cells by the integral membrane transporter, UapA
- Grant number: 2133377
- Fiscal year: 2018
- Funding amount: $257,700
- Project type: Studentship
On full-swing nonlinear control based on integral viewpoint modeling and its application to large amplitude locomotion
- Grant number: 16H04386
- Fiscal year: 2016
- Funding amount: $257,700
- Project type: Grant-in-Aid for Scientific Research (B)
Development of a unified framework for optimal control and estimation of nonlinear stochastic systems based on path integral analysis
- Grant number: 15K18089
- Fiscal year: 2015
- Funding amount: $257,700
- Project type: Grant-in-Aid for Young Scientists (B)