The Probabilistic Representation of Linguistic Knowledge
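Basic information
- Grant number: ES/J022969/1
- Principal investigator:
- Amount: $584,600
- Host institution:
- Host institution country: United Kingdom
- Project category: Fellowship
- Fiscal year: 2012
- Funding country: United Kingdom
- Period: 2012 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract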
In the past twenty-five years, work in natural language technology has made impressive progress across a wide range of tasks, including information retrieval and extraction, text interpretation and summarization, speech recognition, morphological analysis, syntactic parsing, word sense identification, and machine translation. Much of this progress has been due to the successful application of powerful techniques for probabilistic modeling and statistical analysis to large corpora of linguistic data. These methods have given rise to a set of engineering tools that are rapidly shaping the digital environment in which we access and process most of the information that we use. In recent work (Lappin and Shieber (2007), Clark and Lappin (2011a), Clark and Lappin (2011b)) my co-authors and I have argued that the machine learning methods that are driving the expansion of natural language technology are also directly relevant to understanding central features of human language acquisition. When these methods are used to construct carefully specified formal models and implementations of the grammar induction task, they yield striking insights into the limits and possibilities of human learning on the basis of the primary linguistic data to which children are exposed. These models indicate that language learning can be achieved without the sorts of strong innate learning biases that have been posited by traditional theories of universal grammar. Weak biases, some derivable from non-linguistic cognitive domains, and domain-general learning procedures are sufficient to support efficient data-driven learning of plausible systems of grammatical representation.
In the current research I am focussing on the problem of how to specify the class of representations that encode human knowledge of the syntax of natural languages.
I am pursuing the hypothesis that a representation in this class is best expressed as an enriched statistical language model that assigns probability values to the sentences of a language. A central part of the enrichment of the model consists of a procedure for determining the acceptability (grammaticality) of a sentence as a graded value, relative to the properties of that sentence and the language of which it is a part. This procedure avoids the simple reduction of the grammaticality of a string to its estimated probability of occurrence, while still characterizing grammaticality in probabilistic terms. An enriched model of this kind will provide a straightforward explanation for the fact that individual native speakers generally judge the well-formedness of sentences along a continuum, rather than through the imposition of a sharp boundary between acceptable and unacceptable sentences. The pervasiveness of gradedness in the linguistic knowledge of individual speakers poses a serious problem for classical theories of syntax, which partition strings of words into the grammatical sentences of a language and ill-formed strings of words.
This research holds out the prospect of important impact in two areas. First, it can shed light on the relationship between the representation and acquisition of linguistic knowledge on the one hand, and learning and the encoding of knowledge in other cognitive domains on the other. This work can, in turn, help to clarify the respective roles of biologically conditioned learning biases and data-driven learning in human cognition. Second, this work can contribute to the development of more effective language technology by providing insight, from a computational perspective, into the way in which humans represent the syntactic properties of sentences in their language. To the extent that natural language processing systems take account of this class of representations, they will provide more efficient tools for parsing and interpreting text and speech.
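The abstract does not spell out the procedure that maps sentence probability to a graded acceptability value. As a minimal sketch of the general idea, and not the project's actual method, the Python below trains an add-one smoothed bigram model on a toy corpus and normalizes the sentence log probability by subtracting a unigram log probability and dividing by sentence length, so that long sentences and rare words are not penalized merely for being long or rare. The corpus, the function names, and the choice of normalization are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy corpus: a hypothetical stand-in for real training data.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat saw the dog",
    "the dog saw a cat",
]


def train(corpus):
    """Count unigrams and bigrams, with <s> and </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_logprob(sentence, unigrams, bigrams):
    """Log probability of the sentence under an add-one smoothed bigram model."""
    vocab = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        for prev, cur in zip(tokens, tokens[1:])
    )


def unigram_logprob(sentence, unigrams):
    """Log probability of the words taken independently (add-one smoothed)."""
    vocab = len(unigrams) + 1
    total = sum(unigrams.values())
    return sum(
        math.log((unigrams[word] + 1) / (total + vocab))
        for word in sentence.split()
    )


def acceptability(sentence, unigrams, bigrams):
    """Graded score: (log P_bigram - log P_unigram) / length.

    Assumes a non-empty sentence. The normalization is an illustrative
    assumption, not a procedure taken from the abstract.
    """
    length = len(sentence.split())
    model_lp = bigram_logprob(sentence, unigrams, bigrams)
    return (model_lp - unigram_logprob(sentence, unigrams)) / length


unigrams, bigrams = train(CORPUS)
for s in ["the cat sat on the mat", "mat the on sat cat the"]:
    print(f"{acceptability(s, unigrams, bigrams):.3f}  {s}")
```

Because the two example sentences contain the same words, their unigram terms and lengths are identical, so the difference in score comes entirely from word order. The result is a value on a continuum rather than a binary grammatical/ungrammatical decision, which is the property the abstract emphasizes.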
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Towards a Statistical Model of Grammaticality, Proceedings of the 35th Annual Conference of the Cognitive Science Society, Berlin, July-August 2013, pp. 2064-2069.
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Alexander Clark; Gianluca Giorgolo; Shalom Lappin
- Corresponding author: Shalom Lappin
Jey Han Lau, Alexander Clark, and Shalom Lappin, Predicting Acceptability Judgements with Unsupervised Language Models, Proceedings of the Israeli Seminar in Computational Linguistics, Open University of Israel, June 2015.
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Jey Han Lau
- Corresponding author: Jey Han Lau
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality, Proceedings of EACL 2014, Gothenburg, pp. 530-539.
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Jey Han Lau; David Newman; Timothy Baldwin
- Corresponding author: Timothy Baldwin
Statistical Representation of Grammaticality Judgements: The Limits of N-Gram Models, Proceedings of the ACL Workshop on Cognitive Modelling and Computational Linguistics, Sophia, August 2013, pp. 28-36.
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Alexander Clark; Gianluca Giorgolo; Shalom Lappin
- Corresponding author: Shalom Lappin
Jey Han Lau, Alexander Clark, and Shalom Lappin, Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge
- DOI:
- Published: 2016
- Journal:
- Impact factor: 2.5
- Authors: Lau, J.H.
- Corresponding author: Lau, J.H.
Other publications by Shalom Lappin
An Intensional Parametric Semantics For Vague Quantifiers
- DOI: 10.1023/a:1005638918877
- Published: 2000
- Journal:
- Impact factor: 1.1
- Authors: Shalom Lappin
- Corresponding author: Shalom Lappin
Deep Learning and Linguistic Representation
- DOI: 10.1201/9781003127086
- Published: 2021-03
- Journal:
- Impact factor: 0
- Authors: Shalom Lappin
- Corresponding author: Shalom Lappin
Presuppositional effects of strong determiners: a processing account
- DOI: 10.1515/ling.1988.26.6.1021
- Published: 1988
- Journal:
- Impact factor: 1.1
- Authors: Shalom Lappin; T. Reinhart
- Corresponding author: T. Reinhart
Statistical Representation of Grammaticality Judgements: the Limits of N-Gram Models
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Alexander Clark; Gianluca Giorgolo; Shalom Lappin
- Corresponding author: Shalom Lappin
Dominance and modularity
- DOI: 10.1515/ling.1987.25.4.671
- Published: 1987
- Journal:
- Impact factor: 0
- Authors: Nomi Erteschik; Shalom Lappin
- Corresponding author: Shalom Lappin
Similar overseas grants
Linguistic Analysis of Self-representation in Multinational Corporations: A comparison of leading automobile multinationals
- Grant number: 23K01591
- Fiscal year: 2023
- Funding amount: $584,600
- Project category: Grant-in-Aid for Scientific Research (C)
A cross-linguistic approach to constraints on lexical representation and grammatical realization of the concepts of "possession", "participation", and "experience"
- Grant number: 22K00555
- Fiscal year: 2022
- Funding amount: $584,600
- Project category: Grant-in-Aid for Scientific Research (C)
Discrimination and Linguistic Representation in Modern Society
- Grant number: 22K00874
- Fiscal year: 2022
- Funding amount: $584,600
- Project category: Grant-in-Aid for Scientific Research (C)
Linguistic representation of space and motion in rGyalrongic
- Grant number: 21F20303
- Fiscal year: 2021
- Funding amount: $584,600
- Project category: Grant-in-Aid for JSPS Fellows
Study on linguistic representation and identification of phones from speech imagery EEG.
- Grant number: 20K11910
- Fiscal year: 2020
- Funding amount: $584,600
- Project category: Grant-in-Aid for Scientific Research (C)
Cross-Linguistic Studies on Lexical Differences based on Representation Learning
- Grant number: 18K11456
- Fiscal year: 2018
- Funding amount: $584,600
- Project category: Grant-in-Aid for Scientific Research (C)
Speech representation - A literary and linguistic corpus study
- Grant number: 322751860
- Fiscal year: 2016
- Funding amount: $584,600
- Project category: Research Grants
RI: Medium: Broad-Coverage Semantic Parsing: Linguistic Representation Learning from Crowd-Scale Data
- Grant number: 1562364
- Fiscal year: 2016
- Funding amount: $584,600
- Project category: Continuing Grant
Doctoral Dissertation Research: A cross-linguistic study of the cognitive representation of part-whole structures in Mesoamerican languages
- Grant number: 1354277
- Fiscal year: 2014
- Funding amount: $584,600
- Project category: Standard Grant
CAREER: Integrating Perceptual and Linguistic Information in Models of Semantic Representation
- Grant number: 1056744
- Fiscal year: 2011
- Funding amount: $584,600
- Project category: Continuing Grant