EVIDENCE: computer-assisted interactive extraction of good dictionary examples from large corpora
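Basic information
- Grant number: 433249742
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research data and software (Scientific Library Services and Information Systems)
- Fiscal year: 2019
- Funding country: Germany
- Duration: 2018-12-31 to 2023-12-31
- Project status: Completed
- Source:
- Keywords: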
Project abstract
The project brings together computer scientists and lexicographers to solve a lexicographical problem: identifying and extracting good examples from a large set of corpus examples. Machine learning will be applied to help lexicographers select good corpus examples for inclusion in dictionary articles. It should ease the lexicographers' task by ranking the examples according to their measured quality, thereby directing attention to the best ones. Since the quality and appropriateness of corpus examples are not well-defined features, unanimous judgment cannot be achieved even among professional lexicographers. With interactive learning, we plan to train an adaptive machine learning model on preferences, which we assume to be more consistent across lexicographers: they are more likely to agree that example 1 is better than example 2 than to agree on explicit scores for both examples. Furthermore, we plan to collect and integrate the judgments of dictionary users (i.e. informed lay persons) on sets of ranked good examples. The outcome of the project will be a system for the extraction, classification, and ranking of corpus examples. This system will initially be tested in the context of the DWDS, where it will support the lexicographers in their daily work. For each headword, the final system is expected to present a set of good examples that are sufficiently diverse to illustrate various facets of the word's real use. It will also generate added value for non-expert dictionary users by supplying good examples for headwords that have not yet received full lexical treatment. The new system will allow any user to provide feedback on the quality of examples, and the system will learn from this feedback. In teaching contexts, for example, students will no longer only consume a lexicographic resource but actively participate in its development. The project will also organize workshops to attract early adopters and gather feedback from the community. The proposed method and its application will be useful for other dictionary projects, as they are language-independent and easy to integrate into current state-of-the-art lexicography systems.
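The idea of learning from pairwise preferences rather than explicit scores can be illustrated with a minimal sketch. It is not the project's actual model: it assumes a simple Bradley-Terry style formulation in which each example has one latent quality score, and the function names, the toy sentences, and the preference pairs are all hypothetical.

```python
import math


def fit_preference_scores(n_items, preferences, epochs=200, lr=0.1):
    """Fit one latent quality score per example from pairwise preferences.

    preferences: list of (winner, loser) index pairs, meaning a judge
    preferred example `winner` over example `loser`.
    Assumed model (Bradley-Terry style): P(i preferred over j) = sigmoid(s_i - s_j).
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in preferences:
            # Probability the current scores assign to the observed preference.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Stochastic gradient ascent step on the log-likelihood.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores


def rank_examples(examples, scores):
    """Return the examples ordered from highest to lowest fitted score."""
    order = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    return [examples[i] for i in order]


# Hypothetical candidate examples and judgments for one headword.
examples = ["sentence A", "sentence B", "sentence C"]
preferences = [(0, 1), (0, 2), (1, 2), (0, 1)]  # (preferred, rejected)

scores = fit_preference_scores(len(examples), preferences)
ranking = rank_examples(examples, scores)
print(ranking)
```

In this formulation judges never have to assign an absolute score; each new "A is better than B" judgment, whether from a lexicographer or a dictionary user, is simply appended to `preferences` before the scores are refitted.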
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Privatdozent Dr. Alexander Geyken
The Evolution of Complex Text Patterns: Development and Application of a corpus-linguistic Approach for Analysis of Diachronic Change of Text Patterns in its Multidimensionality
- Grant number: 417702242
- Fiscal year: 2019
- Funding amount: --
- Project category: Research Grants
Digital Collection German Colonialism - Establishment of a digital collection of texts and integration in the research Infrastructure CLARIN-D
- Grant number: 324473798
- Fiscal year: 2017
- Funding amount: --
- Project category: Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Similar NSFC (National Natural Science Foundation of China) grants
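Research on an absolute testing method for optical aspheric interferometry based on multiplexed computer-generated holograms (CGH)
- Grant number: 62375132
- Approval year: 2023
- Funding amount: CNY 540,000
- Project category: General Program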
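Research on an intelligent portable microfluidic-chip ELISA platform for the determination of plant viruses
- Grant number: 21505061
- Approval year: 2015
- Funding amount: CNY 210,000
- Project category: Young Scientists Fund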
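Research on the causal mechanisms of computer-work-related neck pain under different levels of mental stress and physical load
- Grant number: 81472155
- Approval year: 2014
- Funding amount: CNY 610,000
- Project category: General Program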
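Research on the rehabilitative effect of computerized open-mindedness (Huoda) therapy on lung cancer and its brain metabolic mechanism
- Grant number: 81372488
- Approval year: 2013
- Funding amount: CNY 650,000
- Project category: General Program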
Journal of Computer Science and Technology
- Grant number: 61224001
- Approval year: 2012
- Funding amount: CNY 200,000
- Project category: Special Fund Program
Similar overseas grants
Integrating the Youth Nominated Support Team (YST) with CBT for Black Youth with Acute Suicide Risk
- Grant number: 10573542
- Fiscal year: 2023
- Funding amount: --
- Project category:
Targeting brain and bone metastases in metastatic breast cancer for improved patient survival
- Grant number: 10564604
- Fiscal year: 2023
- Funding amount: --
- Project category:
Real World Adoption of an OUD Digital Health Therapeutic
- Grant number: 10741217
- Fiscal year: 2023
- Funding amount: --
- Project category:
[R21] Integrated computer-aided, point-of-care ultrasound for tuberculosis screening
- Grant number: 10511853
- Fiscal year: 2022
- Funding amount: --
- Project category: