EVIDENCE: computer-assisted interactive extraction of good dictionary examples from large corpora
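Basic information
- Grant number: 433249742
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research data and software (Scientific Library Services and Information Systems)
- Fiscal year: 2019
- Funding country: Germany
- Start and end dates: 2018-12-31 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract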
The project brings together computer scientists and lexicographers to solve a lexicographical problem: identifying and extracting good examples from a large set of corpus examples. Machine learning will be applied to help lexicographers select good corpus examples for inclusion in dictionary articles. It should ease the lexicographers' task by ranking the examples according to their measured quality, thereby directing attention to the best ones. Since the quality and appropriateness of corpus examples are not well-defined features, unanimous judgment cannot be achieved even among professional lexicographers. With interactive learning, we plan to train an adaptive machine learning model on preferences, which we assume are more consistent across lexicographers: they are more likely to agree that example 1 is better than example 2 than to agree on explicit scores for both examples. Furthermore, we plan to acquire and integrate the judgments of dictionary users (i.e. informed lay persons) on sets of ranked good examples. The outcome of the project will be a system for the extraction, classification, and ranking of corpus examples. This system will initially be tested in the context of the DWDS, where it will support the lexicographers in their daily work. For each headword, the final system is expected to present a set of good examples that are sufficiently diverse to illustrate various facets of the word's real use. It will also create additional value for non-expert dictionary users by supplying good examples for headwords that have not yet received full lexical treatment. The new system will allow any user to provide feedback on the quality of examples, which the system will use to learn. In teaching, for example, students will no longer only consume a lexicographic resource but actively participate in its development. The project will also organize workshops to attract early adopters and to gather feedback from the community. The proposed method and its application will thus be useful for other dictionary projects, as they are language-independent and easy to integrate into current state-of-the-art systems for lexicography.
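The idea of training on pairwise preferences rather than explicit scores can be illustrated with a minimal sketch. It is not the project's actual system: the linear scoring model, the logistic (Bradley-Terry style) pairwise loss, the two features, and every name and data value below are assumptions made purely for illustration.

```python
import math


def train_pairwise_ranker(features, preferences, epochs=200, lr=0.1):
    """Fit linear weights from (better, worse) preference pairs.

    features:    dict of example id -> list of floats (hypothetical features)
    preferences: list of (better_id, worse_id) judgments from annotators
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            # Only the feature difference of the pair matters.
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Modelled probability that `better` beats `worse`.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log(p): the update shrinks as p approaches 1.
            step = lr * (1.0 - p)
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


def rank_examples(features, w):
    """Return example ids sorted from highest to lowest learned score."""
    def score(ex_id):
        return sum(wi * xi for wi, xi in zip(w, features[ex_id]))
    return sorted(features, key=score, reverse=True)


# Hypothetical features per corpus sentence:
# [is a complete sentence (0/1), share of rare words (0..1)]
features = {
    "s1": [1.0, 0.1],
    "s2": [1.0, 0.8],
    "s3": [0.0, 0.2],
    "s4": [0.0, 0.9],
}
# Hypothetical judgments of the form "first is better than second".
preferences = [("s1", "s2"), ("s1", "s3"), ("s3", "s4"), ("s2", "s3")]

weights = train_pairwise_ranker(features, preferences)
ranking = rank_examples(features, weights)
print(ranking)
```

Because each update depends only on which example of a pair was preferred, annotators never have to agree on absolute scores, and new judgments can be appended to `preferences` before retraining.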
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants of Privatdozent Dr. Alexander Geyken
The Evolution of Complex Text Patterns: Development and Application of a corpus-linguistic Approach for Analysis of Diachronic Change of Text Patterns in its Multidimensionality
- Grant number: 417702242
- Fiscal year: 2019
- Funding amount: --
- Project category: Research Grants
Digital Collection German Colonialism - Establishment of a digital collection of texts and integration in the research Infrastructure CLARIN-D
- Grant number: 324473798
- Fiscal year: 2017
- Funding amount: --
- Project category: Cataloguing and Digitisation (Scientific Library Services and Information Systems)
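Similar grants from the National Natural Science Foundation of China
Research on an absolute interferometric testing method for optical aspheric surfaces based on multiple computer-generated holograms (CGH)
- Grant number: 62375132
- Approval year: 2023
- Funding amount: CNY 540,000
- Project category: General Program
Journal of Computer Science and Technology
- Grant number: 61224001
- Approval year: 2012
- Funding amount: CNY 200,000
- Project category: Special Fund Project
Research on intelligent human-computer interaction based on interaction migration and collaboration in pervasive computing environments
- Grant number: 61003219
- Approval year: 2010
- Funding amount: CNY 70,000
- Project category: Young Scientists Fund
Journal of Computer Science and Technology
- Grant number: 61040017
- Approval year: 2010
- Funding amount: CNY 40,000
- Project category: Special Fund Project
Design and dynamic combinatorial synthesis of inhibitors based on the structure of phosphodiesterase IV
- Grant number: 30500633
- Approval year: 2005
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund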
Similar overseas grants
Participatory System Dynamics vs Audit and Feedback: A Cluster Randomized Trial of Mechanisms of Implementation Change to Expand Reach of Evidence-based Addiction and Mental Health Care
- Grant number: 10314046
- Fiscal year: 2019
- Funding amount: --
- Project category:
Participatory System Dynamics vs Audit and Feedback: A Cluster Randomized Trial of Mechanisms of Implementation Change to Expand Reach of Evidence-based Addiction and Mental Health Care
- Grant number: 10538553
- Fiscal year: 2019
- Funding amount: --
- Project category:
Participatory System Dynamics vs Audit and Feedback: A Cluster Randomized Trial of Mechanisms of Implementation Change to Expand Reach of Evidence-based Addiction and Mental Health Care
- Grant number: 10066337
- Fiscal year: 2019
- Funding amount: --
- Project category:
How Perceptions of Parties Evolve Following Leadership Change: Evidence from Panel data and Computer-assisted Content Analysis of News Coverage
- Grant number: 2275450
- Fiscal year: 2019
- Funding amount: --
- Project category: Studentship
Tracking neurocognitive changes during evidence-based reading instruction in typically and atypically developing children
- Grant number: 9384624
- Fiscal year: 2017
- Funding amount: --
- Project category:
Tracking neurocognitive changes during evidence-based reading instruction in typically and atypically developing children
- Grant number: 10207696
- Fiscal year: 2017
- Funding amount: --
- Project category:
Tracking neurocognitive changes during evidence-based reading instruction in typically and atypically developing children
- Grant number: 9981480
- Fiscal year: 2017
- Funding amount: --
- Project category:
The Stages of Implementation Completion for Evidence-Based Practice
- Grant number: 8771450
- Fiscal year: 2012
- Funding amount: --
- Project category:
Development of Evidence Based Programs: Translational Interventions
- Grant number: 8320545
- Fiscal year: 2012
- Funding amount: --
- Project category:
The Stages of Implementation Completion for Evidence-Based Practice
- Grant number: 8456954
- Fiscal year: 2012
- Funding amount: --
- Project category: