Corpus linguistic methods

Basic information

Project summary

Project Pc is both an infrastructure and a research project within RUEG2, and the successor to project Pd in RUEG1. On the infrastructure and support side, it will continuously integrate new and/or corrected annotations, provide data curation and sustainability, and offer technical support and research engineering, i.e. improving the automatic and semi-automatic annotation of non-standard data across two modalities and, more generally, developing tools and pipelines for information retrieval, text mining, and quantitative analysis. It will also support and advise projects P8-P11 in RUEG2 on the choice and application of quantitative research methods.

On the research side, it aims to advance the field of corpus linguistics in two ways: (1) by evaluating advanced machine learning techniques and the feasibility and usefulness of applying them to automatic and semi-automatic annotation and information retrieval in non-standard corpora of limited size; and (2) by developing, validating, evaluating, and epistemologically embedding methods for the RUEG corpus specifically, and for small and mid-sized corpora in general.

The RUEG corpus is mid-sized, very well controlled in terms of topic, structure, setting, and participants' backgrounds, and enriched with ample metadata. It therefore offers the chance to deeply understand, annotate, and analyze the full data set in a collaborative effort of the whole research group. It is one of the few corpora that allow for variationist analyses across samples from different production situations and modes, speaker groups, and age groups, with two languages recorded for each speaker. The trade-off for capturing this complexity is a smaller sample size for each group, which typically does not reach the representativeness that frequentist statistics would require.

Since no existing set of quantitative techniques yields reliable results for smaller corpora beyond reasonable doubt, methodological development is crucial to the quantitative study of the RUEG data. At the same time, RUEG is unusually well suited as a testing ground for the evaluation of methods. It thus offers exceptional synergy potential for the development of corpus-linguistic methods overall. Pc will investigate and evaluate several promising techniques: (a) the applicability (including the validity, reliability, and explanatory power) of mixed-effects models (MEMs); (b) two frameworks that are currently almost unused in core linguistics but show promising results in other quantitative fields, namely graph theory or network analysis and Bayesian statistics; and (c) the application of machine learning techniques for knowledge gain (rather than for text mining objectives, which is currently their main use in computational linguistics).

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Professorin Dr. Anke Lüdeling

Crosslingual Language Varieties: A Multifaceted Investigation
  • Grant number: 398186468
  • Fiscal year: 2018
  • Funding amount: --
  • Project category: Research Grants
Pd: “Emerging Grammars”: a cross-linguistic corpus of comparative data in heritage and majority language use
  • Grant number: 394844736
  • Fiscal year: 2018
  • Funding amount: --
  • Project category: Research Units
What's hard in German? - Corpus-based analysis of structural learner difficulties in German as a foreign language
  • Grant number: 105286979
  • Fiscal year: 2009
  • Funding amount: --
  • Project category: Research Grants

Similar overseas grants

Applying an equity and diversity lens to understand the care experiences and healthcare outcomes of low income and linguistic minority groups in Ontario retirement homes: A mixed methods study
  • Grant number: 484613
  • Fiscal year: 2023
  • Funding amount: --
  • Project category: Fellowship Programs
Improving linguistic health equity in prehospital emergency care
  • Grant number: 10786657
  • Fiscal year: 2023
  • Funding amount: --
  • Project category:
Deepening linguistic analysis methods for understanding and utilizing real documents
  • Grant number: 22K19818
  • Fiscal year: 2022
  • Funding amount: --
  • Project category: Grant-in-Aid for Challenging Research (Exploratory)
Advancing Computational Linguistic Biomarkers of Disorganized Speech in Psychosis
  • Grant number: 10686264
  • Fiscal year: 2022
  • Funding amount: --
  • Project category:
Early detection and monitoring of Alzheimers Disease and Related Dementias using non-semantic linguistic and acoustic features of speech derived from hearing aids
  • Grant number: 10600233
  • Fiscal year: 2022
  • Funding amount: --
  • Project category:
Promoting Linguistic and Cultural Identity through Bilingual Children's Stories to Address Nutrition and Health in Indigenous Communities
  • Grant number: 10484677
  • Fiscal year: 2022
  • Funding amount: --
  • Project category:
Advancing Computational Linguistic Biomarkers of Disorganized Speech in Psychosis
  • Grant number: 10507015
  • Fiscal year: 2022
  • Funding amount: --
  • Project category:
Developing Novel Linguistic Analytic Methods to Optimize Relationship Quality and Equity in HIV Care
  • Grant number: 10410565
  • Fiscal year: 2021
  • Funding amount: --
  • Project category:
Automation of Neuro-Linguistic Programming methods using Artificial Intelligence to interpret unconscious thoughts and values for personal development
  • Grant number: 10011174
  • Fiscal year: 2021
  • Funding amount: --
  • Project category: Feasibility Studies
Linguistic and Cultural Adaptation of Web-based Partner Violence Screening and Safety Planning Applications
  • Grant number: 459290
  • Fiscal year: 2021
  • Funding amount: --
  • Project category: Operating Grants