Lexical Acquisition for the Biomedical Domain

Basic Information

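  • Grant number:
    EP/G051070/1
  • Principal investigator:
  • Amount:
    $363,700
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2009
  • Funding country:
    United Kingdom
  • Duration:
    2009 to (no data)
  • Project status:
    Completed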

项目摘要

Natural Language Processing (NLP) is now critically needed to help process, mine and extract knowledge from the rapidly growing biomedical literature. In recent years, considerable progress has been made in developing basic NLP techniques for biomedicine. The current challenge is to improve these techniques with richer and deeper analysis capable of supporting a wide range of real-world tasks. This requires high-quality lexical resources (e.g. accurate and comprehensive lexicons and word classifications). Most lexical resources used in current systems are developed manually by linguists. Manual work is extremely costly, and the resulting resources require extensive, labour-intensive porting to new (sub-)domains and tasks. A more promising avenue is to acquire or update lexical information automatically from repositories of unannotated text (e.g. corpora of biomedical articles). Because lexical acquisition gathers usage and frequency information directly from relevant data, it can considerably enhance the viability and portability of NLP technology. Research into automatic lexical acquisition is now starting to produce large-scale resources useful for practical NLP tasks. However, these techniques have seen limited application to biomedical texts, because many of them require adaptation before they can perform optimally in this linguistically challenging domain. In this project, we will take existing techniques capable of acquiring basic syntactic-semantic information for verbs from corpus data and adapt them to the biomedical domain. We will focus on verbal (i) subcategorization frames, (ii) selectional preferences, and (iii) lexical-semantic classes. When tailored to the domain in question, this information can aid key NLP tasks such as parsing, anaphora resolution, Information Extraction (IE) and question answering (QA).
Building on our pilot studies and on the adaptive, state-of-the-art text processing tools available to us, we will improve existing techniques further and extend them with novel unsupervised and semi-supervised methods capable of supporting efficient domain adaptation. We will evaluate and demonstrate the capabilities of our techniques both directly and in the context of practical BIO-NLP tasks. We will use the final version of the system to acquire a substantial lexical database from a biomedical corpus. The resulting resource will be distributed freely to the research community, along with software that can be used to tune the frequency information stored in the database to particular biomedical sub-domains and tasks. We expect this project to (i) advance BIO-NLP and improve its usefulness for practical tasks in biomedicine, (ii) advance NLP by improving the accuracy, robustness and portability of lexical acquisition for real-world tasks, and (iii) provide an important large-scale study of domain adaptation in the critical area of lexical acquisition.
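The abstract describes acquiring verb subcategorization frame (SCF) frequencies from corpus data and tuning those frequencies to biomedical sub-domains. The following is a minimal sketch of that general idea, not the project's system. It assumes that (verb, frame) pairs have already been extracted by some parser (not shown), that frame labels such as "NP" and "PP" are hypothetical placeholders, that noisy frames are removed with a simple relative-frequency threshold, and that sub-domain tuning is done by linear interpolation with a weight `lam`. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def scf_distribution(observations):
    """Estimate P(frame | verb) by relative frequency from (verb, frame) pairs."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    dist = {}
    for verb, c in counts.items():
        total = sum(c.values())
        dist[verb] = {frame: n / total for frame, n in c.items()}
    return dist


def filter_frames(dist, threshold=0.05):
    """Drop frames below `threshold`, then renormalise each verb's distribution.

    A relative-frequency cut-off is one simple (assumed) way to suppress frames
    that appear only because of parser errors; the default value is arbitrary.
    """
    filtered = {}
    for verb, frames in dist.items():
        kept = {f: p for f, p in frames.items() if p >= threshold}
        total = sum(kept.values())
        if total > 0:
            filtered[verb] = {f: p / total for f, p in kept.items()}
    return filtered


def interpolate(general, subdomain, lam=0.5):
    """Tune general-corpus SCF probabilities towards a sub-domain.

    For verbs seen in both corpora: lam * P_sub + (1 - lam) * P_general.
    Verbs seen in only one corpus keep that corpus's distribution unchanged.
    """
    mixed = {}
    for verb in set(general) | set(subdomain):
        g = general.get(verb)
        s = subdomain.get(verb)
        if g is None or s is None:
            # Exactly one of g / s is present, because verb comes from the union.
            mixed[verb] = dict(g if s is None else s)
            continue
        mixed[verb] = {
            f: lam * s.get(f, 0.0) + (1 - lam) * g.get(f, 0.0)
            for f in set(g) | set(s)
        }
    return mixed


if __name__ == "__main__":
    # Toy data: "bind" as it might be parsed in general text vs. a biomedical sub-corpus.
    general = scf_distribution(
        [("bind", "NP"), ("bind", "NP"), ("bind", "PP"), ("bind", "INTRANS")]
    )
    biomed = scf_distribution(
        [("bind", "PP"), ("bind", "PP"), ("bind", "PP"), ("bind", "NP")]
    )
    tuned = filter_frames(interpolate(general, biomed, lam=0.5), threshold=0.2)
    print(tuned["bind"])
```

In this toy run, `INTRANS` (mixed probability 0.125) falls below the 0.2 cut-off and is dropped, while `NP` (0.375) and `PP` (0.5) are renormalised to roughly 0.43 and 0.57. Interpolation is only one possible tuning scheme; a real system might instead weight counts by corpus size or learn `lam` on held-out data.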

Project Outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Using Argumentative Zones for Extractive Summarization of Scientific Articles
  • DOI:
  • Published:
    2012-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Danish Contractor;Yufan Guo;A. Korhonen
  • Corresponding author:
    Danish Contractor;Yufan Guo;A. Korhonen
Latent Variable Models of Selectional Preference
  • DOI:
  • Published:
    2010-07
  • Journal:
  • Impact factor:
    1.2
  • Authors:
    Diarmuid Ó Séaghdha
  • Corresponding author:
    Diarmuid Ó Séaghdha
A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment.
  • DOI:
    10.1186/1471-2105-12-69
  • Published:
    2011-03-08
  • Journal:
  • Impact factor:
    3
  • Authors:
    Guo Y;Korhonen A;Liakata M;Silins I;Hogberg J;Stenius U
  • Corresponding author:
    Stenius U
Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review
  • DOI:
    10.1093/bioinformatics/btt163
  • Published:
    2013-06
  • Journal:
  • Impact factor:
    5.8
  • Authors:
    Yufan Guo;Ilona Silins;U. Stenius;A. Korhonen
  • Corresponding author:
    Yufan Guo;Ilona Silins;U. Stenius;A. Korhonen
Text mining for literature review and knowledge discovery in cancer risk assessment and research.
  • DOI:
    10.1371/journal.pone.0033427
  • Published:
    2012
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Korhonen A;Séaghdha DO;Silins I;Sun L;Högberg J;Stenius U
  • Corresponding author:
    Stenius U

Other publications by Anna Korhonen

Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chen Cecilia Liu;Iryna Gurevych;Anna Korhonen
  • Corresponding author:
    Anna Korhonen
Automatic Classification of Verbs in Biomedical Texts
  • DOI:
  • Published:
    2006
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anna Korhonen;Yuval Krymolowski;Nigel Collier
  • Corresponding author:
    Nigel Collier
LexSchem: a Large Subcategorization Lexicon for French Verbs
Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders
  • DOI:
    10.48550/arxiv.2205.00267
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ivan Vulic;Goran Glavas;Fangyu Liu;Nigel Collier;E. Ponti;Anna Korhonen
  • Corresponding author:
    Anna Korhonen
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Han Zhou;Xingchen Wan;Yinhong Liu;Nigel Collier;Ivan Vulić;Anna Korhonen
  • Corresponding author:
    Anna Korhonen

Other grants held by Anna Korhonen

Towards Globally Equitable Language Technologies (EQUATE)
  • Grant number:
    EP/Y031350/1
  • Fiscal year:
    2023
  • Funding amount:
    $363,700
  • Project category:
    Research Grant
Literature-based discovery for cancer biology
  • Grant number:
    MR/M013049/1
  • Fiscal year:
    2015
  • Funding amount:
    $363,700
  • Project category:
    Research Grant
Using Text Mining to Aid Cancer Risk Assessment
  • Grant number:
    G0601766/1
  • Fiscal year:
    2007
  • Funding amount:
    $363,700
  • Project category:
    Research Grant

Similar overseas grants

Acquisition of the Agilent Cytation C10 confocal imaging reader for enhancing biomedical research excellence at Clemson University
  • Grant number:
    10798537
  • Fiscal year:
    2022
  • Funding amount:
    $363,700
  • Project category:
Acquisition of a Dual-Source, High-Performance, Ion Mobility, Quadrupole Time-of-Flight Mass Spectrometry System for Biomedical Research at UW-Madison
  • Grant number:
    10177384
  • Fiscal year:
    2021
  • Funding amount:
    $363,700
  • Project category:
MRI: Acquisition of a State-of-the-Art Analytical Ultracentrifuge for Biomedical and Materials Research
  • Grant number:
    2018942
  • Fiscal year:
    2020
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
MRI: Acquisition of a Single Crystal X-ray Diffractometer to Support Research from Fundamental Chemistry to Functional Materials and Biomedical Applications
  • Grant number:
    2018414
  • Fiscal year:
    2020
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
MRI: Track1 Acquisition of a Confocal High Content Screening System to Enhance Bioengineering and Biomedical Research.
  • Grant number:
    1828057
  • Fiscal year:
    2018
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
MRI: Acquisition of a Force-measuring treadmill for biomedical experimentation and assistive device development
  • Grant number:
    1625163
  • Fiscal year:
    2016
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
MRI: Acquisition of A Multiphoton Confocal Laser Scanning Microscope for Life Science and Biomedical Research and Training at SUNY Binghamton
  • Grant number:
    1531944
  • Fiscal year:
    2015
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
Acquisition of a Surface Plasmon Resonance Biosensor: Enhancing Biomedical (Bio)Molecular Recognition Projects
  • Grant number:
    8826435
  • Fiscal year:
    2015
  • Funding amount:
    $363,700
  • Project category:
MRI: Acquisition of a Fluorescence Activated Cell Sorter for Biomedical & Bioscience Research and Training at University of Arkansas
  • Grant number:
    1337265
  • Fiscal year:
    2013
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant
MRI: Acquisition of Micro Particle Image Velocimetry (micro-PIV) System for Microfluidics and Biomedical Applications
  • Grant number:
    1338008
  • Fiscal year:
    2013
  • Funding amount:
    $363,700
  • Project category:
    Standard Grant