Beyond Abstracts: Issues in Mining Full Texts
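Basic information
- Grant number: 7287359
- Principal investigator:
- Amount: $350,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2006
- Funding country: United States
- Project period: 2006-09-15 to 2009-09-14
- Project status: Completed
- Source: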
- Keywords: Adoption; Affect; Agreement; Arts; Biomedical Research; Body of uterus; Collection; Computational Technique; Data; Development; Evaluation; Gold; Growth; Human; Judgment; Language; Lead; Literature; Machine Learning; Memory; Methods; Metric; Mining; Modeling; Molecular Biology; Numbers; Pattern; Peer Review; Performance; Play; Process; Public Health; Publishing; Range; Representations, Knowledge (Computer); Research; Retrieval; Review Literature; Role; Sampling; Scheme; Scientist; Standards of Weights and Measures; System; Techniques; Technology; Testing; Text; Training; Work; abstracting; base; concept; improved; information organization; interest; journal article; language processing; novel strategies; prototype; size; technology development; text searching; tool
Project abstract
DESCRIPTION (provided by applicant):
Biomedical language processing, the application of computational techniques to human-generated texts in biomedicine, is an increasingly important enabling technology for basic and applied biomedical research. The exponential growth of the peer-reviewed literature and the breakdown of disciplinary boundaries associated with high-throughput techniques have increased the importance of automated tools for keeping scientists abreast of all of the published material relevant to their work. However, despite decades of research, the performance of state-of-the-art tools for basic language processing tasks like information extraction and document retrieval remains below the level necessary for adequate utility and widespread adoption of this technology. The development, performance and evaluation of text mining systems depend crucially on the availability of appropriate corpora: collections of representative documents that have been annotated with human judgments relevant to a language-processing task. Corpora play two roles in the development of this technology: first, they act as "gold standards" by which alternative automated methods can be fairly compared, and second, they provide data for the training of statistical and machine learning systems that create empirical models of patterns in language use. The conventional view is that corpora are neutral, random samples of the domain of interest. Our preliminary work suggests that the restrictions in size, quality, genre, and representational schema of the small number of existing corpora are themselves a critical limiting factor for near-term breakthroughs in biomedical text processing technology. Therefore, we propose to test the following hypothesis: Creation of large, high-quality, biomedical corpora from multiple genres will lead to significant improvements in the performance of biomedical text mining systems and the creation of new approaches to text mining tasks.
Specific aims include constructing several large corpora covering a range of genres and incorporating a rich knowledge representation; identifying factors that affect differential performance on full text versus abstracts; and developing new methods for language processing, especially of full text. Because improvements in the ability to automatically extract information from many textual genres will assist scientists and clinicians in the crucial task of keeping up with the burgeoning biomedical literature, the potential public health impact is quite large.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by LAWRENCE E HUNTER
Scientific Questions: A New Target for Biomedical NLP
- Grant number: 10223438
- Fiscal year: 2020
- Award amount: $350,600
- Project category:
Scientific Questions: A New Target for Biomedical NLP
- Grant number: 10454968
- Fiscal year: 2020
- Award amount: $350,600
- Project category:
Colorado Biomedical Informatics Training Program
- Grant number: 9526127
- Fiscal year: 2017
- Award amount: $350,600
- Project category:
Automated Literature Mining for Validation of High-Throughput Function Prediction
- Grant number: 7843633
- Fiscal year: 2009
- Award amount: $350,600
- Project category:
Construction of a Full Text Corpus for Biomedical Text Mining
- Grant number: 7872692
- Fiscal year: 2009
- Award amount: $350,600
- Project category:
Computational Bioscience Program Training Grant
- Grant number: 7824978
- Fiscal year: 2009
- Award amount: $350,600
- Project category:
Computational Bioscience Program Training Grant
- Grant number: 7877947
- Fiscal year: 2007
- Award amount: $350,600
- Project category:
Colorado Biomedical Informatics Training Program
- Grant number: 8261523
- Fiscal year: 2007
- Award amount: $350,600
- Project category:
Similar overseas grants
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Training Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Standard Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Fellowship
Construction of the Affect-X index of individual differences in kansei (sensibility) and establishment of a foundation for bespoke AI services (original title: 感性個人差指標 Affect-X の構築とビスポークAIサービスの基盤確立)
- Grant number: 23K24936
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Grant-in-Aid for Scientific Research (B)
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Award amount: $350,600
- Project category: Studentship
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Award amount: $350,600
- Project category: Operating Grants
New Tendencies of French Film Theory: Representation, Body, Affect
- Grant number: 23K00129
- Fiscal year: 2023
- Award amount: $350,600
- Project category: Grant-in-Aid for Scientific Research (C)
The Protruding Void: Mystical Affect in Samuel Beckett's Prose
- Grant number: 2883985
- Fiscal year: 2023
- Award amount: $350,600
- Project category: Studentship