Automated Biological Event Extraction from the Literature for Drug Discovery
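Basic information
- Grant number: BB/G013160/1
- Principal investigator:
- Amount: $367,600
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2009
- Funding country: United Kingdom
- Duration: 2009 to (no data)
- Project status: Completed
- Source:
- Keywords: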
Project abstract
The development of new drugs is both expensive and time-consuming: even with the many advances we have seen in the life sciences, it can take over a decade for a new drug to be proven effective and safe. From a batch of promising early candidates, only a few will eventually be approved. The longer a candidate survives before being found unusable (attrition), the higher the cost, especially if clinical trials have been involved. Attrition rates run at ca. 90%, so attrition is ruinously costly to the pharmaceutical industry and there is an urgent need to reduce its impact. UK researchers, who lead in biological and pharmaceutical research, would benefit greatly from means to identify drug candidates that are likely to fail as early as possible, preferably long before the clinical stage is reached. Another current area of concern is how drugs may be targeted to groups of individuals: not every individual responds in the same way to the same drug. If we can discover which genes are implicated in this variation, we can hope both to focus on the more promising drug candidates and to find ways of tailoring treatments to (groups of) individuals. Unfortunately, scientists face a severe knowledge gap: using traditional means, no scientist can keep up with the vast amount of experimental data, and especially the massive associated literature, that is being (and has been) generated in the life sciences. Moreover, much knowledge is hidden in the literature: it has been shown that entirely new knowledge has often been available for discovery in the literature for many years, but that its sheer volume has prevented researchers from achieving the required level of information retrieval, which is the first step in linking and synthesising that information into new, previously unsuspected knowledge.
The main target of information finding is the MEDLINE resource, which currently contains some 17 million abstracts: this seems large, but it is nevertheless only a fraction of the information and hidden knowledge contained in the associated full-text scientific articles. The proposed project is designed to help scientists overcome this knowledge gap by developing automatic means to filter information and to synthesise new knowledge from the scientific literature. Because a direct link between one or more proteins and a physiological or pathophysiological process is not always described explicitly in a text, we must hunt for indirect evidence. This involves looking for indications of biological processes that are associated with proteins. When writing, biologists essentially describe 'events', such as phosphorylation, that are involved in higher-order bioprocesses such as angiogenesis. By identifying and extracting such events, together with the particular biological entities involved (proteins, diseases), we can collect many fragments of information about bioprocesses from many thousands of texts. These fragments can then be used to find new knowledge by establishing associations among them. Such extraction of fragments for knowledge finding requires powerful semantic text mining techniques that can handle the specialised language of biologists and achieve appropriate levels of abstraction far beyond mere word search. This project will customise the generic tools of the National Centre for Text Mining (NaCTeM) and carry out research to find the best ways of extracting events concerning biological processes from the literature. AstraZeneca will be closely involved, both in informing the research and in providing practical domain expertise, requirements, data and concrete evaluation scenarios. Their interest is also reflected in a substantial cash contribution to the project.
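To make the notion of an event "fragment" concrete, here is a deliberately minimal sketch in Python. It is not NaCTeM's method and not the project's: it assumes hypothetical hand-written lexicons (`PROTEINS`, `BIOPROCESSES`, `TRIGGERS`) and a single-sentence dictionary-lookup heuristic, which is exactly the kind of shallow word matching the abstract says must be surpassed. The names `Event` and `extract_events` are illustrative only.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons (assumption). A real system would rely on trained
# named entity recognisers and curated resources, not hand-written word lists.
PROTEINS = {"VEGF", "KDR", "AKT1"}
BIOPROCESSES = {"angiogenesis", "apoptosis"}
# Hypothetical trigger words mapped to simplified event types (assumption).
TRIGGERS = {
    "phosphorylation": "Phosphorylation",
    "phosphorylates": "Phosphorylation",
    "induces": "Positive_regulation",
    "inhibits": "Negative_regulation",
}

TOKEN = re.compile(r"[A-Za-z0-9]+")


@dataclass(frozen=True)
class Event:
    """One extracted fragment: an event type, its trigger and co-mentioned entities."""
    event_type: str
    trigger: str
    proteins: tuple
    processes: tuple
    sentence_id: int


def extract_events(sentences):
    """Return one Event per trigger word found in a sentence that mentions a protein."""
    events = []
    for sid, sentence in enumerate(sentences):
        tokens = TOKEN.findall(sentence)
        proteins = tuple(t for t in tokens if t in PROTEINS)
        processes = tuple(t.lower() for t in tokens if t.lower() in BIOPROCESSES)
        for tok in tokens:
            event_type = TRIGGERS.get(tok.lower())
            # Crude heuristic: every protein in the sentence counts as a participant.
            if event_type and proteins:
                events.append(Event(event_type, tok, proteins, processes, sid))
    return events


if __name__ == "__main__":
    for ev in extract_events([
        "VEGF induces phosphorylation of KDR during angiogenesis.",
        "AKT1 inhibits apoptosis.",
    ]):
        print(ev)
```

Treating every co-mentioned protein as a participant over-generates: the sketch cannot tell that KDR, not VEGF, is the thing being phosphorylated in the first sentence. Resolving such roles is where deeper semantic analysis of the text would be needed.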
The result of this programme will be a text mining service for academic researchers, offered by NaCTeM, supporting them in discovering protein-bioprocess associations from the literature.
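The step of establishing associations among fragments can be illustrated with a co-occurrence sketch in the spirit of Swanson-style "ABC" literature-based discovery. This is a swapped-in, assumed technique rather than the project's stated method, and the fragment tuples, document identifiers and protein/process pairings below are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fragments (assumption): (document id, proteins in one extracted
# event, bioprocess mentioned with it or None). Values are invented.
FRAGMENTS = [
    ("doc1", ("VEGF", "KDR"), "angiogenesis"),
    ("doc2", ("KDR",), "angiogenesis"),
    ("doc3", ("KDR", "PLCG1"), None),
    ("doc4", ("AKT1",), "apoptosis"),
]


def direct_associations(fragments):
    """Count distinct documents supporting each (protein, bioprocess) pair."""
    support = defaultdict(set)
    for doc, proteins, process in fragments:
        if process is None:
            continue
        for protein in proteins:
            support[(protein, process)].add(doc)
    return {pair: len(docs) for pair, docs in support.items()}


def indirect_candidates(fragments):
    """Return (protein A, bioprocess C, linking protein B) triples where A and B
    share an event, B is directly associated with C, but A never is."""
    direct = direct_associations(fragments)
    partners = defaultdict(set)
    for _, proteins, _ in fragments:
        for a, b in combinations(sorted(set(proteins)), 2):
            partners[a].add(b)
            partners[b].add(a)
    candidates = set()
    for protein_b, process in direct:
        for protein_a in partners.get(protein_b, ()):
            if (protein_a, process) not in direct:
                candidates.add((protein_a, process, protein_b))
    return candidates


if __name__ == "__main__":
    print(direct_associations(FRAGMENTS))
    print(indirect_candidates(FRAGMENTS))
```

Here PLCG1 surfaces as a candidate for angiogenesis only because it shares an event with KDR. Such candidates are hypotheses for a researcher to check, and a practical ranking would also need to weigh document support and filter noisy fragments.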
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Named entity recognition for bacterial Type IV secretion systems.
- DOI:10.1371/journal.pone.0014780
- Published: 2011-03-29
- Journal:
- Impact factor: 3.7
- Authors: Ananiadou S;Sullivan D;Black W;Levow GA;Gillespie JJ;Mao C;Pyysalo S;Kolluru B;Tsujii J;Sobral B
- Corresponding author: Sobral B
Adding text mining workflows as web services to the BioCatalogue
- DOI:10.1145/2166896.2166913
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Kontonasios G
- Corresponding author: Kontonasios G
BioCause: Annotating and analysing causality in the biomedical domain.
- DOI:10.1186/1471-2105-14-2
- Published: 2013-01-16
- Journal:
- Impact factor: 3
- Authors: Mihăilă C;Ohta T;Pyysalo S;Ananiadou S
- Corresponding author: Ananiadou S
Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study.
- DOI:10.2196/26892
- Published: 2021-06-15
- Journal:
- Impact factor: 7.4
- Authors: Deng L;Chen L;Yang T;Liu M;Li S;Jiang T
- Corresponding author: Jiang T
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text.
- DOI:10.1186/1471-2105-12-s8-s3
- Published: 2011-10-03
- Journal:
- Impact factor: 3
- Authors: Krallinger M;Vazquez M;Leitner F;Salgado D;Chatr-Aryamontri A;Winter A;Perfetto L;Briganti L;Licata L;Iannuccelli M;Castagnoli L;Cesareni G;Tyers M;Schneider G;Rinaldi F;Leaman R;Gonzalez G;Matos S;Kim S;Wilbur WJ;Rocha L;Shatkay H;Tendulkar AV;Agarwal S;Liu F;Wang X;Rak R;Noto K;Elkan C;Lu Z;Dogan RI;Fontaine JF;Andrade-Navarro MA;Valencia A
- Corresponding author: Valencia A
Other publications by Sophia Ananiadou
A study on the presentation of peripheral information in chemical safety learning (化学安全学習における周辺情報の提示に関する検討)
- DOI:
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: Yoshinobu Kano;Ruben Dorado;Luke McCrohon;Sophia Ananiadou;Jun'ichi Tsujii;江木啓訓;松澤沙緒里;宗官祥史;品川徳秀;藤波香織
- Corresponding author: 江木啓訓;松澤沙緒里;宗官祥史;品川徳秀;藤波香織
Integrated NLP Evaluation System for Pluggable Evaluation Metrics with Extensive Interoperable Toolkit (peer-reviewed)
- DOI:
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Yoshinobu Kano;Luke McCrohon;Sophia Ananiadou;Jun'ichi Tsujii
- Corresponding author: Jun'ichi Tsujii
Analyzing Human Behaviors in an Interactive Art Installation
- DOI:
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Yoshinobu Kano;Paul Dobson;Mio Nakanishi;Jun'ichi Tsujii;Sophia Ananiadou;Takashi Kiriyama
- Corresponding author: Takashi Kiriyama
Emotion detection for misinformation: A review
- DOI: 10.1016/j.inffus.2024.102300
- Published: 2024-07-01
- Journal:
- Impact factor: 15.500
- Authors: Zhiwei Liu;Tianlin Zhang;Kailai Yang;Paul Thompson;Zeping Yu;Sophia Ananiadou
- Corresponding author: Sophia Ananiadou
Other grants held by Sophia Ananiadou
Japan Partnering Award. Text mining and bioinformatics platforms for metabolic pathway modelling.
- Grant number: BB/P025684/1
- Fiscal year: 2017
- Amount: $367,600
- Project type: Research Grant
Enriching Metabolic PATHwaY models with evidence from the literature (EMPATHY)
- Grant number: BB/M006891/1
- Fiscal year: 2015
- Amount: $367,600
- Project type: Research Grant
Supporting Evidence-based Public Health Interventions using Text Mining
- Grant number: MR/L01078X/1
- Fiscal year: 2014
- Amount: $367,600
- Project type: Research Grant
From text to pathways: text mining techniques for reconstructing signalling pathways
- Grant number: BB/G53025X/1
- Fiscal year: 2009
- Amount: $367,600
- Project type: Research Grant
Tools for the text mining-based visualisation of the provenance of biochemical networks
- Grant number: BB/E004431/1
- Fiscal year: 2007
- Amount: $367,600
- Project type: Research Grant
Similar overseas grants
NSF/BIO-DFG: Biological Fe-S intermediates in the synthesis of nitrogenase metalloclusters
- Grant number: 2335999
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Grant number: 2411529
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Grant number: 2411530
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Collaborative Research: NSF-ANR MCB/PHY: Probing Heterogeneity of Biological Systems by Force Spectroscopy
- Grant number: 2412551
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Elucidating mechanisms of biological hydrogen conversion through model metalloenzymes
- Grant number: 2419343
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Collaborative Research: The Interplay of Water Condensation and Fungal Growth on Biological Surfaces
- Grant number: 2401507
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
DESIGN: Driving Culture Change in a Federation of Biological Societies via Cohort-Based Early-Career Leaders
- Grant number: 2334679
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
REU Site: Modeling the Dynamics of Biological Systems
- Grant number: 2243955
- Fiscal year: 2024
- Amount: $367,600
- Project type: Standard Grant
Defining the biological boundaries to sustain extant life on Mars
- Grant number: DP240102658
- Fiscal year: 2024
- Amount: $367,600
- Project type: Discovery Projects
Advanced Multiscale Biological Imaging using European Infrastructures
- Grant number: EP/Y036654/1
- Fiscal year: 2024
- Amount: $367,600
- Project type: Research Grant