Scaling the next generation of protein sequence searches to enable rapid discovery of novel actives
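Basic information
- Grant number: 2290636
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project category: Studentship
- Fiscal year: 2019
- Funding country: United Kingdom
- Start and end dates: 2019 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract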
Metagenomics investigates the collective genetic material of the microorganisms within a specific environment. Modern sequencing technologies now allow microbial communities to be sequenced deeply enough to recover large contiguous sequences (contigs). Using in silico approaches, contigs can be binned into sets originating from a common species, which can yield high-quality metagenome-assembled genomes (MAGs). The EMBL-EBI team behind MGnify is assembling metagenomes at scale, identifying >5,000 novel MAGs alongside millions of contigs encoding billions of proteins. After redundancy removal, this growing metagenomics database (MGDB) already comprises >850m protein sequences (grouping into ~280m clusters), with <1% of sequences in common with UniProtKB.

In work with the SME BioCatalysts, a significantly smaller version of this database was mined for commercial benefit, identifying novel enzymes for the food industry. MGDB represents an invaluable opportunity for Unilever in the search for anti-microbial (AMP) actives (e.g. preservation) or host-targeting effectors. However, there are major technical challenges in providing interactive searches, presenting results and linking metadata so that informed target selections can be made.

Heuristics and in-memory solutions have helped the HMMER webserver achieve interactive search speeds, but the MGDB is over 200GB and is not tractable for in-memory solutions. There is also a data presentation issue, as searches against the MGDB can return tens of thousands of matches. Identifying the most relevant query result, which may not be the top hit, requires a complex search infrastructure with multiple facets for filtering.

Objectives: We will develop technology to expose and interrogate the MGDB through EMBL-EBI and use it to identify new actives for BPC in real time. This will extend beyond sequence similarity searches by (i) linking searches to find multiple genes pertaining to an operon or gene cluster; (ii) enabling filtering of search results based on original sample metadata; (iii) enabling retrieval of source contigs for further analysis; (iv) technical innovation to provide real-time search speeds. This tool-set will be applied in two Unilever case studies. The search infrastructure will be delivered to scientists via the TRON/BD4BS Bio-platform, to utilise the MGnify API and MGDB alongside internal data and analysis tools.

Strategic outcomes:
- Because the MGDB is the only existing resource of its kind, Unilever can exploit the entire MGDB at an early stage.
- MGDB includes eukaryotes, which represent a vast untapped source of novel sequences.
- Unilever-driven use-cases will ensure that the search infrastructure meets FMCG business needs.
- Public datasets of interest to Unilever will be prioritised for representation in the MGnify MGDB.
- Web services will expose data types of interest to Unilever.
- The EMBL-EBI deployment will be available to all researchers.
- Portability and scalability for academia and industry, allowing search of similar in-house datasets.
- Creation of novel solutions for other resources, such as UniProtKB, so that they can continue to provide searches as their data volumes expand.
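
Objectives (i) and (ii) in the abstract can be illustrated with a small sketch. The abstract does not describe an implementation, so everything below is an assumption made for illustration: the `Hit` record, its field names, the `biome` metadata facet, the E-value cutoff and the 10,000-base window are all hypothetical and are not taken from MGnify, HMMER or the project itself. The sketch first filters hits by sample metadata, then keeps only contigs on which hits for every query gene lie close together, as would be expected for an operon or gene cluster.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit (hypothetical record layout, for illustration only)."""
    query: str     # name of the query gene or profile
    protein: str   # identifier of the matched protein
    contig: str    # contig the matched protein was predicted from
    start: int     # start coordinate of the gene on the contig
    biome: str     # sample metadata facet attached to the contig
    evalue: float  # significance of the match


def filter_hits(hits, biome=None, max_evalue=1e-5):
    """Keep hits under the E-value cutoff, optionally restricted to one biome."""
    return [h for h in hits
            if h.evalue <= max_evalue and (biome is None or h.biome == biome)]


def linked_contigs(hits, required_queries, max_span=10_000):
    """Return contigs where all required queries hit within max_span bases."""
    required = set(required_queries)
    by_contig = defaultdict(list)
    for h in hits:
        by_contig[h.contig].append(h)

    linked = {}
    for contig, group in by_contig.items():
        group.sort(key=lambda h: h.start)
        # Try each hit as the left edge of a window of width max_span.
        for i, first in enumerate(group):
            window = [h for h in group[i:] if h.start - first.start <= max_span]
            if required <= {h.query for h in window}:
                linked[contig] = window
                break
    return linked


# Made-up example data: two query genes searched against four contigs.
hits = [
    Hit("geneA", "P1", "contig_1", 100, "soil", 1e-30),
    Hit("geneB", "P2", "contig_1", 1_500, "soil", 1e-12),
    Hit("geneA", "P3", "contig_2", 200, "marine", 1e-20),
    Hit("geneB", "P4", "contig_3", 50, "soil", 1e-2),
    Hit("geneA", "P5", "contig_4", 100, "soil", 1e-20),
    Hit("geneB", "P6", "contig_4", 50_000, "soil", 1e-20),
]

soil_hits = filter_hits(hits, biome="soil")
result = linked_contigs(soil_hits, {"geneA", "geneB"})
print(sorted(result))
```

In this toy data only `contig_1` survives: `contig_2` is removed by the biome filter, the `contig_3` hit fails the E-value cutoff, and the two hits on `contig_4` are too far apart. A production system at the scale described in the abstract would need an indexed, on-disk store rather than in-memory lists.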
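Project outputs
Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)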
Other grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Funding amount: --
- Project category: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Similar NSFC grants
Next Generation Majorana Nanowire Hybrids
- Grant number:
- Approval year: 2020
- Funding amount: 200,000 CNY
- Project category:
Similar overseas grants
Adapting and Scaling the Biotinkering Approach through a CoP Model
- Grant number: 10666229
- Fiscal year: 2023
- Funding amount: --
- Project category:
Scaling up computational genomics with tree sequences
- Grant number: 10585745
- Fiscal year: 2023
- Funding amount: --
- Project category:
Scaling up the Next-Generation Communication Systems: from physically-consistent modelling to low complexity DSP algorithms to hardware implementation
- Grant number: 570045-2022
- Fiscal year: 2022
- Funding amount: --
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Scaling up computational genomics with tree sequences
- Grant number: 10471496
- Fiscal year: 2021
- Funding amount: --
- Project category:
Modeling Product Selectivity in Electrocatalytic Carbon Dioxide Reduction Using Scaling Relationships
- Grant number: 10312421
- Fiscal year: 2021
- Funding amount: --
- Project category:
Modeling Product Selectivity in Electrocatalytic Carbon Dioxide Reduction Using Scaling Relationships
- Grant number: 10462533
- Fiscal year: 2021
- Funding amount: --
- Project category:
Scaling Volumetric Imaging, Analysis and Science Communication Using Immersive Virtual Reality
- Grant number: 10604786
- Fiscal year: 2020
- Funding amount: --
- Project category:
Scaling the next generation of protein sequence searches to enable rapid discovery of novel actives
- Grant number: BB/T508391/1
- Fiscal year: 2019
- Funding amount: --
- Project category: Training Grant
Scaling chip-to-chip interfaces for next-generation communication equipment
- Grant number: 505827-2016
- Fiscal year: 2018
- Funding amount: --
- Project category: Collaborative Research and Development Grants
Scaling chip-to-chip interfaces for next-generation communication equipment
- Grant number: 505827-2016
- Fiscal year: 2017
- Funding amount: --
- Project category: Collaborative Research and Development Grants


















