Large Databases of Small Molecules - Drug Development Tool and Public Resource
Basic information
- Grant number: 10926595
- Principal investigator:
- Amount: $138,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates:
- Project status: Ongoing
- Source:
- Keywords: 3-Dimensional; Algorithms; Anniversary; Area; Awareness; Biological; Biological Assay; Books; Cactaceae; Catalogs; Characteristics; Chemical Structure; Chemicals; Collection; Computer Assisted; Computers; Contracts; Custom; Data; Data Set; Databases; Deposition; Developmental Therapeutics Program; Drug Design; Evaluation; Generations; Goals; Informatics; Information Sciences; Internet; Legal patent; Link; Malignant Neoplasms; Methods; Nature; Paper; Pharmacologic Substance; Property; PubChem; Publications; Readability; Records; Research Personnel; Resources; Running; Sampling; Series; Services; Structure; System; Telephone; Time; United States National Institutes of Health; Update; Vendor; Work; Writing; chemical group; chemical synthesis; cloud platform; database structure; design; drug development; improved; informatics tool; insight; next generation; pharmacophore; programs; screening; small molecule; tautomer; tool; tool development; web based interface; web platform; web server; web services; web site; web-based tool
Project abstract
The principal objective of this project is to make large collections of small molecules available for aiding in drug development, both in-house and publicly, to advance the fields of chemical structure identification and processing and of unique compound identifier generation, as well as to provide free chemoinformatics tools aiding one in dealing with such databases. This project started with posting the information in the Open NCI Database on the CADD Group's public web server. Many databases are available to the user, including large vendor catalogs of compounds that can be acquired for screening. Advanced processing is applied to the data, and powerful searching and display capabilities have been implemented. The nature of the resources currently being developed is exemplified by a brief description of this service: The data in this current Enhanced NCI Web Browser web service comprise data from NCI's Developmental Therapeutics Program (DTP) and additional information with which we have augmented the DTP data sets. We have subjected the Open NCI Database of about 260,000 compounds to various analyses that help to better understand its characteristics and put it in perspective of other large databases used in computer-aided drug design and chemical information sciences. Various clustering methods have been applied to it to elucidate its diversity, and the results have been compared with those for other databases. The Open NCI Database has been converted into various formats, suitable for further processing including 3D pharmacophore searching. We have also implemented a powerful public search tool for the Open NCI Database with a web interface based on the chemical information toolkit CACTVS. Using just a web browser, the user is able to search about 250,000 structures for more than 600 criteria. We have greatly augmented the original DTP files with numerous additional data fields, be it calculated, predicted or hyperlinked information. 
These data have also been made available in directly downloadable format. Links to several additional services for further processing have been implemented. An online 3D pharmacophore capability has been built, a capability that is currently unique on the web, as far as we are aware. Searchable predictions of more than 550 different biological activities, calculated by the program PASS for most of the quarter-million compounds, have been included in the web service (abstract). A more recent service is our Chemical Structure Lookup Service (CSLS), available at http://cactus.nci.nih.gov/lookup. CSLS is essentially a "phone book" for small molecules, allowing users to quickly find out in which, if any, of over 100 different databases (both public and commercial), comprising more than 74 million entries, their compounds occur. Updates of both the user interface and the structure and data holdings are underway as of the time of this writing, which will push the number of entries in CSLS beyond the 100 million mark. Part of these projects is the downloading, reformatting, and evaluation, for cancer-related purposes, of the massive set of structure and assay data deposited in PubChem. The Chemical Identifier Resolver (CIR) is the most heavily used service, typically receiving several hundred thousand requests per day. CIR works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier. Among others, our NCI/CADD Structure Identifiers developed in-house as well as the new Standard InChI and InChIKey identifiers are handled by this service. One of CIR's key features is that it is a programmatic interface into the Chemical Structure Database (CSDB). An update of CSDB has been completed to over 360 million original database records representing approximately 128 million unique small-molecule structures. 
Many additional capabilities are planned for this service, which is increasingly being integrated with other web services and chemoinformatics tools worldwide. CIR will also become increasingly important in the area of publications involving chemical structures, as efforts increase to make inclusion of computer-readable representations of all compounds presented in a paper mandatory. We are working on the next-generation web platform, which will be the basis for a series of new web services and updates of existing services, including the CADD Group's Chemical Structure Lookup Service (CSLS II). The URL of our public web server is https://cactus.nci.nih.gov. The monthly average usage count on cactus from January 2016 through December 2021 was 14 million accesses, i.e. more than 450,000 per day. We have analyzed a set of 43 million chemical structure records extracted from patent data (EP, US PTO, WO) by the IBM-led consortium of large pharmaceutical companies in the context of the SIIP (Strategic IP Insight Platform) project. The utility OSRA, originally developed by the CADD Group, was used in this project. Some of these data were given for public use to both PubChem and the CADD Group (see, e.g., http://www-935.ibm.com/services/us/gbs/bao/siip/nih/?sid=0015AFBF08D8F183C1F8E32A430CFFEB). Efforts to implement a resource for making affordable chemical synthesis of screening samples available to all NIH researchers were realized in the form of an extension of the contract with the formerly independent company ChemNavigator, now part of Sigma-Aldrich, in turn acquired by Merck KGaA, which has implemented the so-called Semi-Custom Synthesis Online Request System (SCSORS). Our database and chemoinformatics tools are benefiting from the work pertaining to tautomerism, in particular related to the redesign of the handling of tautomerism for version 2 of the IUPAC InChI identifier. These efforts include our downloadable Tautomer Database. 
A recent new web tool in this context is the so-called Tautomerizer. Numerous additional downloadable data sets have been made available on the group's web server. The work of creating a database of more than a billion easily synthesizable compounds in the SAVI project is described elsewhere. Efforts to move some of these tools to cloud platforms are being undertaken. The cactus web server has celebrated its 25th anniversary. It is the longest-running freely accessible chemoinformatics website with advanced structure search capabilities in the world. A very significant update of several of the services on our web server is currently underway.
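The abstract describes CIR as a programmatic interface that converts one structure identifier into another representation. A minimal Python sketch of what such a request might look like follows. The URL layout (`/chemical/structure/<identifier>/<representation>`) and the representation names `smiles` and `stdinchikey` are assumptions not stated in the abstract, and the helper names `cir_url` and `resolve` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed path of the CIR service on the cactus server; the abstract only
# gives the server root, https://cactus.nci.nih.gov.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a request URL asking for `identifier` in `representation`."""
    # Percent-encode reserved characters. A "#" (triple bond in SMILES)
    # would otherwise be read as the start of a URL fragment.
    return f"{CIR_BASE}/{quote(identifier)}/{quote(representation)}"


def resolve(identifier: str, representation: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as stripped text.

    Needs network access; the response format is whatever the server sends.
    """
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Only prints the URLs; call resolve(...) to actually query the service.
    print(cir_url("aspirin", "stdinchikey"))
    print(cir_url("C#N", "smiles"))
```

Building the URL separately from sending it keeps the encoding step easy to check without touching the network.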
Project outcomes
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Optical structure recognition software to recover chemical information: OSRA, an open source solution.
- DOI: 10.1021/ci800067r
- Published: 2009-03
- Journal:
- Impact factor: 5.6
- Authors: Filippov, Igor V.; Nicklaus, Marc C.
- Corresponding author: Nicklaus, Marc C.
Tautomerism of Warfarin: Combined Chemoinformatics, Quantum Chemical, and NMR Investigation.
- DOI: 10.1021/acs.joc.5b01370
- Published: 2015-10-16
- Journal:
- Impact factor: 0
- Authors: Guasch L; Peach ML; Nicklaus MC
- Corresponding author: Nicklaus MC
Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on.
- DOI: 10.1186/1758-2946-3-37
- Published: 2011-10-14
- Journal:
- Impact factor: 8.6
- Authors: O'Boyle NM; Guha R; Willighagen EL; Adams SE; Alvarsson J; Bradley JC; Filippov IV; Hanson RM; Hanwell MD; Hutchison GR; James CA; Jeliazkova N; Lang AS; Langner KM; Lonie DC; Lowe DM; Pansanel J; Pavlov D; Spjuth O; Steinbeck C; Tenderholt AL; Theisen KJ; Murray-Rust P
- Corresponding author: Murray-Rust P
Computer tools in the discovery of HIV-1 integrase inhibitors.
- DOI: 10.4155/fmc.10.193
- Published: 2010-07
- Journal:
- Impact factor: 4.2
- Authors: Liao C; Nicklaus MC
- Corresponding author: Nicklaus MC
A new approach to radial basis function approximation and its application to QSAR.
- DOI: 10.1021/ci400704f
- Published: 2014-03-24
- Journal:
- Impact factor: 5.6
- Authors: Zakharov AV; Peach ML; Sitzmann M; Nicklaus MC
- Corresponding author: Nicklaus MC
Other grants of MARC NICKLAUS
HIV Integrase Modeling and Computer-Aided Inhibitor Deve
- Grant number: 7291875
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor Development
- Grant number: 7965392
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor and Microbicide Development
- Grant number: 10702372
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor Development
- Grant number: 7733068
- Fiscal year:
- Funding amount: $138,500
- Project category:
Synthetically Accessible Virtual Inventory (SAVI)
- Grant number: 10926263
- Fiscal year:
- Funding amount: $138,500
- Project category:
Large Databases of Small Molecules - Drug Development Tool and Public Resource
- Grant number: 10703018
- Fiscal year:
- Funding amount: $138,500
- Project category:
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant