Better Understanding and Handling of Tautomerism
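Basic information
- Grant number: 10262460
- Principal investigator:
- Amount: $213,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not yet concluded
- Source: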
- Keywords: Anticoagulants; Appearance; Area; Azides; Bibliography; Book Chapters; Carbon; Charge; Chemical Structure; Chemicals; Chemistry; Child; Code; Computer software; Conflict (Psychology); Contractor; Cyclization; Data; Databases; Environment; Equilibrium; Eye; Hydrogen; Individual; Informatics; Intuition; Journals; Lead; Literature; Manuscripts; Measures; Mechanics; Methods; Modification; Molecular Weight; Motivation; Movement; Organic Chemistry; Paper; Phase; Preparation; Prevalence; Property; Protons; Public Domains; Publications; Publishing; Reaction; Recommendation; Records; Sampling; Solvents; Spectrum Analysis; Structure; System; Techniques; Temperature; Terminology; Testing; Tetrazoles; Triplet Multiple Birth; Variant; Voting; Warfarin; Work; X-Ray Crystallography; base; catalyst; chemical information system; deep learning; foot; improved; information model; migration; posters; quantum; quantum computing; screening; single bond; small molecule; structural biology; tautomer; theories; tool; web services; web site; working group
Project abstract
One motivation of our tautomerism-related work is to use all tools at our disposal (chemoinformatics analyses, QM computations, experimental work, and systematic extraction of results from the literature) to provide a scientific footing for the recommendations on how to improve the handling of tautomerism in InChI V2, instead of just holding a vote in the Working Group. While prototropic tautomerism rules are the only ones currently implemented as the standard rule set in CACTVS, and all tautomeric transformations covered by InChI (by default or by option) are prototropic, ring-chain (RC) tautomerism is well known and widespread. Nevertheless, and somewhat surprisingly, very few RC rules were available in chemoinformatics until recently. Based on Baldwin's well-known set of rules for predicting the relative facility of ring-forming reactions, we developed a set of 11 rules describing RC tautomerism. The rules were encoded in SMIRKS line notation, the chemical transform extension of the chemical structure line notation SMILES developed by Daylight Chemical Information Systems, Inc., just as the currently 20 individual rules in CACTVS for describing prototropic tautomerism are encoded. A number of modifications were applied to Baldwin's rule set, which, after all, consists of rules for ring closure in general, not for RC tautomerism specifically. Foremost, ring-closure and ring-opening reactions involving a tetrahedral electrophilic carbon, which lead to breakage of a single bond, would cause the molecule to lose atoms, violating the definition of tautomerism. Adding these new RC rules to the existing standard prototropic rules in CACTVS, we applied the combined rule set to the "poster child" of RC tautomerism: warfarin. This anticoagulant drug, in wide use for decades, can theoretically exist in solution in 40 distinct tautomeric forms.
We investigated all these tautomers with computational approaches (relative energies calculated at the B3LYP/6-311G+ level of theory) and recorded NMR (13C and 1H) spectra. We introduced an intuitive graphical network of tautomers and their interconversion paths, which for warfarin contained 11 tautomers and the 17 tautomeric transformations between them allowed by our rules. We then applied the combined RC and prototropic rule set to an entire database: the Aldrich Market Select (AMS) database of (then) 6 million screening samples and building blocks. We found over 30,000 cases in which two or more AMS products were declared by our rules to be just different tautomeric forms of the same compound. We purchased 166 such tautomer pairs (plus a few triplets) from the AMS and performed 1H and 13C NMR analyses to determine whether the chemoinformatics transforms had accurately predicted what was the same "stuff in the bottle" as determined by NMR. Essentially all prototropic transforms for which examples existed in the AMS were confirmed (some of the "rarer" types of tautomerism had no such "conflict pairs" in the AMS). Some of the RC transforms were found to be too "aggressive", i.e., they equated structures that were different compounds according to the NMR analyses. This paper received an Editor's Choice selection in the Journal of Chemical Information and Modeling. To provide additional experimental data for tautomerism-related analyses and chemoinformatics work, we have created a database of data extracted from the experimental literature. This database consists of 1,873 entries, each an n-tuple of tautomers studied under a particular set of experimental conditions (pH, solvent, temperature, technique), adding up to 3,898 records since the average of n is slightly above 2.
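The tautomer network mentioned above (tautomers as nodes, rule-allowed transformations as edges) can be illustrated with a small graph sketch. This is only a toy under stated assumptions: the node labels `T1` to `T5`, the edge list, and the helper names `build_network` and `shortest_path` are hypothetical placeholders, not the actual warfarin tautomers or the CACTVS rule identifiers.

```python
from collections import defaultdict, deque

# Hypothetical toy network: labels and rule types are placeholders,
# not the real warfarin tautomers or transform names.
TRANSFORMS = [
    ("T1", "T2", "prototropic"),
    ("T2", "T3", "prototropic"),
    ("T1", "T4", "ring-chain"),
    ("T4", "T5", "prototropic"),
    ("T3", "T5", "ring-chain"),
]


def build_network(transforms):
    """Build an undirected adjacency map: tautomer -> {neighbor: rule type}."""
    network = defaultdict(dict)
    for a, b, rule in transforms:
        network[a][b] = rule
        network[b][a] = rule  # tautomeric transforms are treated as reversible
    return network


def shortest_path(network, start, goal):
    """Breadth-first search for the shortest interconversion path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(network[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


network = build_network(TRANSFORMS)
path = shortest_path(network, "T1", "T5")
```

In this toy graph, `path` lists the tautomers visited along the shortest route from `T1` to `T5`; a real network would use structure identifiers such as InChIKeys as node labels.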
The data were extracted from 73 publications, many of them reviews, taken from a selection of 200 papers provided to the contractor company that did the initial extraction (Parthys Reverse Informatics), out of about 900 papers we identified in literature searches that might contain useful data for this purpose. Each tautomer (or tuple, as appropriate) is annotated with structural information (SMILES, InChI, InChIKey, NCI/CADD Identifiers); "prevalence" data (measured ratios, interconversion rates, relative energies, etc.); condition data (solvent, temperature, pH, etc., if given); method data (NMR, UV spectroscopy, IR spectroscopy, etc.); and reference data (bibliographic information). To the best of our knowledge, such a tautomer database does not exist elsewhere, certainly not in the public domain. A new web service called Tautomerizer was created to apply and test the transforms we have compiled from the above database and the literature for the Redesign of Handling of Tautomerism in InChI(Key) V.2. The set of transforms compiled in the context of this project has meanwhile grown to its final number of 86, which are also being added to the Tautomerizer. The phase of initiating and then making a decision in the IUPAC Working Group about the final set of transforms to be recommended for InChI V2 has started. Exploratory coding to add some of the 86 rules to the current InChI code (v.1.05) was successful for 6 rules. Work on a second-level analysis of tautomerism based on quantum-mechanical calculations and subsequent deep learning approaches has begun. In addition, X-ray crystallography has been performed on a subset of the small molecules mentioned above. Several manuscripts about this project have been published or are in preparation.
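The annotation scheme described above (entries as n-tuples of tautomer records sharing one set of conditions) might be modeled roughly as follows. This is a minimal sketch, not the database's actual schema: the class and field names are assumptions, and the example entry uses the textbook 2-hydroxypyridine/2-pyridone pair purely as a placeholder with no measured values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TautomerRecord:
    """One tautomer within an entry (hypothetical field names)."""
    smiles: str
    inchi: Optional[str] = None
    inchikey: Optional[str] = None
    measured_ratio: Optional[float] = None   # "prevalence" data, if reported
    relative_energy: Optional[float] = None  # units as given in the source paper


@dataclass
class TautomerEntry:
    """An n-tuple of tautomers studied under one set of conditions."""
    records: List[TautomerRecord]
    solvent: Optional[str] = None
    temperature: Optional[str] = None
    ph: Optional[float] = None
    method: Optional[str] = None     # e.g. NMR, UV, IR
    reference: Optional[str] = None  # bibliographic information

    @property
    def n(self) -> int:
        """Number of tautomers in this tuple."""
        return len(self.records)


# Placeholder entry: structures only, no experimental values are asserted.
example = TautomerEntry(
    records=[
        TautomerRecord(smiles="Oc1ccccn1"),      # 2-hydroxypyridine
        TautomerRecord(smiles="O=c1cccc[nH]1"),  # 2-pyridone
    ],
    method="NMR",
)

# Average tuple size implied by the counts quoted in the abstract.
average_n = 3898 / 1873
```

With the quoted totals, `average_n` comes out at roughly 2.08, consistent with most entries being tautomer pairs.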
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by MARC NICKLAUS
HIV Integrase Modeling and Computer-Aided Inhibitor Deve
- Grant number: 7291875
- Fiscal year:
- Funding amount: $213,400
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor and Microbicide Development
- Grant number: 10702372
- Fiscal year:
- Funding amount: $213,400
- Project category:
Large Databases of Small Molecules - Drug Development Tool and Public Resource
- Grant number: 10262724
- Fiscal year:
- Funding amount: $213,400
- Project category:
Large Databases of Small Molecules - Drug Development Tool and Public Resource
- Grant number: 10703018
- Fiscal year:
- Funding amount: $213,400
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor Development
- Grant number: 7965392
- Fiscal year:
- Funding amount: $213,400
- Project category:
Large Databases of Small Molecules - Drug Development Tool and Public Resource
- Grant number: 10926595
- Fiscal year:
- Funding amount: $213,400
- Project category:
Similar overseas grants
AMFaces: Advanced Additive Manufacturing of User-Focused Facial Prostheses with Real-Life Colour Appearance
- Grant number: EP/W033968/1
- Fiscal year: 2023
- Funding amount: $213,400
- Project category: Research Grant
Understanding the appearance mechanism of ferroelectric liquid crystals showing spontaneous polarization in the director and developing their applications.
- Grant number: 23H00303
- Fiscal year: 2023
- Funding amount: $213,400
- Project category: Grant-in-Aid for Scientific Research (A)
Elucidating the mechanism in the color appearance of small-field stimulus on chromatic surroundings
- Grant number: 22K20317
- Fiscal year: 2022
- Funding amount: $213,400
- Project category: Grant-in-Aid for Research Activity Start-up
Body, appearance, and health surveillance in female youth friendship contexts
- Grant number: 2690554
- Fiscal year: 2022
- Funding amount: $213,400
- Project category: Studentship
Learning to Recognize Faces Despite Within-Person Variability in Appearance: A Developmental Approach
- Grant number: RGPIN-2022-04386
- Fiscal year: 2022
- Funding amount: $213,400
- Project category: Discovery Grants Program - Individual
Path-space Exploration for Light Transport and Appearance Modelling
- Grant number: RGPIN-2018-05669
- Fiscal year: 2022
- Funding amount: $213,400
- Project category: Discovery Grants Program - Individual
Appearance of negative influences of global warming on crop production and measures against it
- Grant number: 21H02330
- Fiscal year: 2021
- Funding amount: $213,400
- Project category: Grant-in-Aid for Scientific Research (B)
A sociological study on appearance discrimination during employment selection
- Grant number: 21K13447
- Fiscal year: 2021
- Funding amount: $213,400
- Project category: Grant-in-Aid for Early-Career Scientists
The influence of river environment on urban appearance of wildlife
- Grant number: 21K12322
- Fiscal year: 2021
- Funding amount: $213,400
- Project category: Grant-in-Aid for Scientific Research (C)
Method for assessing women's perceptions of their appearance in the context of breast cancer care
- Grant number: 10196213
- Fiscal year: 2021
- Funding amount: $213,400
- Project category: