Deep learning methods to accelerate discovery of drugs targeting gene regulatory proteins

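Basic information

Grant number:
10599781
Principal investigator:
William Ellis Fondrie
Amount:
$398,400
Host institution:
TALUS BIOSCIENCE, INC.
Host institution country:
United States
Project type:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-09-20 to 2024-08-31
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/10599781
Keywords: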
3-Dimensional; Acceleration; Affect; Algorithms; Behavior; Binding; Bioinformatics; Biological Assay; Cells; Chromatin; Code; Communities; Complex; Computer software; Consumption; DNA sequencing; Data; Data Set; Development; Diabetes Mellitus; Disease; Drug Compounding; Drug Targeting; Event; Future; Genomics; Human; Image; Image Compression; Individual; Intervention; Knowledge; Laboratories; Lead; Learning; Licensing; Liquid substance; Malignant Neoplasms; Mass Spectrum Analysis; Measurement; Measures; Methods; Modeling; Pattern; Peptides; Pharmaceutical Preparations; Pharmacologic Substance; Phase; Physiological Processes; Physiology; Process; Protein Analysis; Proteins; Proteome; Proteomics; Regulator Genes; Research; Research Personnel; Running; Sampling; Services; Set protein; Signal Transduction; Small Business Innovation Research Grant; Speed; Systems Analysis; Talus; Techniques; Technology; Testing; Therapeutic; Time; Training; United States National Institutes of Health; X-Ray Computed Tomography; blind; cell type; chromatin protein; commercialization; cost; deep learning; design; digital imaging; drug candidate; drug development; drug discovery; experimental study; genetic regulatory protein; genomic data; improved; instrument; learning strategy; machine learning algorithm; machine learning method; novel therapeutics; open source; programs; protein complex; protein expression; screening; services

Project abstract

SUMMARY: To evaluate how a drug candidate affects cells, researchers often study how the abundance or behavior of a specific set of proteins is changed by treatment with each compound. However, it is not currently possible to test the effect of every possible drug compound (>500,000) on every human protein (~20,000) in hundreds of different types of cells. Even the most advanced protein analysis systems available today could only measure and process a tiny fraction of these combinations in a feasible timeframe. One method of measuring the abundance of all the proteins in a cell sample is mass spectrometry, but available instruments can only analyze several samples per day.

To increase the throughput of these mass spectrometry experiments, in Aim 1 of the proposed project we will develop a machine learning algorithm that will reconstruct the peptide composition of a large number of samples from measurements of a smaller number of mixtures of those samples. This technology, called “compressed sensing,” was developed for digital imaging to reduce (compress) the file size of an image. Importantly, it can also “decompress” a small amount of collected information to reconstruct an image with surprisingly high detail. Similarly, we will develop a compressed sensing algorithm to extract the individual protein profiles from mixtures of multiple combined samples. Initially, this approach will analyze 1,000 samples from 250 measurements of mixtures of those samples, providing a 4-fold increase in speed. Ultimately, with a much higher number of samples, it may allow a 100-fold increase in samples analyzed.

To accelerate interpretation of this type of data for drug discovery, we will create a machine learning algorithm to simplify complex patterns of interactions between test compounds and the proteins within various types of cells. Previously acquired data will be modeled to learn the effects of individual compounds on various proteins. By learning from a large number of these data sets that describe interactions between specific compounds and proteins, in many different cell types, the model will be able to predict the effect of untested compounds on proteins within various types of cells. In addition, it will be able to indicate which experiments would be most useful to perform in the future, to obtain information on classes of compounds or proteins that are lacking in the current data sets. The combination of these two techniques has the potential to greatly accelerate development of novel drugs by providing a potentially huge increase in protein abundance measurements, along with a powerful method to predict how drugs will alter the expression of proteins in cells.
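
The pooled-measurement idea behind Aim 1 can be illustrated with a toy sketch. This is not the project's algorithm: the abstract does not describe its solver or pooling design, so everything here is an assumption made for illustration. The sketch assumes one peptide that is present in only 2 of 16 samples (sparsity), a hypothetical mixing design in which mixture i contains sample j at weight t_j**i, and a brute-force search over candidate supports standing in for a practical compressed sensing solver. It uses 4 mixtures for 16 samples, mirroring the 4-fold ratio of 250 measurements for 1,000 samples at a much smaller scale.

```python
from fractions import Fraction
from itertools import combinations

N_SAMPLES = 16   # individual samples whose profiles we want
N_MIXTURES = 4   # pooled measurements actually run (4-fold fewer)
SPARSITY = 2     # assumed number of samples in which this peptide is present


def mixing_matrix(n_mixtures, n_samples):
    """Hypothetical pooling design: mixture i holds sample j at weight t_j**i.

    With distinct t_j this is a Vandermonde matrix, so any n_mixtures columns
    are linearly independent; a sparse solution is then unique as long as
    2 * SPARSITY <= n_mixtures.
    """
    nodes = [Fraction(j + 1, n_samples) for j in range(n_samples)]
    return [[t ** i for t in nodes] for i in range(n_mixtures)]


def measure(A, x):
    """Simulate the pooled measurements y = A x for one peptide."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def solve(G, b):
    """Solve the small square system G c = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def recover(A, y, k):
    """Brute-force sparse recovery: fit every support of size k, keep the best."""
    n_samples = len(A[0])
    best = None
    for support in combinations(range(n_samples), k):
        cols = [[row[j] for row in A] for j in support]
        # Least squares on the chosen columns via the normal equations.
        G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
        b = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
        coef = solve(G, b)
        fit = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
        err = sum((f - v) ** 2 for f, v in zip(fit, y))
        if best is None or err < best[0]:
            best = (err, support, coef)
    x_hat = [Fraction(0)] * n_samples
    for j, c in zip(best[1], best[2]):
        x_hat[j] = c
    return x_hat


# Toy ground truth: the peptide appears in only two of the 16 samples.
x_true = [Fraction(0)] * N_SAMPLES
x_true[3], x_true[12] = Fraction(5), Fraction(2)

A = mixing_matrix(N_MIXTURES, N_SAMPLES)
y = measure(A, x_true)           # 4 measurements instead of 16
x_hat = recover(A, y, SPARSITY)
print(x_hat == x_true)
```

Exact rational arithmetic (`Fraction`) is used only so the toy check is free of rounding error. The exhaustive search scales combinatorially and is noise-free here, so a real pipeline would presumably rely on a scalable, noise-tolerant solver; the sketch only shows why fewer mixtures than samples can suffice when the underlying signal is sparse.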
