III: Medium: Scalable Machine Learning for Genome-Wide Association Analyses

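Basic Information

Award number:
1705121
Principal investigator:
Sriram Sankararaman
Amount:
approx. $975,200
Awardee institution:
University of California-Los Angeles
Awardee institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2017
Funding country:
United States
Period of performance:
2017-07-01 to 2022-06-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1705121&HistoricalAwards=false
Keywords:
III Medium Scalable Machine Learning

Project Abstract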

Over the past decade, genome-wide association studies (GWAS) have discovered genetic variants associated with numerous diseases as well as other complex phenotypes. Despite their success, major gaps remain in our understanding of how genetic changes affect phenotype. These gaps, coupled with advances in high-throughput technologies to measure genetic variation, have motivated GWAS of increasingly large scale. However, the statistical and computational challenges posed by the scale and complexity of these studies present a critical bottleneck in realizing their promise. Recent advances in scalable machine learning (ML) offer the potential for paradigm-shifting advances in the field of GWAS. However, these concepts have yet to be rigorously explored in the context of GWAS modeling and testing problems. Exploring the intersection of these domains introduces fundamentally new statistical and computational challenges. The team will develop a suite of modeling and testing methods that target massive modern genomics datasets. The techniques that the team will build upon include low-rank matrix approximation, kernel methods, and matrix completion. The team will also provide open-source software tailored to parallel and distributed computing environments to facilitate widespread adoption of the methods.

Exploring GWAS through the lens of scalable machine learning introduces several research directions and requires the development of novel algorithms and analyses. First, much scalable ML research has focused on the statistical task of prediction, while GWAS inference problems also emphasize hypothesis testing and parameter estimation. Characterizing the behavior of scalable ML methods in these novel settings is a challenging open problem. The team will develop principled GWAS modeling and testing methods. The results are also expected to be of great interest to the scalable ML community. Second, while scalable ML techniques are designed to be general-purpose and domain-agnostic, the GWAS setting introduces rich, biologically motivated domain knowledge that needs to be leveraged to improve the quality of inference. Statistical models that encode this prior knowledge while still permitting efficient inference will be developed. Ultimately, the team will design efficient parallel and distributed algorithms for these core modeling and testing problems and develop robust open-source implementations that leverage modern computing infrastructure.

The proposed methods will dramatically improve the scalability of current GWAS analyses while enabling the development of increasingly realistic genomic models. Collaborations and open-source artifacts will enable the widespread adoption of these methods by the human genetics community. This project will lead to closer interaction between the genomics and machine learning communities at UCLA and beyond.

Project Outputs

Journal articles (23)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Learning Fair Representations for Kernel Models

DOI:
Published:
2020
Journal:
Twenty Third International Conference on Artificial Intelligence and Statistics
Impact factor:
0
Authors:
Tan, Zilong;Yeom, Samuel;Fredrikson, Matt;Talwalkar, Ameet
Corresponding author:
Talwalkar, Ameet

A Unifying Framework for Imputing Summary Statistics in Genome-Wide Association Studies

DOI:
10.1089/cmb.2019.0449
Published:
2020
Journal:
Journal of Computational Biology
Impact factor:
1.7
Authors:
Wu, Yue;Eskin, Eleazar;Sankararaman, Sriram
Corresponding author:
Sankararaman, Sriram

CONTRA: Contrarian statistics for controlled variable selection

DOI:
Published:
2021-04
Journal:
Proceedings of Machine Learning Research
Impact factor:
0
Authors:
Mukund Sudarshan;A. Puli;Lakshminarayanan Subramanian;S. Sankararaman;R. Ranganath
Corresponding author:
Mukund Sudarshan;A. Puli;Lakshminarayanan Subramanian;S. Sankararaman;R. Ranganath

Quantifying the contribution of dominance deviation effects to complex trait variation in biobank-scale data.

DOI:
10.1016/j.ajhg.2021.03.018
Published:
2021-05-06
Journal:
American Journal of Human Genetics
Impact factor:
9.8
Authors:
Pazokitoroudi A;Chiu AM;Burch KS;Pasaniuc B;Sankararaman S
Corresponding author:
Sankararaman S

STENSL: Microbial Source Tracking with ENvironment SeLection.

DOI:
10.1128/msystems.00995-21
Published:
2022-10-26
Journal:
mSystems
Impact factor:
6.4
Authors:
Corresponding author:

Other publications by Sriram Sankararaman

Characterizing the genetic architecture of drug response using gene-context interaction methods

DOI:
10.1016/j.xgen.2024.100722
Published:
2024-12-11
Journal:
Cell Genomics
Impact factor:
9.000
Authors:
Michal Sadowski;Mike Thompson;Joel Mefford;Tanushree Haldar;Akinyemi Oni-Orisan;Richard Border;Ali Pazokitoroudi;Na Cai;Julien F. Ayroles;Sriram Sankararaman;Andy W. Dahl;Noah Zaitlen
Corresponding author:
Noah Zaitlen

dotears: Scalable and consistent directed acyclic graph estimation using observational and interventional data

DOI:
10.1016/j.isci.2024.111673
Published:
2025-02-21
Journal:
iScience
Impact factor:
4.100
Authors:
Albert Xue;Jingyou Rao;Sriram Sankararaman;Harold Pimentel
Corresponding author:
Harold Pimentel

Identifying common disease trajectories of Alzheimer’s disease with electronic health records

DOI:
10.1016/j.ebiom.2025.105831
Published:
2025-08-01
Journal:
EBioMedicine
Impact factor:
10.800
Authors:
Mingzhou Fu;Sriram Sankararaman;Bogdan Pasaniuc;Keith Vossel;Timothy S. Chang
Corresponding author:
Timothy S. Chang

OP-CBIO201112 5640..5648

DOI:
Published:
2021
Journal:
Impact factor:
0
Authors:
A. Majumdar;Kathryn S. Burch;Tanushree Haldar;Sriram Sankararaman;Bogdan Pasaniuc;W. J. Gauderman;John S. Witte
Corresponding author:
John S. Witte

Investigating the sources of variable impact of pathogenic variants in monogenic metabolic conditions

DOI:
10.1038/s41467-025-60339-7
Published:
2025-06-05
Journal:
Nature Communications
Impact factor:
15.700
Authors:
Angela Wei;Richard Border;Boyang Fu;Sinéad Cullina;Nadav Brandes;Seon-Kyeong Jang;Sriram Sankararaman;Eimear E. Kenny;Miriam S. Udler;Vasilis Ntranos;Noah Zaitlen;Valerie A. Arboleda
Corresponding author:
Valerie A. Arboleda