CAREER: Scalable Software and Algorithmic Infrastructure for Probabilistic Graphical Modeling

Basic Information

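  • Award number: 1845840
  • Award amount: $487,600
  • Host institution country: United States
  • Award type: Continuing Grant
  • Fiscal year: 2019
  • Funding country: United States
  • Project period: 2019-02-15 to 2025-01-31
  • Status: Ongoing (not yet closed)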

Project Abstract

Data-driven reasoning is one of the major factors propelling progress in science and engineering. In many practical applications, especially in biology and medicine, data-driven reasoning has been based on probabilistic graphical models, which are preferred for their accurate data representation and ease of interpretation. In probabilistic graphical modeling, the modeled objects, for example the attributes of a patient stored in electronic health records, are represented as random variables, and the goal is to learn the dependencies between these variables. However, methods for learning high-quality probabilistic graphical models from data are computationally demanding and do not scale to the datasets emerging in modern applications. This project therefore aims to enable high-quality probabilistic graphical modeling of large datasets by using high-performance computing techniques. To this end, the project introduces a framework of the fundamental operations underlying probabilistic graphical modeling, including operations for managing data and coordinating computations, together with a software implementation that can efficiently leverage large-scale parallel computers. The framework is designed to benefit both practitioners interested in analyzing large-scale data and researchers interested in developing new learning algorithms. The framework is validated and evaluated through the analysis of electronic health records, with the goal of early prognosis and diagnosis of patients with chronic obstructive pulmonary disease, problems vital to improving quality and reducing the cost of healthcare. Furthermore, the framework provides a foundation for training the next generation of biomedical professionals in the use of data analytics on advanced cyberinfrastructure.
Thus, the project is aligned with NSF's mission to promote the progress of science and to advance the national health, prosperity, and welfare.

This project responds to the recognized and growing demand for scalable machine learning methods that can capitalize on parallel architectures such as large clusters of multi-core processors. The research focuses on exact structure learning of probabilistic graphical models, for example Bayesian networks and Markov random fields, in the context of biomedical data analytics. The project builds on two main components: (1) a new high-performance abstraction for managing data in machine learning applications, including memory-efficient strategies for answering counting queries on multi-core processors, and (2) a new programming model for distributed-memory systems that facilitates efficient exploration of large-scale combinatorial search spaces, such as those described by trees, lattices, or graphs. These abstractions are used to realize a set of new parallel, exact algorithms for structure search and related problems, for example Markov blanket identification, that accelerate learning by exploiting properties of the input data and the underlying search spaces. The research is driven by a timely and socially relevant application in personalized and preventive medicine, enabled by a massive collection of real electronic health records. The project aims to deliver multiple artifacts, including MPI- and OpenMP-based software, benchmark data, and educational materials, all released as open source for use, further development, enhancement, and incorporation by the community.
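To make the roles of counting queries, lattice-shaped search spaces, and Markov blankets more concrete, here is a minimal Python sketch. It is not the project's software, which targets MPI and OpenMP. It assumes small, fully observed discrete data, uses the BIC score, and answers counting queries with a naive scan. The structure search is the standard exhaustive dynamic program over the lattice of variable subsets, substituted here only for illustration. The function names `count_query`, `bic_score`, `exact_structure`, and `markov_blanket` are hypothetical.

```python
# Hypothetical illustration, not the project's software: assumes small, fully
# observed discrete data, BIC scoring, and naive scans for counting queries.
import math
from collections import Counter
from itertools import combinations


def count_query(data, cols):
    """Counting query: frequency of each joint configuration of the variables in `cols`.

    A naive full scan of the data; a scalable engine would cache or index counts.
    """
    return Counter(tuple(row[c] for c in cols) for row in data)


def bic_score(data, child, parents, arity):
    """BIC local score of `child` given `parents`, built only from counting queries."""
    parents = tuple(sorted(parents))
    joint = count_query(data, parents + (child,))
    marginal = count_query(data, parents)
    # Log-likelihood: sum over configurations of N(pa, x) * log(N(pa, x) / N(pa)).
    loglik = sum(c * math.log(c / marginal[cfg[:-1]]) for cfg, c in joint.items())
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    return loglik - 0.5 * math.log(len(data)) * q * (arity[child] - 1)


def exact_structure(data, arity):
    """Exact BN structure search by dynamic programming over the subset lattice.

    Returns the optimal total score and a dict mapping each variable to its parents.
    Exponential in the number of variables, so only viable for small n.
    """
    n = len(arity)
    # best_ps[v][S] = (score, parent mask) of the best parent set for v within mask S.
    best_ps = []
    for v in range(n):
        others = [u for u in range(n) if u != v]
        table = {}
        for k in range(len(others) + 1):  # smaller subsets first
            for combo in combinations(others, k):
                mask = sum(1 << u for u in combo)
                best = (bic_score(data, v, combo, arity), mask)
                for u in combo:  # or inherit the best choice from a one-smaller subset
                    best = max(best, table[mask & ~(1 << u)])
                table[mask] = best
        best_ps.append(table)

    # best_net[W] = best score of a DAG over variables W; sink[W] = its last variable.
    best_net, sink = {0: 0.0}, {}
    for mask in range(1, 1 << n):  # removing a bit always yields a smaller integer
        for v in range(n):
            if mask >> v & 1:
                rest = mask & ~(1 << v)
                cand = best_net[rest] + best_ps[v][rest][0]
                if mask not in best_net or cand > best_net[mask]:
                    best_net[mask], sink[mask] = cand, v

    # Walk back through the sinks to recover each variable's parent set.
    parents, mask = {}, (1 << n) - 1
    while mask:
        v = sink[mask]
        mask &= ~(1 << v)
        pmask = best_ps[v][mask][1]
        parents[v] = [u for u in range(n) if pmask >> u & 1]
    return best_net[(1 << n) - 1], parents


def markov_blanket(parents, v):
    """Parents, children, and the children's other parents of v in a learned DAG."""
    children = [c for c, ps in parents.items() if v in ps]
    blanket = set(parents[v]).union(children)
    for c in children:
        blanket.update(parents[c])
    blanket.discard(v)
    return sorted(blanket)
```

For example, with `data = [(i % 2, i % 2, (i // 2) % 2) for i in range(100)]` and `arity = [2, 2, 2]`, the search links variables 0 and 1 and leaves variable 2 isolated. The direction of the 0–1 edge is arbitrary because BIC gives both directions the same score. The two costs that a scalable design would need to address are visible here. Every local score issues counting queries against the data, and the dynamic program visits all 2^n subsets.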
The research activities are tightly coupled with multiple educational efforts. These include the development of an interdisciplinary course that trains medical professionals in the use of advanced cyberinfrastructure, the engagement of undergraduate students and underrepresented minorities in research, and outreach to middle and high school students to attract them to STEM.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Longitudinal K-means approaches to clustering and analyzing EHR opioid use trajectories for clinical subtypes.
  • DOI: 10.1016/j.jbi.2021.103889
  • Published: 2021-10
  • Impact factor: 4.5
  • Authors: Mullin, Sarah; Zola, Jaroslaw; Lee, Robert; Hu, Jinwei; MacKenzie, Brianne; Brickman, Arlen; Anaya, Gabriel; Sinha, Shyamashree; Li, Angie; Elkin, Peter L.
  • Corresponding author: Elkin, Peter L.
GraSPI: Extensible software for the graph-based quantification of morphology in organic electronics
  • DOI: 10.1016/j.softx.2021.100969
  • Published: 2022-01
  • Impact factor: 3.4
  • Authors: Devyani Jivani; J. Zola; B. Ganapathysubramanian; O. Wodo
  • Corresponding authors: Devyani Jivani; J. Zola; B. Ganapathysubramanian; O. Wodo
End-to-end Bayesian Networks Exact Learning in Shared Memory
Counting Induced 6-Cycles in Bipartite Graphs

Other publications by Jaroslaw Zola

COMODO: Configurable morphology distance operator
  • DOI: 10.1016/j.commatsci.2024.113208
  • Published: 2024-09-01
  • Authors: Parth Desai; Namit Juneja; Varun Chandola; Jaroslaw Zola; Olga Wodo
  • Corresponding author: Olga Wodo
SCoOL - Scalable Common Optimization Library

Other grants by Jaroslaw Zola

Mentoring the Next Generation of Parallel Processing Researchers at IEEE-CSTCPP Sponsored Conferences
  • Award number: 1937369
  • Fiscal year: 2019
  • Award amount: $487,600
  • Award type: Standard Grant
CNS Core: Small: Rethinking the Software Architecture for Mobile DNA Analysis
  • Award number: 1910193
  • Fiscal year: 2019
  • Award amount: $487,600
  • Award type: Standard Grant
OAC Core: Small: Scalable Non-linear Dimensionality Reduction Methods to Accelerate Scientific Discovery
  • Award number: 1910539
  • Fiscal year: 2019
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: Mentoring the Next Generation of Parallel Processing Researchers at IPDPS and other IEEE-CSTCPP Sponsored Conferences
  • Award number: 1832257
  • Fiscal year: 2018
  • Award amount: $487,600
  • Award type: Standard Grant
Student Travel Support: ACM International Workshop on Big Data in Life Sciences, Seattle, WA, October 2, 2016
  • Award number: 1638757
  • Fiscal year: 2016
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: Student Travel Support: International Workshop on Big Data in Life Sciences, Newport Beach, CA, September 20, 2014
  • Award number: 1444794
  • Fiscal year: 2014
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: CDS&E: Sculpting fluid flow using a programmed sequence of micro-pillars
  • Award number: 1460244
  • Fiscal year: 2014
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: Student Travel Support: International Workshop on Big Data in Life Sciences, Newport Beach, CA, September 20, 2014
  • Award number: 1461484
  • Fiscal year: 2014
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: CDS&E: Sculpting fluid flow using a programmed sequence of micro-pillars
  • Award number: 1307743
  • Fiscal year: 2013
  • Award amount: $487,600
  • Award type: Standard Grant

Similar NSFC grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Approval year: 2024
  • Award type: Collaborative Innovation Research Team

Similar international grants

CAREER: Scalable Software Infrastructure for Analyzing Complex Networks
  • Award number: 2339607
  • Fiscal year: 2024
  • Award amount: $487,600
  • Award type: Continuing Grant
CAREER: Enabling Scalable and Resilient Quantum Computer Architectures through Synergistic Hardware-Software Co-Design
  • Award number: 2340267
  • Fiscal year: 2024
  • Award amount: $487,600
  • Award type: Continuing Grant
RII Track-4: NSF: Extracting Pan Genomic Information from Metagenomic Data: Distributed Algorithms and Scalable Software
  • Award number: 2327456
  • Fiscal year: 2024
  • Award amount: $487,600
  • Award type: Standard Grant
Collaborative Research: CCRI: New: A Scalable Hardware and Software Environment Enabling Secure Multi-party Learning
  • Award number: 2347617
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Standard Grant
Deployment of Scalable System Software for Machine Learning Technology to Saving Computing Resources
  • Award number: 23H03369
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Grant-in-Aid for Scientific Research (B)
Unified, Scalable, and Reproducible Neurostatistical Software
  • Award number: 10725500
  • Fiscal year: 2023
  • Award amount: $487,600
Accelerating adoption of trustworthy AI in radiology: scalable software for non-technical clinical users to independently validate commercial products at local sites
  • Award number: 10064189
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Collaborative R&D
CAREER: Scalable Assurance via Verifiable Hardware-Software Contracts
  • Award number: 2236855
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Continuing Grant
SHF: Medium: Efficient and Scalable Pattern Matching via Hardware-Software Co-Design
  • Award number: 2313062
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Continuing Grant
PreSize Net medical device software for realistic surgery planning: next-generation scalable technology for selecting the best surgical scenario for every patient
  • Award number: 10055877
  • Fiscal year: 2023
  • Award amount: $487,600
  • Award type: Collaborative R&D