III: Small: Predictive Modeling from High-Dimensional, Sparsely and Irregularly Sampled, Longitudinal Data

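Basic information

Award number:
2226025
Principal investigator:
Vasant Honavar
Amount:
approx. $599,900
Awardee institution:
Pennsylvania State Univ University Park
Awardee country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-10-01 to 2025-09-30
Project status:
Ongoing (not yet concluded)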

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2226025&HistoricalAwards=false
Keywords:
III Small Predictive Modeling Dimensional

Abstract

Longitudinal data resulting from repeated observations of a set of individuals over time are commonplace in many applications, including health sciences, learning sciences, social sciences, life sciences, and economics. Such data present unprecedented opportunities to uncover the relationship between the time-varying patterns of certain measured variables (features or covariates) and outcomes of interest, e.g., economic meltdown, societal unrest, disease onset, or health risk. In real-world settings, the number of variables is often very large, and often only a small subset of variables is recorded at any given time, resulting in sparse data with a high proportion of missing observations. Furthermore, such data exhibit complex correlations which, if not properly accounted for, can lead to misleading statistical inferences. Additional complications arise because the data exhibit abrupt discontinuities that are often driven by transitions between states that are not directly observable (e.g., from "healthy" to "infected"). The large size of the data sets demands methods that are scalable. And in high-stakes applications, e.g., healthcare, human interpretability of the predictive models is of paramount importance. The project will yield substantial advances over the current state of the art in scalable machine learning methods for predictive modeling of longitudinal outcomes from high-dimensional, irregularly sampled, sparse, longitudinal health data. The open-source implementations of the predictive modeling tools will find applications in many domains, including behavioral, social, environmental, economic, learning, and health sciences. The project will enhance the research-based training of a diverse group of graduate and undergraduate students in Data Sciences and Computer Science (especially Artificial Intelligence), areas of great national importance. The educational activities associated with the project will help equip a diverse cadre of Data Scientists, AI experts, and researchers in health sciences, social sciences, learning sciences, and related areas with state-of-the-art machine learning tools for predictive modeling from longitudinal data. The project will produce a new graduate course, course modules, sample projects, etc. on predictive modeling from longitudinal data, to be integrated into Data Sciences curricula. The project will help introduce students from diverse backgrounds, including women and underrepresented minorities, to a broad range of educational, research, and career opportunities in Data Sciences. The broader impacts of the project will be further enhanced by broad dissemination of all research results (publications, software, data sets, course materials).

The project will develop a family of scalable deep kernel Gaussian process regression algorithms for interpretable predictive modeling from high-dimensional, sparsely and irregularly sampled longitudinal data with complex, a priori unknown correlation structure. The resulting methods will be able to discover the patterns of transitions between unobserved or hidden states and account for abrupt discontinuities in outcomes. They will be able to explain their predictions by learning the underlying complex correlation structure exhibited by the data and by identifying not only the variables that drive the predictions, but also the temporal context in which they do so. The project will rigorously evaluate the resulting methods empirically with simulated longitudinal data (with different correlation structures, different missingness mechanisms, and different time-dependent variable importance), several benchmark longitudinal data sets, and, most importantly, deidentified longitudinal electronic health records data and socio-demographic data from real-world healthcare applications (in collaboration with clinical experts).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Simple, Fast Algorithm for Continual Learning from High-Dimensional Data

DOI:
Published:
2023
Journal:
Eleventh International Conference on Learning Representations
Impact factor:
0
Authors:
Ashtekar, Neil; Honavar, Vasant G
Corresponding author:
Honavar, Vasant G

Other publications by Vasant Honavar

Neural network design and the complexity of learning, by J. Stephen Judd. Cambridge, MA: MIT Press, 1990

DOI:
10.1007/bf00993255
Published:
1992-06-01
Journal:
MACHINE LEARNING
Impact factor:
2.900
Authors:
Vasant Honavar
Corresponding author:
Vasant Honavar

Machine-learning guided biophysical model development: application to ribosome catalysis

DOI:
10.1016/j.bpj.2021.11.2053
Published:
2022-02-11
Journal:
Conference abstract
Impact factor:
Authors:
Yang Jiang; Justin Petucci; Nishant Soni; Vasant Honavar; Edward O'Brien
Corresponding author:
Edward O'Brien

Book Review: Neural Network Design and the Complexity of Learning, by J. Stephen Judd. Cambridge, MA: MIT Press, 1990

DOI:
10.1023/a:1022680813848
Published:
1992-06-01
Journal:
MACHINE LEARNING
Impact factor:
2.900
Authors:
Vasant Honavar
Corresponding author:
Vasant Honavar

Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach

DOI:
10.1186/1471-2105-8-284
Published:
2007-08-03
Journal:
BMC BIOINFORMATICS
Impact factor:
3.300
Authors:
Carson Andorf; Drena Dobbs; Vasant Honavar
Corresponding author:
Vasant Honavar

A practical guide to machine learning interatomic potentials – Status and future

DOI:
10.1016/j.cossms.2025.101214
Published:
2025-03-01
Journal:
CURRENT OPINION IN SOLID STATE & MATERIALS SCIENCE
Impact factor:
13.400
Authors:
Ryan Jacobs; Dane Morgan; Siamak Attarian; Jun Meng; Chen Shen; Zhenghao Wu; Clare Yijia Xie; Julia H. Yang; Nongnuch Artrith; Ben Blaiszik; Gerbrand Ceder; Kamal Choudhary; Gabor Csanyi; Ekin Dogus Cubuk; Bowen Deng; Ralf Drautz; Xiang Fu; Jonathan Godwin; Vasant Honavar; Olexandr Isayev; Brandon M. Wood
Corresponding author:
Brandon M. Wood