Collaborative Research: Use of Random Compression Matrices For Scalable Inference in High Dimensional Structured Regressions
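Basic Information
- Award number: 2210206
- Principal investigator:
- Amount: $110,000
- Institution:
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Duration: 2022-06-15 to 2025-05-31
- Status: Not yet concluded
- Source:
- Keywords:
Project Abstract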
As the scientific community moves into a data-driven era, there is an unprecedented opportunity to leverage large-scale imaging, genetic, and electronic health record (EHR) data to better characterize and understand human disease and so improve treatment and prognosis. Consequently, the analysis of such datasets with flexible statistical models has become an enormously active area of research over the last decade. To this end, this project plans to develop a completely new class of methods based on the idea of fitting statistical models to datasets obtained by compressing big data with a well-designed mechanism. This development enables efficient modeling of massive data on an unprecedented scale. While the investigators' motivation comes primarily from complex modeling and uncertainty quantification for massive biomedical data, the statistical methods are general enough to make important contributions to the related literature in machine learning and the environmental sciences. The overarching goal also includes developing software toolkits to better serve practitioners in related disciplines. Further, the project will provide first-hand training opportunities for graduate and undergraduate students, including women and students from minority communities, in state-of-the-art statistical methodologies and in imaging, genetic, and EHR data. By disseminating the project's outcomes to high school students in terms they can understand, the project can have far-reaching effects in enhancing public scientific literacy about statistics.
Two crucial aspects of modern statistical learning approaches in the era of complex, high-dimensional data are accuracy and scalability of inference. Modern data are increasingly complex and high dimensional, involving a large number of variables and a large sample size, with complex relationships among the variables.
Developing Bayesian high-dimensional parametric or nonparametric regression methods that are practically efficient (in terms of storage and analysis), theoretically "optimal", and able to draw accurate inference with valid uncertainty quantification from such complex datasets is an extremely important problem. To offer a general solution, the investigators will develop approaches based on data compression using a small number of random linear transformations. The approach takes one of two forms. It either (i) uses compression to reduce the large number of records corresponding to each variable, in which case feature interpretation is preserved for adequate inference, or (ii) uses compression to reduce the dimension of each sample's covariate vector, in which case the focus is solely on predicting the response. In either case, data compression enables storage-efficient, scalable, and accurate Bayesian inference or prediction for high-dimensional data under sufficiently rich parametric and nonparametric regression models. An important goal is to establish precise theoretical results on the convergence behavior of models fitted to compressed data, expressed as a function of the number of predictors, the sample size, the properties of the random linear transformations, and the features of these models. The approaches will be used to study neurological disorders by combining brain imaging, genetic, and EHR data from the UK Biobank database. More broadly, the project will also advance interdisciplinary research training and broaden participation in the statistical sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
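The first form described above compresses the many records of each variable while keeping the regression coefficients interpretable. A minimal Python sketch of that idea is given below. It assumes a Gaussian sketching matrix Phi with m much smaller than n rows and iid N(0, 1/n) entries, a conjugate Gaussian prior on the coefficients, and a known noise variance. These choices, and all function names (simulate, compress_records, invert, gaussian_posterior), are illustrative assumptions and not the investigators' method. The key observation is that Phi y = (Phi X) beta + Phi eps is again a linear regression in the same beta. Inference on beta can therefore be drawn from m compressed rows instead of n raw records.

```python
import math
import random


def simulate(n, beta, noise_sd, seed=1):
    """Hypothetical toy data: y_i = x_i . beta + noise, with x_i ~ N(0, I)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        y = sum(b * xj for b, xj in zip(beta, x)) + rng.gauss(0.0, noise_sd)
        records.append((x, y))
    return records


def compress_records(records, n, m, seed=0):
    """One pass over n records, accumulating Phi @ X (m x p) and Phi @ y (m).

    Phi is an m x n Gaussian sketch with iid N(0, 1/n) entries (an assumed
    choice). Column i of Phi is drawn when record i arrives, so this function
    keeps only the compressed m x p design and the length-m response.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    Xc, yc = None, [0.0] * m
    for x, y in records:
        if Xc is None:
            Xc = [[0.0] * len(x) for _ in range(m)]
        for k in range(m):
            phi = rng.gauss(0.0, scale)  # entry Phi[k, i] for the current record i
            row = Xc[k]
            for j, xj in enumerate(x):
                row[j] += phi * xj
            yc[k] += phi * y
    return Xc, yc


def invert(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    p = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[p:] for row in M]


def gaussian_posterior(Xc, yc, sigma2, tau2):
    """Posterior mean and marginal sds of beta given compressed data (Xc, yc).

    Assumes beta ~ N(0, tau2 * I) and treats the compressed noise Phi @ eps as
    N(0, sigma2 * I_m), a simplification since Phi @ Phi.T is only close to I_m.
    """
    p = len(Xc[0])
    prec = [[sum(r[a] * r[b] for r in Xc) / sigma2 + (1.0 / tau2 if a == b else 0.0)
             for b in range(p)] for a in range(p)]
    cov = invert(prec)
    xty = [sum(r[a] * yk for r, yk in zip(Xc, yc)) / sigma2 for a in range(p)]
    mean = [sum(cov[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sd = [math.sqrt(cov[a][a]) for a in range(p)]
    return mean, sd


if __name__ == "__main__":
    beta_true = [1.0, -2.0, 0.5]
    data = simulate(n=1000, beta=beta_true, noise_sd=0.5)
    Xc, yc = compress_records(data, n=1000, m=100)
    mean, sd = gaussian_posterior(Xc, yc, sigma2=0.25, tau2=10.0)
    print("posterior mean:", [round(v, 3) for v in mean])
    print("posterior sd:  ", [round(v, 3) for v in sd])
```

In this toy setup the posterior standard deviations scale roughly like sigma / sqrt(m) rather than sigma / sqrt(n). That is the storage-versus-accuracy trade-off that convergence results of the kind described above would characterize.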
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Aaron Scheffler
Sex-based differences in biomechanical function for chronic low back pain and how it relates to pain experience
- DOI: 10.1007/s00586-025-08730-2
- Published: 2025-03-20
- Journal:
- Impact factor: 2.700
- Authors: Erin Archibeck;Irina Strigo;Aaron Scheffler;Abel Torres-Espin;Karim Khattab;Pavlos Silvestros;Robert Matthew;Caitlin Regan;Paul Hodges;Conor O’Neill;Jeffrey Lotz;Grace O’Connell;Jeannie Bailey
- Corresponding author: Jeannie Bailey
Decoding Pain Chronicity in Electronic Health Records: Feasibility of Automated Annotation of Pain Chronicity in Chronic Low Back Pain Patients
- DOI: 10.1016/j.jpain.2024.01.233
- Published: 2024-04-01
- Journal:
- Impact factor: 4.000
- Authors: Simran A. Kanal;Jeannie F. Bailey;Jeffery Lotz;Aaron Scheffler;Thomas A. Peterson
- Corresponding author: Thomas A. Peterson
Awareness and utilization of genetic testing for hereditary cancers in cancer survivors: a cross-sectional 2021 HINTS-SEER study
- DOI: 10.1007/s11764-025-01823-3
- Published: 2025-05-22
- Journal:
- Impact factor: 2.900
- Authors: Kirithiga Ramalingam;Stephen Li;Meg McKinley;Robin C. Vanderpool;Sarah H. Nash;Salma Shariff-Marco;Aaron Scheffler;Erin L. Van Blarigan;Mindy C. DeRouen
- Corresponding author: Mindy C. DeRouen
Network anatomy in logopenic variant of primary progressive aphasia
- DOI: 10.1101/2023.05.15.23289065
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: M. Mandelli;D. L. Lorca;S. Lukic;M. Montembeault;Andrea Gajardo;Abigail E. Licata;Aaron Scheffler;Giovanni Battistella;S. Grasso;Rian Bogley;Buddhika M Ratnasiri;R. La Joie;Nidhi Mundada;E. Europa;G. Rabinovici;Bruce L. Miller;Jessica de Leon;M. Henry;Z. Miller;M. Gorno
- Corresponding author: M. Gorno
InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Bowen Lei;Rajarshi Guhaniyogi;Krishnendu Chandra;Aaron Scheffler;Bani Mallick
- Corresponding author: Bani Mallick
Similar NSFC Grants
Research on Quantum Field Theory without a Lagrangian Description
- Grant number: 24ZR1403900
- Approval year: 2024
- Funding: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Grant number: 31224802
- Approval year: 2012
- Funding: CNY 240,000
- Project type: Special fund project
Cell Research
- Grant number: 31024804
- Approval year: 2010
- Funding: CNY 240,000
- Project type: Special fund project
Cell Research (细胞研究)
- Grant number: 30824808
- Approval year: 2008
- Funding: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Grant number: 10774081
- Approval year: 2007
- Funding: CNY 450,000
- Project type: General Program
Similar Overseas Grants
Collaborative Research: NCS-FR: DEJA-VU: Design of Joint 3D Solid-State Learning Machines for Various Cognitive Use-Cases
- Award number: 2319619
- Fiscal year: 2023
- Amount: $110,000
- Award type: Continuing Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
- Award number: 2225023
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
- Award number: 2225022
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: CAS-Climate: Linking Activities, Expenditures and Energy Use into an Integrated Systems Model to Understand and Predict Energy Futures
- Award number: 2243099
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: BoCP-Design: US-South Africa: Turning CO2 to stone: the ecosystem service of the oxalate-carbonate pathway and its sensitivity to land use change
- Award number: 2224994
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: MUCUS: Measuring and Understanding the Cassiopea Use of Space
- Award number: 2227068
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: PPoSS: LARGE: Research into the Use and iNtegration of Data Movement Accelerators (RUN-DMX)
- Award number: 2316176
- Fiscal year: 2023
- Amount: $110,000
- Award type: Continuing Grant
Collaborative Research: RUI: Trust but Verify: The Use of Intuition in Engineering Problem Solving
- Award number: 2325524
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: CyberTraining: Pilot: Building a strong community of computational researchers empowered in the use of novel cutting-edge technologies
- Award number: 2320990
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant
Collaborative Research: GEO OSE Track 2: Sustainable Open Science Tools to Democratize Use of 3D Geomaterial Data
- Award number: 2324786
- Fiscal year: 2023
- Amount: $110,000
- Award type: Standard Grant