BIGDATA: F: Collaborative Research: Moment Methods for Big Data: Modern Theory, Algorithms, and Applications
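Basic information
- Award number: 1837992
- Principal investigator:
- Amount: $1 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-10-01 to 2022-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract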
Modern scientific disciplines are increasingly faced with datasets of ever larger size and complexity. Experimental observations may be marred by inaccurate measurements and missing values, and the sheer volume of the output of modern high-throughput experimental procedures in the life sciences makes data processing an increasing challenge. Drawing accurate scientific inferences from such data requires developing new tools that are both theoretically sound and computationally efficient. This project aims to develop statistical methodologies for uncovering the intrinsic structure in large, complex data. The planned methods have the potential to become the default data science techniques used in many scientific and engineering disciplines. Fast, user-friendly software will be made publicly available, both for general purpose big data analysis and specific scientific applications.
The first pillar of the planned methodology is principal component analysis (PCA). The investigators are extending the use of PCA to the setting of high-dimensional observations with corrupted observations, non-Gaussian noise, and low signal-to-noise ratios. These kinds of datasets arise in problems such as cryo-electron microscopy and X-ray free electron laser imaging. This work will provide robust tools for exploratory data analysis for these problems. The second pillar of the research program is the method of moments, a classical technique for parameter estimation that the investigators have repurposed for new problems. The investigators will extend the range of applicability of the method of moments to many big data problems that exhibit certain algebraic structure. For these problems, the method of moments enables scalable and near-optimal statistical inference. Finally, the novel extensions of PCA and the method of moments will be combined to derive new near-optimal and scalable statistical inference procedures for high-dimensional problems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (33)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-target Detection with an Arbitrary Spacing Distribution.
- DOI: 10.1109/tsp.2020.2975943
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Lan TY;Bendory T;Boumal N;Singer A
- Corresponding author: Singer A
How to reduce dimension with PCA and random projections?
- DOI: 10.1109/tit.2021.3112821
- Publication date: 2021-12
- Journal:
- Impact factor: 2.5
- Authors: Yang, Fan;Liu, Sifan;Dobriban, Edgar;Woodruff, David P.
- Corresponding author: Woodruff, David P.
Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Lacotte, Jonathan;Liu, Sifan;Dobriban, Edgar;Pilanci, Mert
- Corresponding author: Pilanci, Mert
Permutation Methods for Factor Analysis and PCA
- DOI: 10.1214/19-aos1907
- Publication date: 2020-10-01
- Journal:
- Impact factor: 4.5
- Authors: Dobriban, Edgar
- Corresponding author: Dobriban, Edgar
Optimal prediction in the linearly transformed spiked model
- DOI: 10.1214/19-aos1819
- Publication date: 2017-09
- Journal:
- Impact factor: 0
- Authors: Edgar Dobriban;W. Leeb;A. Singer
- Corresponding author: Edgar Dobriban;W. Leeb;A. Singer
Other publications by Amit Singer
Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination
- DOI:
- Publication date: 2016
- Journal:
- Impact factor: 2.7
- Authors: Y. Khoo;Amit Singer;David Cowburn
- Corresponding author: David Cowburn
Alignment of density maps in Wasserstein distance
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Amit Singer;Ruiyi Yang
- Corresponding author: Ruiyi Yang
Moment-based metrics for molecules computable from cryogenic electron microscopy images
- DOI: 10.1017/s2633903x24000023
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Andy Zhang;Oscar Mickelin;J. Kileel;Eric J. Verbeke;Nicholas F. Marshall;M. A. Gilles;Amit Singer
- Corresponding author: Amit Singer
Other grants by Amit Singer
NSF-BSF: Modern Techniques for Signal Reconstruction from Moments
- Award number: 2009753
- Fiscal year: 2020
- Award amount: $1 million
- Project type: Continuing Grant
Similar overseas grants
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Award number: 2348159
- Fiscal year: 2023
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
- Award number: 2308649
- Fiscal year: 2022
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
- Award number: 2027516
- Fiscal year: 2020
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1934319
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Protecting Yourself from Wildfire Smoke: Big Data-Driven Adaptive Air Quality Prediction Methodologies
- Award number: 1838022
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Foundations of Responsible Data Management
- Award number: 1926250
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
- Award number: 1947584
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Award number: 1837964
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
- Award number: 1838222
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
- Award number: 1838248
- Fiscal year: 2019
- Award amount: $1 million
- Project type: Standard Grant