BIGDATA: F: DKA: Randomized methods for high-dimensional data analysis
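Basic Information
- Award number: 1447471
- Principal investigator:
- Amount: $285,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-09-01 to 2019-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract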
Randomized methods have recently proven highly useful for analyzing big data sets efficiently, and this project covers mathematically rigorous techniques for developing such algorithms to analyze and store such data. In particular, this project focuses on furthering applications of recent randomized methods for large-scale computational linear algebra. Applications of this research include randomized linear algebra, manifold learning, and model-based compressed sensing. Many of the techniques developed for problems in these areas are unified by the common tool of randomized "oblivious subspace embeddings."
This research attacks the big data problem in randomized linear algebra, manifold learning, and model-based compressed sensing. In randomized linear algebra one imagines that the input is an extremely large matrix A, and the goal is to process this input efficiently, e.g., via regression, principal component analysis, (approximate) matrix multiplication, eigenvalue estimation, k-means clustering, etc. Sarlos first proposed using "oblivious subspace embeddings" to speed up computation, i.e., picking a random matrix S (from an appropriate distribution) such that solving the problem on SA instead of A still yields an almost optimal solution to the original problem (where S is chosen so that SA has many fewer rows than A, thus compressing the massive data). This project develops novel methods to obtain more efficient such S, as well as to find new applications to kernelized and regularized regression problems.
In manifold learning one imagines that the input data lies on a low-dimensional manifold in a high-dimensional space. For example, pixelated handwritten images can be viewed as high-dimensional vectors (indexed by pixels), whereas empirically it has been observed that such images tend to lie near a much lower-dimensional manifold. By learning the parameters of this manifold ("manifold learning"), one can train classifiers more efficiently as well as achieve data compression. This project explores more efficient ways to use randomized methods to do manifold learning, e.g., by using efficient subspace embeddings.
In model-based compressed sensing one wishes to acquire sparse signals with structured sparsity patterns efficiently using few linear measurements, for later (approximate) recovery. Organizing these measurements as the rows of a measurement matrix S, it is known that such S are closely connected to subspace embeddings. This project aims to explore this connection to obtain more efficient model-based compressed sensing and recovery algorithms.
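The "solve on SA instead of A" idea in the abstract can be illustrated with a small sketch-and-solve least-squares example. This is a minimal sketch under stated assumptions, not the project's own method: it uses CountSketch (one well-known oblivious subspace embedding) as S, and the problem sizes, random seed, synthetic data, and helper names (`countsketch`, `least_squares`, `residual_norm`) are all hypothetical choices made for illustration.

```python
import random


def countsketch(rows, m, rng):
    """Apply a CountSketch matrix S (m x n) to n input rows.

    Each input row is added, with a random sign, to one uniformly
    random output row, so S is never formed explicitly.
    """
    d = len(rows[0])
    out = [[0.0] * d for _ in range(m)]
    for row in rows:
        target = out[rng.randrange(m)]
        sign = rng.choice((-1.0, 1.0))
        for j, v in enumerate(row):
            target[j] += sign * v
    return out


def least_squares(rows):
    """Solve min_x ||A x - b||_2 where each row is [a_1, ..., a_d, b].

    Uses the normal equations and Gauss-Jordan elimination with partial
    pivoting; adequate for this small, well-conditioned illustration.
    """
    d = len(rows[0]) - 1
    # Augmented system [A^T A | A^T b].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(d + 1)] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(d):
            if i != col:
                f = M[i][col] / M[col][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def residual_norm(rows, x):
    """Return ||A x - b||_2 for rows of the form [a_1, ..., a_d, b]."""
    return sum((sum(a * xi for a, xi in zip(r, x)) - r[-1]) ** 2 for r in rows) ** 0.5


rng = random.Random(0)          # fixed seed (assumption) for reproducibility
n, d, m = 5000, 3, 400          # n data rows compressed to m sketched rows
x_true = [1.0, -2.0, 0.5]       # synthetic ground truth (assumption)

rows = []
for _ in range(n):
    a = [rng.gauss(0, 1) for _ in range(d)]
    b = sum(ai * xi for ai, xi in zip(a, x_true)) + rng.gauss(0, 0.1)
    rows.append(a + [b])

x_full = least_squares(rows)                          # solve on [A | b]
x_sketch = least_squares(countsketch(rows, m, rng))   # solve on [SA | Sb]

# Both solutions are evaluated on the ORIGINAL data.
ratio = residual_norm(rows, x_sketch) / residual_norm(rows, x_full)
print("residual ratio (sketched / exact):", ratio)
```

The ratio compares the residual of the sketched solution with that of the exact solution on the original data; a value close to 1 is what "almost optimal" means here. Sketching the augmented rows [A | b] applies the same S to both A and b, which is what the regression guarantee requires.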
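Project Outcomes
Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)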
Other publications by Jelani Nelson
Sketching and streaming high-dimensional vectors
- DOI:
- Year published: 2011
- Journal:
- Impact factor: 0
- Authors: Jelani Nelson
- Corresponding author: Jelani Nelson
Lower Bounds for Oblivious Subspace Embeddings
- DOI:
- Year published: 2013
- Journal:
- Impact factor: 0
- Authors: Jelani Nelson; Huy L. Nguyen
- Corresponding author: Huy L. Nguyen
Terminal Embeddings in Sublinear Time
- DOI:
- Year published: 2021
- Journal:
- Impact factor: 0
- Authors: Yeshwanth Cherapanamjeri; Jelani Nelson
- Corresponding author: Jelani Nelson
Other grants by Jelani Nelson
Collaborative Research: AF: Medium: Sketching for privacy and privacy for sketching
- Award number: 2311648
- Fiscal year: 2023
- Award amount: $285,000
- Project category: Continuing Grant
AF: Small: Collaborative Research: Dynamic data structures for vectors and graphs in sublinear memory
- Award number: 1908821
- Fiscal year: 2019
- Award amount: $285,000
- Project category: Standard Grant
AF: Small: Collaborative Research: Dynamic data structures for vectors and graphs in sublinear memory
- Award number: 1951384
- Fiscal year: 2019
- Award amount: $285,000
- Project category: Standard Grant
AF: Chaining methods and their applications to computer science
- Award number: 1618373
- Fiscal year: 2016
- Award amount: $285,000
- Project category: Standard Grant
CAREER: Sketching Algorithms for Massive Data
- Award number: 1350670
- Fiscal year: 2014
- Award amount: $285,000
- Project category: Standard Grant
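Similar grants from the National Natural Science Foundation of China
Molecular design, synthesis, and anti-HIV activity of DKA-DAPYs as dual HIV-1 reverse transcriptase/integrase inhibitors
- Award number: 21402148
- Year approved: 2014
- Award amount: CNY 250,000
- Project category: Young Scientists Fund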
Similar overseas grants
Planning and partnership development for a scalable intervention to prevent diabetic ketoacidosis (DKA) in children at diabetes diagnosis in Canada
- Award number: 460080
- Fiscal year: 2022
- Award amount: $285,000
- Project category: Miscellaneous Programs
BIGDATA: F: DKA: Scalable, Private Algorithms for Continual Data Analysis
- Award number: 1832766
- Fiscal year: 2017
- Award amount: $285,000
- Project category: Standard Grant
What is diabetic ketoacidosis (DKA)? - DiaBiteSize
- Award number: 374886
- Fiscal year: 2017
- Award amount: $285,000
- Project category: Salary Programs
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1661760
- Fiscal year: 2016
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1664720
- Fiscal year: 2016
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447413
- Fiscal year: 2015
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447476
- Fiscal year: 2015
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1447283
- Fiscal year: 2014
- Award amount: $285,000
- Project category: Standard Grant
BIGDATA: F: DKA: Usable Multiple Scale Big Data Analytics through Interactive Visualization
- Award number: 1447416
- Fiscal year: 2014
- Award amount: $285,000
- Project category: Standard Grant













