Rare and Weak Signals in Big Data: How to Find Them and How to Use Them
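Basic Information
- Award number: 1208315
- Principal investigator:
- Amount: $100,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Start and end dates: 2012-08-01 to 2017-07-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract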
This project proposes to create new tools for high-dimensional data analysis, focusing on the very challenging regime where signals are both rare and weak. In particular, the investigator proposes to: (a) develop graphlet screening as a new tool for high-dimensional variable selection, introduce a new theoretical framework for assessing the optimality of variable selection, and show that graphlet screening achieves the optimal rate of convergence in terms of the Hamming distance of the selection errors; (b) develop a new method of spectral clustering using the recent idea of Higher Criticism thresholding, and investigate the fundamental limits of several problems related to low-rank matrix recovery, including high-dimensional clustering, sparse Principal Component Analysis, and a testing problem related to the underlying large covariance matrix; (c) extend and apply the proposed methods and theory to the analysis of Big Data generated in various scientific fields, including genomics and machine learning.
We are often told that we are entering the era of 'Big Data', where massive datasets consisting of millions of observations are mined for associations and patterns. What is never said about this pervasive trend is that, unfortunately, the signals we are looking for are usually rare, weak, and hard to find, so it is easy to be fooled. The project introduces new ideas, new tools, and novel theory appropriate for rare and weak signals in Big Data, and applies the theory and methods to various scientific fields, including genomics and machine learning.
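The abstract names Higher Criticism thresholding without spelling it out. As a rough illustration only, and not the project's own method or code, the sketch below computes one common form of the Higher Criticism objective over sorted p-values and uses its maximizer as a data-driven threshold on z-scores. The function names, the `alpha0` search fraction, and the use of two-sided standard normal p-values are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def two_sided_pvalue(z):
    """P(|N(0,1)| >= |z|) for a z-score, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def higher_criticism_threshold(z_scores, alpha0=0.5):
    """Hypothetical helper: return (HC value, |z| threshold, selected indices).

    Uses the form HC(k) = sqrt(n) * (k/n - p_(k)) / sqrt(p_(k) * (1 - p_(k))),
    maximized over the k smallest p-values with k <= alpha0 * n.
    """
    n = len(z_scores)
    if n == 0:
        raise ValueError("z_scores must be non-empty")
    pvals = [two_sided_pvalue(z) for z in z_scores]
    # Indices ordered from the most to the least significant feature.
    order = sorted(range(n), key=lambda j: pvals[j])
    best_hc, best_k = -math.inf, 1
    for k in range(1, max(1, int(alpha0 * n)) + 1):
        # Clamp p away from 0 and 1 so the denominator stays positive.
        p = min(max(pvals[order[k - 1]], 1e-300), 1.0 - 1e-12)
        hc = math.sqrt(n) * (k / n - p) / math.sqrt(p * (1.0 - p))
        if hc > best_hc:
            best_hc, best_k = hc, k
    # Features at least as significant as the maximizer are retained.
    threshold = abs(z_scores[order[best_k - 1]])
    selected = sorted(order[:best_k])
    return best_hc, threshold, selected


# Synthetic demo (made-up data): 970 null z-scores placed at normal quantiles,
# followed by 30 signal features that all sit at z = 5.
null_z = [NormalDist().inv_cdf((i + 0.5) / 970) for i in range(970)]
z_demo = null_z + [5.0] * 30
hc_demo, threshold_demo, selected_demo = higher_criticism_threshold(z_demo)
print(len(selected_demo), threshold_demo)
```

In this toy setting the signals are few but strong, so the objective peaks sharply once all of them are included. In the rare-and-weak regime the abstract describes, the peak is much less pronounced, which is what makes the choice of threshold hard.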
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jiashun Jin
SCORE+ for Network Community Detection
- DOI:
- Year: 2018
- Journal:
- Impact factor: 0
- Authors: Jiashun Jin; Z. Ke; Shengming Luo
- Corresponding author: Shengming Luo
Supplement of "Estimating Network Memberships by Simplex Vertex Hunting"
- DOI:
- Year: 2019
- Journal:
- Impact factor: 0
- Authors: Jiashun Jin; Z. Ke; Shengming Luo
- Corresponding author: Shengming Luo
MEDLINE/PubMed
- DOI: 10.1007/978-0-387-39940-9_3039
- Year: 2004
- Journal:
- Impact factor: 3.8
- Authors: Cornelia Caragea; V. Honavar; P. Boncz; P. Larson; S. Dietrich; Gonzalo Navarro; Bhavani Thuraisingham; Yan Luo; Ouri E. Wolfson; S. Beitzel; Eric C. Jensen; Ophir Frieder; Christian S. Jensen; N. Tradisauskas; Ethan V. Munson; A. Wun; K. Goda; Stephen E. Fienberg; Jiashun Jin; Guimei Liu; Nick Craswell; T. Pedersen; Cesare Pautasso; M. Moro; S. Manegold; B. Carminati; Marina Blanton; Sara Bouchenak; Noël de Palma; Wei Tang; Christoph Quix; M. Jeusfeld; R. K. Pon; David J. Buttler; W. Meng; P. Zezula; Michal Batko; Vlastislav Dohnal; J. Domingo; Denilson Barbosa; Ioana Manolescu; Jeffrey Xu Yu; Emmanuel Cecchet; Vivien Quéma; Xifeng Yan; G. Santucci; D. Zeinalipour; Panos K. Chrysanthis; Amol Deshpande; Carlos Guestrin; Samuel Madden; Carson Kai; R. H. Güting; Amarnath Gupta; Heng Tao Shen; G. Weikum; Ramesh Jain; Jeffrey Xu Yu; Paolo Ciaccia; K. Candan; M. Sapino; C. Meghini; F. Sebastiani; U. Straccia; F. Nack; V. S. Subrahmanian; Maria Vanina Martinez; D. Reforgiato; T. Westerveld; M. Sebillo; G. Vitiello; Maria De Marsico; K. Voruganti; C. Parent; S. Spaccapietra; Christelle Vangenot; Esteban Zimányi; Prasan Roy; S. Sudarshan; E. Puppo; Peer Kröger; Matthias Renz; H. Schuldt; Solmaz Kolahi; A. Unwin; W. Cellary
- Corresponding author: W. Cellary
Estimation and Confidence Sets for Sparse Normal Mixtures
- DOI: 10.1214/009053607000000334
- Year: 2006
- Journal:
- Impact factor: 4.5
- Authors: T. Cai; Jiashun Jin; Mark G. Low
- Corresponding author: Mark G. Low
Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings
- DOI: 10.29012/jpc.v4i1.618
- Year: 2012
- Journal:
- Impact factor: 0
- Authors: S. Fienberg; Jiashun Jin
- Corresponding author: Jiashun Jin
Other grants of Jiashun Jin
Feature selection in several challenging directions
- Award number: 2310668
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
New Tools for Analyzing Complex Network and Text Data
- Award number: 2015469
- Fiscal year: 2020
- Amount: $100,000
- Project type: Standard Grant
New Tools for Large-Scale Sparse Inference
- Award number: 1513414
- Fiscal year: 2015
- Amount: $100,000
- Project type: Continuing Grant
CAREER: Inferences on Large-Scale Multiple Comparisons: The Temptation of the Fourier Kingdom
- Award number: 0908613
- Fiscal year: 2008
- Amount: $100,000
- Project type: Continuing Grant
CAREER: Inferences on Large-Scale Multiple Comparisons: The Temptation of the Fourier Kingdom
- Award number: 0639980
- Fiscal year: 2007
- Amount: $100,000
- Project type: Continuing Grant
New Tools for Sparse Inference in Large-scale Multiple Comparisons
- Award number: 0505423
- Fiscal year: 2005
- Amount: $100,000
- Project type: Standard Grant
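Similar NSFC grants
Key nuclear reactions of the weak r-process in magnetorotational supernova explosions
- Award number: 12375145
- Approval year: 2023
- Amount: CNY 520,000
- Project type: General Program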
Similar overseas grants
Development of weak gels and improvement of wide-area fluid control technology to suppress water production in high-water-cut wells
- Award number: 24K08317
- Fiscal year: 2024
- Amount: $100,000
- Project type: Grant-in-Aid for Scientific Research (C)
Weak notions of curvature-dimension conditions on step-two Carnot groups
- Award number: 24K16928
- Fiscal year: 2024
- Amount: $100,000
- Project type: Grant-in-Aid for Early-Career Scientists
Organization of transcriptional machinery by weak multivalent interactions
- Award number: 10758297
- Fiscal year: 2023
- Amount: $100,000
- Project type:
Weak Solutions and Turbulence in Fluid Dynamics
- Award number: 2346799
- Fiscal year: 2023
- Amount: $100,000
- Project type: Standard Grant
CAREER: Towards Open World Event Knowledge Extraction with Weak Supervision
- Award number: 2238940
- Fiscal year: 2023
- Amount: $100,000
- Project type: Continuing Grant
A Graph-based Methodology for Modeling the Nucleation of Weak Electrolytes
- Award number: 2317787
- Fiscal year: 2023
- Amount: $100,000
- Project type: Continuing Grant
Bottom-up Approach for Improving Systems with weak power systems networks and gradual integration of off-grid community systems
- Award number: 2891694
- Fiscal year: 2023
- Amount: $100,000
- Project type: Studentship
Creation of a CWO Control Method for Swarm Robots Based on "Weak Constraints"
- Award number: 23KJ1445
- Fiscal year: 2023
- Amount: $100,000
- Project type: Grant-in-Aid for JSPS Fellows
Improving organisational outcomes through weak ties: an investigation using behaviour settings
- Award number: 2887051
- Fiscal year: 2023
- Amount: $100,000
- Project type: Studentship