Optimality Landscapes and Exploratory Data Analysis
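Basic information
- Award number: 1310002
- Principal investigator:
- Amount: $270,000
- Host institution:
- Host institution country: United States
- Grant type: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-08-01 to 2017-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract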
The investigators and their students study the development, implementation and application of iterative search procedures for unsupervised exploratory data analysis. In particular, they develop statistically principled procedures for discovering patterns in high dimensional data, including biclustering and correlation mining of genomic data, and community detection in complex networks arising in computational sociology and public policy. Complementing the methodological component of the research, the investigators and their students also study the development of general theoretical tools to analyze iterative data mining procedures, and the properties of their associated local optima. They develop probabilistic tools, including new variants of Stein's method for normal approximation and new Gaussian comparison theorems, to understand asymptotic properties of typical local optima, and the dependence of these optima under different assumptions on the underlying signal, beginning with the null setting in which only noise is present. Their research is carried out in the context of ongoing collaborations with UNC faculty in the Medical School, and in the Departments of Genetics, Public Policy, and Mathematics.

The broad subject of the proposal is the development, theoretical analysis, and application of exploratory methods for large data sets. By exploratory methods, we mean those that search large data sets for significant patterns or configurations that may be of organizational or scientific interest. Examples include patterns that may distinguish types of a disease, that help target a drug or assess its efficacy, and patterns that identify among a large number of people a smaller community who frequently exchange text messages. In many cases, a numerical score is used to assess the potential importance of a pattern, and attention then turns to finding a pattern with a large score.

Our primary interest is in search procedures that begin with a candidate pattern, then search for closely related patterns in the data that have higher score, repeating this procedure until they reach a pattern where no further (local) improvements are possible. Procedures of this sort are routinely applied in large data problems where finding the "best" pattern (the pattern with the largest score) is computationally prohibitive. We are developing and applying new, statistically based search procedures for several important tasks arising in the exploratory analysis of large data sets, including data mining and community detection. At the same time, we are developing fundamental theory to justify and inform the application of the iterative search procedures. Our work is being carried out in the context of ongoing collaborations with UNC faculty in the Medical School, and in the Departments of Genetics, Public Policy, and Mathematics.
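The iterative search described in the abstract can be illustrated with a small sketch. It is not the investigators' procedure: it assumes, purely for illustration, that a "pattern" is a k-by-k submatrix of a data matrix and that its score is the average of its entries. The function names (`top_k`, `local_search`) and all parameters are hypothetical. Starting from a random set of columns, the search alternately picks the best rows for the current columns and the best columns for those rows, and stops when neither update changes the pattern, that is, at a local optimum.

```python
import random


def top_k(scores, k):
    """Return the indices of the k largest scores, in increasing index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def local_search(data, k, seed=0, max_iter=100):
    """Alternate row and column updates until the k-by-k pattern stops changing.

    Illustrative assumption: the score of a pattern is the average of the
    selected entries. Returns (rows, cols, score).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    cols = sorted(rng.sample(range(n_cols), k))  # random candidate pattern
    rows = None
    for _ in range(max_iter):
        # Best k rows given the current columns.
        row_sums = [sum(data[i][j] for j in cols) for i in range(n_rows)]
        new_rows = top_k(row_sums, k)
        # Best k columns given those rows.
        col_sums = [sum(data[i][j] for i in new_rows) for j in range(n_cols)]
        new_cols = top_k(col_sums, k)
        if new_rows == rows and new_cols == cols:
            break  # local optimum: no row or column update improves the score
        rows, cols = new_rows, new_cols
    score = sum(data[i][j] for i in rows for j in cols) / (k * k)
    return rows, cols, score


# Null setting: a 30 x 40 matrix of pure Gaussian noise.
noise_rng = random.Random(1)
data = [[noise_rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(30)]
rows, cols, score = local_search(data, k=5)
print(rows, cols, round(score, 3))
```

Each update can only keep or increase the sum of the selected entries, so the loop settles on a pattern that is locally, but not necessarily globally, best. Different starting patterns (different `seed` values) can end at different local optima, which is why the properties of typical local optima matter.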
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Andrew Nobel
Inference for Stationary Processes: Optimal Transport and Generalized Bayesian Approaches
- Award number: 2113676
- Fiscal year: 2021
- Amount: $270,000
- Grant type: Standard Grant
Iterative testing procedures and high-dimensional scaling limits of extremal random structures
- Award number: 1613072
- Fiscal year: 2016
- Amount: $270,000
- Grant type: Continuing Grant
Significance Based Procedures for Mining and Prediction of Large Data Sets
- Award number: 0907177
- Fiscal year: 2009
- Amount: $270,000
- Grant type: Standard Grant
Analysis of High Dimensional Data Using Subspace Clustering
- Award number: 0406361
- Fiscal year: 2004
- Amount: $270,000
- Grant type: Continuing Grant
Estimation from Dynamical Systems and Individual Sequences
- Award number: 9971964
- Fiscal year: 1999
- Amount: $270,000
- Grant type: Standard Grant
Mathematical Sciences: Greedy Growing and its Applications
- Award number: 9501926
- Fiscal year: 1995
- Amount: $270,000
- Grant type: Continuing Grant
Similar overseas grants
Landscapes of Music: The more-than-human lives and politics of musical instruments
- Award number: 2889655
- Fiscal year: 2027
- Amount: $270,000
- Grant type: Studentship
Quantifying climate change impacts for wetlands in agricultural landscapes
- Award number: DE240100477
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Discovery Early Career Researcher Award
Sex-specific fitness landscapes in the evolution of egg-laying vs live-birth
- Award number: NE/Y001672/1
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Research Grant
Encouraging care for biodiversity through curated landscapes: community art and citizen science.
- Award number: AH/Z505353/1
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Research Grant
Restoring amphibian populations in chytrid-impacted landscapes
- Award number: DP240102056
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Discovery Projects
Future Fashion Landscapes: Fostering biodiversity through collaborations between farmers, designers, and processors of native and rare breed wool
- Award number: AH/Z505365/1
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Research Grant
High-rise landscapes: The afterlives of tower block 'failure' and rethinking urban futures
- Award number: MR/Y003586/1
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Fellowship
DISES Investigating mercury biogeochemical cycling via mixed-methods in complex artisanal gold mining landscapes and implications for community health
- Award number: 2307870
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Standard Grant
Creating conservation landscapes that effectively safeguard biodiversity
- Award number: FT230100402
- Fiscal year: 2024
- Amount: $270,000
- Grant type: ARC Future Fellowships
Biogenic Volatile Organic Compound (VOC) Emissions from Managed Landscapes and Their Contribution to Atmospheric Ozone and Aerosol
- Award number: 2347370
- Fiscal year: 2024
- Amount: $270,000
- Grant type: Standard Grant