Distance-Based Analysis for Complex High-Dimensional Data


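Basic Information

  • Award number:
    2113771
  • Principal investigator:
  • Amount:
    $300,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Start and end dates:
    2021-07-01 to 2022-02-28
  • Project status:
    Completed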

Project Abstract

Throughout the course of the twentieth century, distances have played a significant role in important areas of statistics, including classification, clustering, discriminant analysis, multidimensional scaling, sampling, spatial statistics, scoring rules, and kernel methods in machine learning. Distances are also central to the definition of divergence measures, relative entropy and information gain, some of which are fundamental to the concept of C.R. Rao's quadratic entropy and the analysis of diversity in ecology and other areas of science. Yet, at present there are significant gaps in our knowledge and in the emerging statistical literature on the use of distance-based tests and analyses for complex high-dimensional data. One example is analysis of similarity, which is among the most cited and most widely used distance-based statistical methods but is limited by an absence of relevant mathematical knowledge. This research will derive new mathematical knowledge on various distance-based statistical methods and apply it to answer important scientific questions arising in a number of disciplines in forestry, ecology and marine science, such as: (1) how does biodiversity change in tropical forests? (2) how do taxonomic and functional profiles of bacterial communities change with environmental conditions in different oceanic regions? The project establishes collaborations among several disciplines and between two US academic institutions and provides research and training to graduate and undergraduate students.

The project develops a new body of knowledge on distance-based statistical methods and computation for analyzing complex, high-dimensional data that arise in the form of compositions, trees, graphs, or networks. The distances considered here are all non-Euclidean (either non-metric dissimilarities that do not satisfy the triangle inequality or just discrete numbers), but they all arise from conditionally positive definite kernels. Examples of distances include the squared Euclidean distance, the Bray-Curtis dissimilarity, the Jensen-Shannon distance, Unifrac or the Kantorovich-Rubinstein metric, the Aitchison distance, the edit distance, various graph kernel and spectral distances, and other distances based on optimal transport problems. Specifically, the project advances the mathematical theory and computation of exact distribution-free two- and multi-sample runs tests, change points, and other related problems by counting runs along the shortest Hamiltonian path (or loop) of the pooled sample of data points. The project also considers analysis of similarity and related distance-based rank tests and derives new mathematical results that allow more advanced statistical analyses to be pursued. The project contributes to: (i) a deeper analysis of biodiversity in tropical forests; (ii) an investigation of how taxonomic and functional profiles of prokaryotic communities change with environmental conditions in different oceanic regions; (iii) a study of the variability of composition of rare earth elements in deep-sea muds of the Pacific Ocean; and (iv) an understanding of the relationship of intertidal communities on the Oregon coast with respect to upwelling and nutrient delivery. The project integrates mathematics research, science and education and will provide opportunities for dissertation work for graduate students.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
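
The runs-counting idea mentioned in the abstract can be illustrated with a small Python sketch. It is not the project's method: the function names (`bray_curtis`, `greedy_path`, `count_runs`, `runs_test`) and the toy data are hypothetical, a greedy nearest-neighbour ordering stands in for the shortest Hamiltonian path (which is expensive to compute exactly), and a Monte Carlo permutation p-value stands in for the exact distribution theory the project aims to develop.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def greedy_path(points, dist):
    """Nearest-neighbour heuristic for a short Hamiltonian path.

    Assumption: this is only a stand-in. It does not guarantee the
    shortest path through the pooled sample.
    """
    path = [0]
    unvisited = set(range(1, len(points)))
    while unvisited:
        last = path[-1]
        # Ties are broken by index so the ordering is deterministic.
        nxt = min(unvisited, key=lambda j: (dist(points[last], points[j]), j))
        path.append(nxt)
        unvisited.remove(nxt)
    return path


def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def runs_test(points, labels, dist, n_perm=2000, seed=0):
    """Count runs of sample labels along the path and estimate a p-value.

    Few runs suggest the samples occupy different regions. The p-value is
    a Monte Carlo estimate obtained by shuffling labels along the fixed path.
    """
    rng = random.Random(seed)
    ordered = [labels[i] for i in greedy_path(points, dist)]
    observed = count_runs(ordered)
    shuffled = ordered[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_runs(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): two groups of three-part abundance vectors.
group_a = [[10, 1, 0], [9, 2, 0], [11, 1, 1], [10, 2, 1]]
group_b = [[0, 1, 10], [1, 2, 9], [0, 2, 11], [1, 1, 10]]
points = group_a + group_b
labels = ["A"] * 4 + ["B"] * 4

runs, p = runs_test(points, labels, bray_curtis)
print(runs, p)
```

On this toy data the two groups are well separated under the Bray-Curtis dissimilarity, so the path visits all of one group before the other and only two runs are observed. Any other dissimilarity can be passed as `dist` without changing the rest of the sketch.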

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other publications by Debashis Mondal

Wavelet variances for heavy-tailed time series
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Rodney V. Fonseca; Debashis Mondal; L. Zhang
  • Corresponding author:
    L. Zhang
PAC Guarantees and Effective Algorithms for Detecting Novel Categories
  • DOI:
  • Year published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Si Liu; Risheek Garrepalli; Dan Hendrycks; Alan Fern; Debashis Mondal; Thomas G. Dietterich
  • Corresponding author:
    Thomas G. Dietterich


Other grants held by Debashis Mondal

Distance-Based Analysis for Complex High-Dimensional Data
  • Award number:
    2217007
  • Fiscal year:
    2021
  • Award amount:
    $300,000
  • Project type:
    Standard Grant
Markov Random Fields, Geostatistics and Matrix-Free Computation
  • Award number:
    2153669
  • Fiscal year:
    2021
  • Award amount:
    $300,000
  • Project type:
    Standard Grant
Markov Random Fields, Geostatistics and Matrix-Free Computation
  • Award number:
    1916448
  • Fiscal year:
    2019
  • Award amount:
    $300,000
  • Project type:
    Standard Grant
2016 International Indian Statistical Association conference `Statistical and Data Sciences: A Key to Healthy People, Planet and Prosperity'
  • Award number:
    1636648
  • Fiscal year:
    2016
  • Award amount:
    $300,000
  • Project type:
    Standard Grant
CAREER: New Directions in Spatial Statistics
  • Award number:
    1519890
  • Fiscal year:
    2014
  • Award amount:
    $300,000
  • Project type:
    Continuing Grant
CAREER: New Directions in Spatial Statistics
  • Award number:
    1254840
  • Fiscal year:
    2013
  • Award amount:
    $300,000
  • Project type:
    Continuing Grant
Connecting Markov Random Fields with Geostatistical Models
  • Award number:
    0906300
  • Fiscal year:
    2009
  • Award amount:
    $300,000
  • Project type:
    Standard Grant

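Similar grants from the National Natural Science Foundation of China

Theory and application of optimal scheduling for long-distance water transfer projects based on water-energy coupling
  • Award number:
    51909035
  • Approval year:
    2019
  • Award amount:
    CNY 250,000
  • Project type:
    Young Scientists Fund
A study of the relationship among distance, division of labor, and regional economic development: an analytical perspective based on firm specialization and producer services outsourcing
  • Award number:
    71903039
  • Approval year:
    2019
  • Award amount:
    CNY 190,000
  • Project type:
    Young Scientists Fund
Research on key technologies for long-span single-fiber sensors based on Brillouin time-domain analysis
  • Award number:
    61875018
  • Approval year:
    2018
  • Award amount:
    CNY 610,000
  • Project type:
    General Program
Investor decision-making behavior in reward-based crowdfunding from a psychological distance perspective: an analysis of project text and multimedia information
  • Award number:
    71871168
  • Approval year:
    2018
  • Award amount:
    CNY 500,000
  • Project type:
    General Program
Research on general-purpose clinical decision support methods based on patient similarity analysis
  • Award number:
    81871456
  • Approval year:
    2018
  • Award amount:
    CNY 570,000
  • Project type:
    General Program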

Similar overseas grants

Data Archiving A Longitudinal Cohort: Toledo Adolescent Relationships Study
  • Award number:
    10511494
  • Fiscal year:
    2022
  • Award amount:
    $300,000
  • Project type:
Addressing COVID 19 Vaccine Hesitancy in Rural Community Pharmacies Reducing Disparities Through an Implementation Science Approach
  • Award number:
    10522460
  • Fiscal year:
    2022
  • Award amount:
    $300,000
  • Project type:
Aging in Place since the COVID-19 Pandemic Onset: A Study of Neighborhoods and Cognitive Health among Older Americans
  • Award number:
    10876573
  • Fiscal year:
    2022
  • Award amount:
    $300,000
  • Project type:
Data Archiving A Longitudinal Cohort: Toledo Adolescent Relationships Study
  • Award number:
    10693319
  • Fiscal year:
    2022
  • Award amount:
    $300,000
  • Project type:
Addressing COVID 19 Vaccine Hesitancy in Rural Community Pharmacies Reducing Disparities Through an Implementation Science Approach
  • Award number:
    10708869
  • Fiscal year:
    2022
  • Award amount:
    $300,000
  • Project type: