Symbolic Inference for Very Large Datasets

Basic Information

Project Abstract

Datasets that are complex (with the data themselves "complex", and/or with structures that impose complications) are becoming more and more routine given contemporary computing capacity. What is not routine is how to analyse these data. Indeed, data collection is fast outpacing the ability to analyse the data. Even in situations where available methodology might in theory seem to apply, routine use of such statistical techniques is often inappropriate. Some methods (e.g., squashing) take representative "samples" and then use standard procedures on the sampled data. Others seek sub-patterns (e.g., data mining) and then try to focus on the data behind those patterns. Others aggregate the data in some meaningful way. One such aggregation method produces so-called symbolic data (such as lists, intervals, distributions, etc.). An advantage of symbolic data is that, unlike the values in sampled sets, a symbolic value retains all the original data while simultaneously reducing the size of the dataset. Further, while the massive datasets encountered today are one source of symbolic data, many data are naturally symbolic (be these small or large datasets). All are better analysed by methods developed for symbolic data. The investigator addresses three major areas. The first is classification trees: distance measures for interval- and histogram-valued data are developed and then used in new algorithms that extend the classical CART methodology to symbolic data. Second, regression methods, in particular logistic regression and Cox's proportional hazards model, are adapted to symbolic data. Finally, factor analysis and principal component methodology are developed for symbolic data.

With the impact of contemporary computer capacity, complex datasets are becoming ubiquitous. Yet those same computers often lack the capacity to analyse these massive datasets. Therefore, new ways to handle them must be developed. One way is to aggregate the data in a scientifically meaningful way (with the actual aggregation being dictated by the question at hand). Such aggregation will necessarily produce data that form lists, intervals, histograms, etc. The investigator develops new methodologies for interval data in three major areas: classification trees (after first finding distance measures for intervals and histograms), regression methods (especially logistic regression), and factor analysis. The results are applied to data. A synergism is achieved by integrating the mathematical, statistical, and computational arenas in addressing real issues encountered with contemporary datasets. The outcomes cannot be achieved with the tools of just one of these disciplines; all three are needed. The new methodologies will have wide applicability to datasets generated in, e.g., meteorology, environmental science, the social sciences, health-care programs, industry, and the like, well beyond those motivating the work. This will have enormous impact on US science. Further, since doctoral students will be engaged as collaborators and international researchers will be active participants, the research helps in the internationalization of the next and future generations of US scientists.
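
The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the investigator's method: the (station, temperature) records are hypothetical, the function names to_intervals and hausdorff are made up for this example, and the Hausdorff distance is simply one standard way to compare two intervals. The abstract does not say which distance measures the project develops.

```python
from collections import defaultdict


def to_intervals(records):
    """Aggregate (group, value) pairs into one (min, max) interval per group."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: (min(values), max(values)) for group, values in grouped.items()}


def hausdorff(a, b):
    """Hausdorff distance between two closed intervals given as (low, high)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical classical data: one temperature reading per row.
records = [("A", 12.0), ("A", 18.5), ("A", 15.0), ("B", 9.0), ("B", 21.0)]

# Five rows collapse to two interval-valued (symbolic) observations.
intervals = to_intervals(records)

# Distance between the two interval-valued observations.
dist = hausdorff(intervals["A"], intervals["B"])
```

Here five classical rows reduce to two symbolic observations, one interval per station. A tree or clustering algorithm for symbolic data would then operate on distances such as dist rather than on the raw rows.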

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Lynne Billard

Hierarchical clustering for histogram data
Principal Component Analysis for Compositional Data Vectors
  • Impact factor:
    1.3
  • Authors:
    Huiwen Wang; Liying Shangguan; Rong Guan; Lynne Billard
  • Corresponding author:
    Lynne Billard
On a matrix‐valued autoregressive model
Dissimilarity Measures for Histogram-valued Observations

Other grants held by Lynne Billard

Workshop: Pathways to the Future Workshop 2004
  • Award number:
    0400585
  • Fiscal year:
    2004
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Statistical Inference for Complex Data
  • Award number:
    0400584
  • Fiscal year:
    2004
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Pathways to the Future Workshop 2003
  • Award number:
    0307631
  • Fiscal year:
    2003
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
U.S.-France Cooperative Research (INRIA): Symbolic Data Analysis Project
  • Award number:
    0093738
  • Fiscal year:
    2001
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Workshops: Pathways to the Future
  • Award number:
    0102306
  • Fiscal year:
    2001
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
International Biometric Conference - San Francisco
  • Award number:
    0070176
  • Fiscal year:
    2000
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
International Biometric Conference to be held December 13-18, 1998, in Cape Town, South Africa
  • Award number:
    9730906
  • Fiscal year:
    1998
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Mathematical Sciences: Workshop: Statistical Image Analysis - July 1-5, 1996
  • Award number:
    9626865
  • Fiscal year:
    1996
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Workshops: Pathways to the Future
  • Award number:
    9629283
  • Fiscal year:
    1996
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Mathematical Sciences: International Biometric Conference
  • Award number:
    9628290
  • Fiscal year:
    1996
  • Award amount:
    $150,000
  • Award type:
    Standard Grant

Similar overseas grants

Spectral embedding methods and subsequent inference tasks on dynamic multiplex graphs
  • Award number:
    EP/Y002113/1
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Research Grant
CAREER: Game Theoretic Models for Robust Cyber-Physical Interactions: Inference and Design under Uncertainty
  • Award number:
    2336840
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
CAREER: Statistical foundations of particle tracking and trajectory inference
  • Award number:
    2339829
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
  • Award number:
    2412357
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Probabilistic Inference Based Utility Evaluation and Path Generation for Active Autonomous Exploration of USVs in Unknown Confined Marine Environments
  • Award number:
    EP/Y000862/1
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Research Grant
CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
  • Award number:
    2339084
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
AI4PhotMod - Artificial Intelligence for parameter inference in Photosynthesis Models
  • Award number:
    BB/Y51388X/1
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Research Grant
CSR: Small: Latency-controlled Reduction of Data Center Expenses for Handling Bursty ML Inference Requests
  • Award number:
    2336886
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
  • Award number:
    2338760
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
STATISTICAL AND COMPUTATIONAL THRESHOLDS IN SPIN GLASSES AND GRAPH INFERENCE PROBLEMS
  • Award number:
    2347177
  • Fiscal year:
    2024
  • Award amount:
    $150,000
  • Award type:
    Standard Grant