Significance Based Procedures for Mining and Prediction of Large Data Sets

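Basic Information

  • Award number: 0907177
  • Principal investigator: Andrew Nobel
  • Amount: $210,000
  • Host institution:
  • Host institution country: United States
  • Award type: Standard Grant
  • Fiscal year: 2009
  • Funding country: United States
  • Duration: 2009-09-01 to 2013-08-31
  • Status: Completed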

Project Abstract

Exploratory methods play a critical role in understanding large data sets, regardless of their origin, and are typically the first step in their analysis. The investigator is studying the development and use of exploratory data-mining methods that identify patterns or regularities in high-dimensional data. The specific focus of his research is the problem of identifying sample-variable associations in large data sets that may arise from multiple measurement technologies. In the typical case, where the data from an experiment are represented as a rectangular matrix, sample-variable associations correspond to distinguished submatrices of the data matrix. The investigator is developing a statistically principled, significance-based approach to finding large average submatrices of a data matrix using a simple iterative algorithm. The algorithm is applicable to both real-valued and categorical data matrices. In addition to the basic method, the investigator is developing several extensions: data-driven null models that incorporate dependence between variables; methods for data arising from the simultaneous application of multiple measurement technologies; and applications of the basic method to prediction problems such as classification, regression, and survival analysis. The investigator is also developing basic theory to support the use of the algorithm and to assess the structure of data matrices under the different null models. The methods are being developed and applied in close collaboration with several groups of biomedical researchers. In particular, the new data-mining methodology is being incorporated into software that collaborating scientists use to identify and assess significant sample-variable associations in ongoing experiments involving breast, brain, and lung cancer.

Large data sets are now common in many experimental areas of science, in particular gene-level studies of human diseases such as cancer. In such studies it is not unusual for an experiment to include hundreds to thousands of samples, with tens of thousands to millions of measurements on each sample. Large data sets are part of a trend away from traditional hypothesis-driven scientific research toward data-driven research, in which researchers explore large data sets for patterns or regularities that, combined with subject-matter expertise, yield hypotheses that can be tested by more traditional means. The investigator is studying an exploratory method that identifies statistically significant associations between samples and variables in large data sets, associations that can yield testable scientific hypotheses. The methods being developed are computationally efficient and are based on established statistical principles, in particular the notion of statistical significance. The investigator is also studying how the basic exploratory method can be applied to data arising from multiple measurement technologies and to statistical problems such as classification and survival analysis. These activities are part of a collaborative research program involving sustained interactions among faculty and students from the statistical, biological, and medical sciences. The exploratory method is being integrated into the basic exploratory tools of the collaborating scientists and is being used to analyze several new, previously unanalyzed data sets.
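The abstract does not spell out the iterative algorithm or its significance score. As a minimal sketch of how a significance-based search for large average submatrices could work, the Python code below assumes the entries have been standardized so that, under the null model, they are independent N(0, 1). Under that assumption it scores a k × l submatrix of an m × n matrix with average tau by -log[C(m, k) · C(n, l) · P(Z > tau·sqrt(kl))], a Bonferroni-style bound on the chance of seeing so large an average anywhere in a pure-noise matrix, and it finds candidates by alternating between picking the best k rows for the current columns and the best l columns for the current rows. The score, the alternating search, and every name (`significance_score`, `search_fixed_size`, `find_large_average_submatrix`, `sizes`, `restarts`) are illustrative assumptions, not the investigator's implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_tail(z: float) -> float:
    """Natural log of P(Z > z) for a standard normal Z."""
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflowed (very large z): use the leading Mills-ratio approximation.
    return -0.5 * z * z - math.log(z) - 0.5 * math.log(2.0 * math.pi)


def significance_score(total: float, k: int, l: int, m: int, n: int) -> float:
    """Assumed score of a k x l submatrix (entry sum `total`) of an m x n matrix.

    Hypothetical null model: entries are i.i.d. N(0, 1), so the submatrix average
    tau satisfies tau * sqrt(k * l) ~ N(0, 1). The score is
    -log[C(m, k) * C(n, l) * P(Z > tau * sqrt(k * l))]; larger is more significant.
    """
    z = total / math.sqrt(k * l)  # equals tau * sqrt(k * l)
    return -(log_comb(m, k) + log_comb(n, l) + log_normal_tail(z))


def top_indices(values: Sequence[float], count: int) -> List[int]:
    """Indices of the `count` largest values (ties go to lower indices), sorted."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(order[:count])


def search_fixed_size(X: Matrix, k: int, l: int, rng: random.Random,
                      max_iter: int = 100) -> Tuple[List[int], List[int], float]:
    """Alternating search for a k x l submatrix with a large sum (a local optimum)."""
    m, n = len(X), len(X[0])
    cols = sorted(rng.sample(range(n), l))  # random starting columns
    rows: List[int] = []
    for _ in range(max_iter):
        # Given the columns, the best k rows have the largest partial row sums.
        new_rows = top_indices([sum(X[i][j] for j in cols) for i in range(m)], k)
        # Given the rows, the best l columns have the largest partial column sums.
        new_cols = top_indices([sum(X[i][j] for i in new_rows) for j in range(n)], l)
        if new_rows == rows and new_cols == cols:
            break  # fixed point: neither update changes the submatrix
        rows, cols = new_rows, new_cols
    total = sum(X[i][j] for i in rows for j in cols)
    return rows, cols, total


def find_large_average_submatrix(
        X: Matrix, sizes: Sequence[Tuple[int, int]], restarts: int = 10,
        seed: int = 0) -> Optional[Tuple[float, List[int], List[int]]]:
    """Highest-scoring (score, rows, cols) over candidate sizes and random restarts."""
    m, n = len(X), len(X[0])
    rng = random.Random(seed)
    best: Optional[Tuple[float, List[int], List[int]]] = None
    for k, l in sizes:
        for _ in range(restarts):
            rows, cols, total = search_fixed_size(X, k, l, rng)
            score = significance_score(total, k, l, m, n)
            if best is None or score > best[0]:
                best = (score, rows, cols)
    return best
```

For example, on a 10 × 8 matrix of zeros with the value 5 planted at rows 2, 5, 7 and columns 1, 4, `find_large_average_submatrix(X, [(2, 2), (3, 2), (4, 2)])` returns exactly those rows and columns, because the 3 × 2 block scores higher than the best 2 × 2 or 4 × 2 candidates. Each alternating update can only keep or increase the submatrix sum, so every run ends at a local optimum; the random restarts and the grid of candidate sizes are simple hypothetical stand-ins for whatever size selection the actual procedure uses.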

Project Outcomes

Journal articles (0)
Monographs (0)
Awards (0)
Conference papers (0)
Patents (0)

Other grants by Andrew Nobel

Inference for Stationary Processes: Optimal Transport and Generalized Bayesian Approaches
  • Award number: 2113676
  • Fiscal year: 2021
  • Amount: $210,000
  • Award type: Standard Grant
Iterative testing procedures and high-dimensional scaling limits of extremal random structures
  • Award number: 1613072
  • Fiscal year: 2016
  • Amount: $210,000
  • Award type: Continuing Grant
Optimality Landscapes and Exploratory Data Analysis
  • Award number: 1310002
  • Fiscal year: 2013
  • Amount: $210,000
  • Award type: Standard Grant
Analysis of High Dimensional Data Using Subspace Clustering
  • Award number: 0406361
  • Fiscal year: 2004
  • Amount: $210,000
  • Award type: Continuing Grant
Estimation from Dynamical Systems and Individual Sequences
  • Award number: 9971964
  • Fiscal year: 1999
  • Amount: $210,000
  • Award type: Standard Grant
Mathematical Sciences: Greedy Growing and its Applications
  • Award number: 9501926
  • Fiscal year: 1995
  • Amount: $210,000
  • Award type: Continuing Grant

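Similar grants from the National Natural Science Foundation of China

Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
  • Grant number:
  • Year approved: 2024
  • Amount:
  • Program: Research Fund for International Young Scientists
Exploring the Intrinsic Mechanisms of CEO Turnover and Market Reaction: An Explanation Based on Information Asymmetry
  • Grant number: W2433169
  • Year approved: 2024
  • Amount:
  • Program: Research Fund for International Scientists
Dissecting alternative splicing in hematopoietic stem cell development using tag-based single-cell transcriptome sequencing
  • Grant number: 81900115
  • Year approved: 2019
  • Amount: CNY 210,000
  • Program: Young Scientists Fund
Using an agent-based model to study the effect and mechanism of single-dose perioperative dexamethasone on surgical incision healing
  • Grant number: 81771933
  • Year approved: 2017
  • Amount: CNY 500,000
  • Program: General Program
Research on user interface models and evaluation methods for reality-based interaction
  • Grant number: 61170182
  • Year approved: 2011
  • Amount: CNY 570,000
  • Program: General Program
Association between the FCAR gene and IgA nephropathy based on multistage, haplotype, and functional tests
  • Grant number: 30771013
  • Year approved: 2007
  • Amount: CNY 300,000
  • Program: General Program
Identifying molecular markers of osteosarcoma using differential proteomics combined with array-based CGH
  • Grant number: 30470665
  • Year approved: 2004
  • Amount: CNY 80,000
  • Program: General Program
Research on GaN-based dilute magnetic semiconductor materials and spin resonant tunneling devices
  • Grant number: 60376005
  • Year approved: 2003
  • Amount: CNY 200,000
  • Program: General Program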

Similar international grants

Development of an RNA-based anticoagulant and antidote for precise on/off coagulation control during cardiovascular procedures
  • Award number: 10603072
  • Fiscal year: 2023
  • Amount: $210,000
  • Award type:
Design of an optical fiber-based shape sensing needle with embedded fiber Bragg grating strain sensors for minimally invasive surgical procedures
  • Award number: 576437-2022
  • Fiscal year: 2022
  • Amount: $210,000
  • Award type: Canadian Graduate Scholarships Foreign Study Supplements
CAREER: Towards the Next Generation of Data-Driven and Performance-Based Multiscale Procedures in Mining Geotechnics
  • Award number: 2145092
  • Fiscal year: 2022
  • Amount: $210,000
  • Award type: Standard Grant
Label-free, Multimodality Diffuse Reflectance and Polarization-Based Proximity Probe for Iatrogenic Nerve Injury Prevention During Surgical Procedures
  • Award number: 10491034
  • Fiscal year: 2022
  • Amount: $210,000
  • Award type:
Simulation based surgical training for high risk low resourced procedures using a novel simulator with smart mentoring
  • Award number: 10644160
  • Fiscal year: 2022
  • Amount: $210,000
  • Award type:
Emergence of cooperative diagnostic robotics based on measurement and analysis of clinical procedures
  • Award number: 22K12879
  • Fiscal year: 2022
  • Amount: $210,000
  • Award type: Grant-in-Aid for Scientific Research (C)
Development of noise-robust statistical anomaly detection procedures for condition-based maintenance
  • Award number: 21K14372
  • Fiscal year: 2021
  • Amount: $210,000
  • Award type: Grant-in-Aid for Early-Career Scientists
Statistical Procedures and Performance Measures for Simulator-Based Frequentist Inference
  • Award number: 2053804
  • Fiscal year: 2021
  • Amount: $210,000
  • Award type: Standard Grant
Development and evaluation of stochastic models for condition-based maintenance with multiple observation procedures
  • Award number: 20K04989
  • Fiscal year: 2020
  • Amount: $210,000
  • Award type: Grant-in-Aid for Scientific Research (C)
Training of machine-learning based procedures for automated postcorrection of OCRed historical printings
  • Award number: 431091758
  • Fiscal year: 2020
  • Amount: $210,000
  • Award type: Research Grants