CAREER: Methodology for Statistical Computing in Massive Datasets: Parallel Approaches to Cluster and MCMC Estimation

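Basic Information

  • Award number:
    0437555
  • Principal investigator:
    Ranjan Maitra
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2003
  • Funding country:
    United States
  • Period:
    2003-07-01 to 2010-11-30
  • Project status:
    Completed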

Project Abstract

DMS 0239734
PI: Ranjan Maitra

This project aims to develop practical methodology for statistical analysis and estimation in massive databases. Automated data collection methods now produce a surfeit of very high-dimensional records. Grouping these records into homogeneous clusters to better understand them is a desirable goal in a variety of applications, yet classical statistical methods are computationally infeasible in many such cases. I propose to develop parallel methodology for this context. Although the methodology and theory developed will be quite general, with potential applications ranging from business and medicine to the environment and software quality assessment, I will conduct the research in the context of three scientific collaborations. The first pertains to the US Environmental Protection Agency's (EPA) self-reported Toxics Release Inventory (TRI) databases, where profiling the different facilities in terms of their product, demographic and business information can improve the accuracy of the records and better characterize the facilities with respect to their emissions mix. The second project assesses the reliability of functional Magnetic Resonance Imaging (fMRI) scans, with a view to understanding the cognitive processes of the brain as a first step toward patient care and therapy. The third application is in bioinformatics, where the goals are to cluster microarray data and to analyze two-dimensional proteomic gel images; this will help in isolating genes and understanding their relationship with different disorders. Clustering is in general a very difficult problem, and even for moderately sized datasets the available solutions are largely empirical. I propose to develop multi-pass methodologies in several different scenarios. I also propose multi-scale simulation approaches to estimation in high-dimensional settings.
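The abstract does not specify what the multi-pass methodologies are. Purely as an illustrative assumption, and not as the project's method, the Python sketch below shows one common way to make clustering feasible when data cannot be processed in one piece: a two-pass k-means. The first pass summarizes each chunk (which could be read from disk or handled by a separate processor) by a few weighted centroids, and the second pass clusters that small weighted summary. The function names and parameters (`weighted_kmeans`, `two_pass_kmeans`, `per_chunk`, `n_iter`) are hypothetical.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, n_iter=20, seed=0):
    # Lloyd's algorithm on weighted points. Returns k centroids and the total
    # weight assigned to each one in the last assignment step.
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    totals = [0.0] * k
    for _ in range(n_iter):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            totals[j] += w
            for t, v in enumerate(p):
                sums[j][t] += w * v
        for j in range(k):
            if totals[j] > 0:  # keep the old centre if a cluster empties
                centers[j] = [s / totals[j] for s in sums[j]]
    return centers, totals


def two_pass_kmeans(chunks, k, per_chunk=10, seed=0):
    # Pass 1: summarise each chunk independently by up to per_chunk weighted
    # centroids; chunks could be streamed from disk or sent to separate workers.
    summary, summary_w = [], []
    for i, chunk in enumerate(chunks):
        m = min(per_chunk, len(chunk))
        centers, totals = weighted_kmeans(chunk, [1.0] * len(chunk), m, seed=seed + i)
        for c, w in zip(centers, totals):
            if w > 0:
                summary.append(c)
                summary_w.append(w)
    # Pass 2: cluster the small weighted summary into the k final groups.
    return weighted_kmeans(summary, summary_w, k, seed=seed)
```

Because each chunk is summarized independently, the first pass parallelizes naturally; the price is that the final partition is only approximate, since points are grouped through their chunk-level centroids rather than individually.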
One of the biggest challenges that high dimensionality poses for simulation methods is poor mobility: because the space to be traversed is so vast, a sampler moves around it only slowly. I propose to address this issue by connecting these high-dimensional spaces to lower-dimensional (and significantly smaller) ones, and by using these lower scales to travel from one corner of the high-dimensional space to another. A final goal of this five-year plan is to investigate the development and estimation of more complex models for proteomic gel data. Most of the proposed plans will be feasible only with a parallel computing interface. Such interfaces are increasingly critical in a large number of scientific applications, and I propose to simultaneously provide statistics students with the necessary expertise by designing suitably tailored graduate and undergraduate classes.
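The abstract describes the multi-scale idea only at a high level. As one hedged illustration, not the project's actual method, the Python sketch below uses a standard coarse-move construction in the spirit of multigrid Monte Carlo: besides single-coordinate Metropolis updates, it proposes shifting whole blocks of coordinates by a common amount, which is a random-walk step in a lower-dimensional space of block levels lifted back to the full space. The target density, block sizes, step sizes and all function names are illustrative assumptions. Because every proposal is symmetric and every update is a Metropolis step for the same density, alternating the scales still leaves the target invariant.

```python
import math
import random


def log_target(x, sigma=0.1, tau=5.0):
    # Unnormalised log density of a smooth Gaussian "chain": neighbouring
    # coordinates are tightly coupled (small sigma), while the overall level
    # is only weakly pinned to zero (large tau). Single-coordinate moves mix
    # slowly on such a target because changing the level takes many steps.
    smooth = sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
    level = sum(v * v for v in x)
    return -smooth / (2 * sigma ** 2) - level / (2 * tau ** 2)


def metropolis(x, y, rng):
    # Metropolis accept/reject for a symmetric proposal y made from x.
    log_ratio = log_target(y) - log_target(x)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return y, True
    return x, False


def fine_move(x, rng, step=0.05):
    # Fine scale: random-walk perturbation of one coordinate.
    y = list(x)
    y[rng.randrange(len(y))] += rng.gauss(0.0, step)
    return y


def coarse_move(x, rng, block, step=0.5):
    # Coarse scale: view x through len(x) // block block levels (a lower-
    # dimensional space), take a random-walk step on one block level, and
    # lift it back by shifting every coordinate of that block equally.
    y = list(x)
    k = rng.randrange(len(y) // block)
    delta = rng.gauss(0.0, step)
    for i in range(k * block, (k + 1) * block):
        y[i] += delta
    return y


def multiscale_sampler(d=32, n_iter=1000, blocks=(32, 8), seed=0):
    # Alternate a sweep of fine moves with one coarse move per block size.
    rng = random.Random(seed)
    x = [0.0] * d
    accepted = {"fine": 0, **{b: 0 for b in blocks}}
    trace = []  # overall level (mean of x) after each iteration
    for _ in range(n_iter):
        for _ in range(d):
            x, ok = metropolis(x, fine_move(x, rng), rng)
            accepted["fine"] += ok
        for b in blocks:
            x, ok = metropolis(x, coarse_move(x, rng, b), rng)
            accepted[b] += ok
        trace.append(sum(x) / d)
    rates = {"fine": accepted["fine"] / (n_iter * d)}
    rates.update({b: accepted[b] / n_iter for b in blocks})
    return x, trace, rates
```

For example, `multiscale_sampler(d=32, blocks=(32, 8))` alternates fine sweeps with whole-vector and quarter-vector shifts. The step sizes are untuned, so the smaller-block moves, which create jumps at block boundaries, may be accepted only rarely; the returned acceptance rates make that visible.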

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ranjan Maitra

Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis
Quantitative matching of forensic evidence fragments using fracture surface topography and statistical learning
  • DOI:
    10.1038/s41467-024-51594-1
  • Published:
    2024-09-08
  • Journal:
    Nature Communications
  • Impact factor:
    15.700
  • Authors:
    Geoffrey Z. Thompson;Bishoy Dawood;Tianyu Yu;Barbara K. Lograsso;John D. Vanderkolk;Ranjan Maitra;William Q. Meeker;Ashraf F. Bastawros
  • Corresponding author:
    Ashraf F. Bastawros

Other grants by Ranjan Maitra

CAREER: Methodology for Statistical Computing in Massive Datasets: Parallel Approaches to Cluster and MCMC Estimation
  • Award number:
    0239734
  • Fiscal year:
    2003
  • Funding amount:
    --
  • Project type:
    Continuing Grant

Similar overseas grants

Collaborative Research: IMR: MM-1A: Scalable Statistical Methodology for Performance Monitoring, Anomaly Identification, and Mapping Network Accessibility from Active Measurements
  • Award number:
    2319592
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Continuing Grant
CAREER: New Challenges in Statistical Genetics: Mendelian Randomization, Integrated Omics and General Methodology
  • Award number:
    2238656
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Continuing Grant
Collaborative Research: IMR: MM-1A: Scalable Statistical Methodology for Performance Monitoring, Anomaly Identification, and Mapping Network Accessibility from Active Measurements
  • Award number:
    2319593
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Statistical Methodology for Multiple Events in Time and Space
  • Award number:
    RGPIN-2018-04799
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Innovations in Statistical Methodology and Applications to Economics, Engineering, Health, and Medicine
  • Award number:
    2210913
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Standard Grant
Development and innovation of statistical theory and methodology of network meta-analysis
  • Award number:
    22H03554
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Statistical Methodology for Correlated Data in Health Sciences
  • Award number:
    RGPIN-2019-04741
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Statistical Methodology towards Technology
  • Award number:
    RGPIN-2019-06370
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Statistical methodology for rank based sampling design and finite mixture models
  • Award number:
    RGPIN-2020-06696
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Statistical Learning and Modeling: Methodology and Algorithm
  • Award number:
    RGPIN-2019-05917
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual