CAREER: Methodology for Statistical Computing in Massive Datasets: Parallel Approaches to Cluster and MCMC Estimation
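Basic information
- Award number: 0239734
- Principal investigator: Ranjan Maitra
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2003
- Funding country: United States
- Period: 2003-06-01 to 2004-08-31
- Status: Completed
- Source:
- Keywords:
Project abstract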
CAREER: Methodology for Statistical Computing in Massive Datasets: Parallel Approaches to Cluster and MCMC Estimation
DMS 0239734
PI: Ranjan Maitra
This project is aimed at developing practical methodology for statistical analysis and estimation in massive databases. Because of automated data collection methods, there is now a surfeit of very high-dimensional records. Grouping them into homogeneous clusters to better understand them is a desirable goal in a variety of applications, yet classical statistical methods are computationally infeasible in many cases. I propose to develop parallel methodology for this context. Although the methodology and theory developed will be quite general, with potential use in applications ranging from business, medicine, and the environment to software quality assessment, I will conduct the research in the context of three scientific collaborations. The first pertains to the US Environmental Protection Agency's (EPA) self-reported Toxic Releases Inventory (TRI) databases, where profiling the different facilities in terms of their product, demographic and business information can improve the accuracy of records, as well as better characterize them vis-a-vis their emissions mix. The second project is to assess the reliability of functional Magnetic Resonance Imaging (fMRI) scans, with a view to understanding the cognitive processes of the brain, as a first step to patient care and therapy. The third application is in bioinformatics, where the goal is to cluster microarray data and also to analyze two-dimensional proteomic gel images. This will help in isolating genes and understanding their relationship with different disorders. Clustering is in general a very difficult problem, with only empirical solutions available even for moderately sized datasets. I propose to develop multi-pass methodologies in several different scenarios. I also propose multi-scale simulation approaches to estimation in a high-dimensional context.
One of the biggest challenges that high dimensionality poses for simulation methods is low mobility: the space to be traversed is so vast that the sampler moves around it slowly. I propose to address this issue by connecting these high-dimensional spaces to lower-dimensional ones (which are significantly smaller) and by using these lower scales to traverse from one corner of the higher-dimensional space to another. A final goal of this five-year plan is to investigate the development and estimation of more complex models for proteomic gel data. Most of the proposed plans will be possible only with a parallel computing interface. Parallel computing is increasingly critical in a large number of scientific applications, and I propose to simultaneously provide statistics students with the necessary expertise by designing suitably tailored graduate and undergraduate classes.
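The multi-pass clustering idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the project's actual methodology, so this is not it. The sketch assumes a simple two-pass scheme: plain k-means on a random sample, then a second pass that labels the full dataset chunk by chunk. All names and parameters (`multipass_cluster`, `sample_size`, `chunk_size`) are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point (first index wins ties)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means, intended for a small in-memory sample."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a group is empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers


def multipass_cluster(points, k, sample_size, chunk_size, seed=0):
    """Hypothetical two-pass scheme (an assumption, not the project's method).

    Pass 1 clusters a random sample; pass 2 labels every point in chunks.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, seed=seed)
    labels = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Chunks do not depend on each other, so this loop could be
        # distributed across workers; it is kept serial here for clarity.
        labels.extend(nearest(p, centers) for p in chunk)
    return centers, labels
```

Because the second pass treats each chunk independently given the fitted centers, it is the natural place to introduce parallelism; the quality of the result depends on how representative the first-pass sample is.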
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Ranjan Maitra
Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis
- DOI: 10.1007/s13571-011-0016-x
- Published: 2011-06-28
- Journal:
- Impact factor: 0.700
- Authors: Volodymyr Melnykov; Ranjan Maitra; Dan Nettleton
- Corresponding author: Dan Nettleton
Quantitative matching of forensic evidence fragments using fracture surface topography and statistical learning
- DOI: 10.1038/s41467-024-51594-1
- Published: 2024-09-08
- Journal:
- Impact factor: 15.700
- Authors: Geoffrey Z. Thompson; Bishoy Dawood; Tianyu Yu; Barbara K. Lograsso; John D. Vanderkolk; Ranjan Maitra; William Q. Meeker; Ashraf F. Bastawros
- Corresponding author: Ashraf F. Bastawros
Other grants by Ranjan Maitra
CAREER: Methodology for Statistical Computing in Massive Datasets: Parallel Approaches to Cluster and MCMC Estimation
- Award number: 0437555
- Fiscal year: 2003
- Amount: $400,000
- Award type: Continuing Grant
Similar overseas grants
Collaborative Research: IMR: MM-1A: Scalable Statistical Methodology for Performance Monitoring, Anomaly Identification, and Mapping Network Accessibility from Active Measurements
- Award number: 2319592
- Fiscal year: 2023
- Amount: $400,000
- Award type: Continuing Grant
CAREER: New Challenges in Statistical Genetics: Mendelian Randomization, Integrated Omics and General Methodology
- Award number: 2238656
- Fiscal year: 2023
- Amount: $400,000
- Award type: Continuing Grant
Collaborative Research: IMR: MM-1A: Scalable Statistical Methodology for Performance Monitoring, Anomaly Identification, and Mapping Network Accessibility from Active Measurements
- Award number: 2319593
- Fiscal year: 2023
- Amount: $400,000
- Award type: Standard Grant
Innovations in Statistical Methodology and Applications to Economics, Engineering, Health, and Medicine
- Award number: 2210913
- Fiscal year: 2022
- Amount: $400,000
- Award type: Standard Grant
Statistical Methodology for Multiple Events in Time and Space
- Award number: RGPIN-2018-04799
- Fiscal year: 2022
- Amount: $400,000
- Award type: Discovery Grants Program - Individual
Development and innovation of statistical theory and methodology of network meta-analysis
- Award number: 22H03554
- Fiscal year: 2022
- Amount: $400,000
- Award type: Grant-in-Aid for Scientific Research (B)
Statistical Methodology towards Technology
- Award number: RGPIN-2019-06370
- Fiscal year: 2022
- Amount: $400,000
- Award type: Discovery Grants Program - Individual
Statistical Methodology for Correlated Data in Health Sciences
- Award number: RGPIN-2019-04741
- Fiscal year: 2022
- Amount: $400,000
- Award type: Discovery Grants Program - Individual
Statistical methodology for rank based sampling design and finite mixture models
- Award number: RGPIN-2020-06696
- Fiscal year: 2022
- Amount: $400,000
- Award type: Discovery Grants Program - Individual
Statistical Learning and Modeling: Methodology and Algorithm
- Award number: RGPIN-2019-05917
- Fiscal year: 2022
- Amount: $400,000
- Award type: Discovery Grants Program - Individual