Scalable Learning with Ensemble Techniques and Parallel Computing
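Basic information
- Grant number: 7433144
- Principal investigator:
- Amount: $25,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2008
- Funding country: United States
- Duration: 2008-05-01 to 2008-11-30
- Project status: Completed
- Source: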
- Keywords: Adoption; Algorithms; Architecture; Arts; Biological Sciences; Biomedical Research; Cations; Class; Classification; Communication; Communities; Companions; Complex; Computer software; Computers; Consult; Data; Data Set; Databases; Detection; Diagnosis; Disease; Effectiveness; Emerging Technologies; Ensure; Fostering; Foundations; Future; Generations; Goals; Graph; Grouping; Imagery; Knowledge; Language; Learning; Libraries; Machine Learning; Memory; Methodology; Methods; Modeling; Nature; Numbers; Performance; Personal Satisfaction; Phase; Prevention; Problem Solving; Program Development; Public Health; Randomized; Range; Research; Research Infrastructure; Research Personnel; Running; Scheme; Simulate; Software Design; Software Tools; Speed; Structure; Techniques; Technology; Testing; Today; Training; Voting; Work; base; computerized tools; data mining; design; forest; improved; innovation; next generation; parallel computing; programs; prototype; research and development; response; software development; sound; statistics; theories; tool
Project abstract
DESCRIPTION (provided by applicant): The ability to conduct basic and applied biomedical research is becoming increasingly dependent on data produced by new and emerging technologies. This data has an unprecedented amount of detail and volume. Researchers are therefore dependent on computing and computational tools to be able to visualize, analyze, model, and interpret these large and complex sets of data. Tools for disease detection, diagnosis, treatment, and prevention are common goals of many, if not all, biomedical research programs. Sound analytical and statistical theory and methodology for class prediction and class discovery lay the foundation for building these tools; among these, the machine learning techniques of classification (supervised learning) and clustering (unsupervised learning) are crucial.
Our goal is to produce software for analysis and interpretation of large data sets using ensemble machine learning techniques and parallel computing technologies. Ensemble techniques are recent advances in machine learning theory and methodology leading to great improvements in accuracy and stability in data set analysis and interpretation. The results from a committee of primary machine learners (classifiers or clusterers) that have been trained on different instance or feature subsets are combined through techniques such as voting. The high prediction accuracy of classifier ensembles (such as boosting, bagging, and random forests) has generated much excitement in the statistics and machine learning communities. Recent research extends the ensemble methodology to clustering, where class information is unavailable, also yielding superior performance in terms of accuracy and stability.
In theory, most ensemble techniques are inherently parallel. However, existing implementations are generally serial and assume the data set is memory resident. Therefore, current software will not scale to the large data sets produced in today's biomedical research.
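As an illustration of the committee-and-voting idea described in the abstract, here is a minimal bagging sketch in plain Python. It is not the applicant's software: the decision-stump base learner, the function names (`train_stump`, `bagging`, `vote`), and the default of 11 learners are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-feature threshold classifier (decision stump) by exhaustive search."""
    best = None
    for j in range(len(X[0])):                      # candidate feature
        for t in sorted({row[j] for row in X}):     # candidate threshold
            for left, right in ((0, 1), (1, 0)):    # label on each side of the threshold
                errors = sum(
                    (left if row[j] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def bagging(X, y, n_learners=11, seed=0):
    """Train each learner on a bootstrap resample (a different instance subset)."""
    rng = random.Random(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        learners.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return learners


def vote(learners, row):
    """Combine the committee's predictions by majority vote."""
    return Counter(learner(row) for learner in learners).most_common(1)[0][0]
```

Each learner depends only on its own resample, so the loop in `bagging` has no cross-iteration dependencies. That independence is what makes the technique parallel in principle, even though this sketch runs serially and assumes binary labels 0 and 1 held entirely in memory.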
We propose two approaches for scaling ensemble techniques to large data sets: data partitioning and parallel computing. The focus of Phase I will be to prototype scalable classifier ensembles using parallel architectures. We intend to: establish the parallel computing infrastructures; produce a preliminary architecture and software design; investigate a wide range of ensemble generation schemes using data partitioning strategies; and implement scalable bagging and random forests based on the preliminary design. The focus of Phase II will be to complete the software architecture and implement the scalable classifier ensembles and scalable clusterer ensembles within this framework. We intend to: complete research and development of classifier ensembles; extend the classification framework to clusterer ensembles; research and develop a unified interface for building ensembles with differing generation mechanisms and combination strategies; and evaluate the effectiveness of the software on simulated and real data.
PUBLIC HEALTH RELEVANCE: The common goals of many, if not all, biomedical research programs are the development of tools for disease detection, diagnosis, treatment, and prevention. These programs often rely on new types of data that have an unprecedented amount of detail and volume. Our goal is to produce software for the analysis and interpretation of large data sets using ensemble machine learning techniques and parallel computing technologies, so that researchers who depend on computational tools can visualize, analyze, model, and interpret these large and complex sets of data.
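The two scaling approaches named in the proposal, data partitioning and parallel computing, can be sketched together as follows. This is a hypothetical illustration, not the proposed architecture: the nearest-centroid base learner, the striding partition scheme, and every function name are assumptions. A thread pool is used only to keep the sketch portable; CPU-bound training in CPython would likely need processes or distributed workers to see a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts disjoint chunks by striding (assumed scheme)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def train_centroids(chunk):
    """Toy base learner: the per-class mean of one chunk of (features, label) pairs."""
    sums, counts = {}, Counter()
    for features, label in chunk:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def train_partitioned(rows, n_parts=4, max_workers=4):
    """Train one learner per partition; each worker sees only its own chunk."""
    chunks = [c for c in partition(rows, n_parts) if c]  # skip empty chunks
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_centroids, chunks))


def predict(models, features):
    """Combine the per-partition learners by majority vote."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]
```

Because each worker touches only one chunk, no single learner needs the full data set in memory, which is the property the abstract says existing serial implementations lack. In this sketch the chunks are still sliced from an in-memory list; a real system would presumably read each partition from disk or a database.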
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by LIXIN GONG
S+MASSANALYZER: PROTEIN MASS SPECTRA - CLASSIFICATION
- Grant number: 7541851
- Fiscal year: 2005
- Amount: $25,500
- Project category:
S+MASS ANALYZER: PROTEIN MASS SPECTRA PROCESSING
- Grant number: 7541850
- Fiscal year: 2005
- Amount: $25,500
- Project category:
Advanced PET/CT Fusion Workstation for Cancer Management
- Grant number: 6878540
- Fiscal year: 2004
- Amount: $25,500
- Project category:
Similar overseas grants
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Amount: $25,500
- Project category: Continuing Grant
Collaborative Research: SHF: Small: Artificial Intelligence of Things (AIoT): Theory, Architecture, and Algorithms
- Grant number: 2221742
- Fiscal year: 2022
- Amount: $25,500
- Project category: Standard Grant
Collaborative Research: SHF: Small: Artificial Intelligence of Things (AIoT): Theory, Architecture, and Algorithms
- Grant number: 2221741
- Fiscal year: 2022
- Amount: $25,500
- Project category: Standard Grant
Algorithms and Architecture for Super Terabit Flexible Multicarrier Coherent Optical Transmission
- Grant number: 533529-2018
- Fiscal year: 2020
- Amount: $25,500
- Project category: Collaborative Research and Development Grants
OAC Core: Small: Architecture and Network-aware Partitioning Algorithms for Scalable PDE Solvers
- Grant number: 2008772
- Fiscal year: 2020
- Amount: $25,500
- Project category: Standard Grant
Algorithms and Architecture for Super Terabit Flexible Multicarrier Coherent Optical Transmission
- Grant number: 533529-2018
- Fiscal year: 2019
- Amount: $25,500
- Project category: Collaborative Research and Development Grants
Visualization of FPGA CAD Algorithms and Target Architecture
- Grant number: 541812-2019
- Fiscal year: 2019
- Amount: $25,500
- Project category: University Undergraduate Student Research Awards
Collaborative Research: ABI Innovation: Algorithms for recovering root architecture from 3D imaging
- Grant number: 1759836
- Fiscal year: 2018
- Amount: $25,500
- Project category: Standard Grant
Collaborative Research: ABI Innovation: Algorithms for recovering root architecture from 3D imaging
- Grant number: 1759796
- Fiscal year: 2018
- Amount: $25,500
- Project category: Standard Grant
Collaborative Research: ABI Innovation: Algorithms for recovering root architecture from 3D imaging
- Grant number: 1759807
- Fiscal year: 2018
- Amount: $25,500
- Project category: Standard Grant