Hyperbolic Classification and Regression Trees
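Basic information
- Award number: 0442178
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2004
- Funding country: United States
- Project period: 2004-09-15 to 2006-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract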
Tree-based classification algorithms have proven very useful for a wide variety of data mining applications, but their performance sometimes suffers on very large data sets or data sets in very high dimensions. There are two fundamental challenges to overcome in these situations. First, as the number of dimensions of the attribute space increases, the sparseness of the data points increases dramatically. Second, as the number of dimensions of the attribute space increases, the standard Euclidean metric becomes less relevant for many problems of practical interest. Tree-based classifiers can be viewed as decomposing feature space into n-dimensional rectangles. If we view these rectangles as being based implicitly on a Euclidean distance function, it becomes natural to consider decomposing feature space into n-dimensional regions based upon other distance functions, such as those that arise in hyperbolic geometry. In this research, the investigator uses this principle to develop tree-based classification algorithms and clustering algorithms based upon hyperbolic distance functions. He develops novel methods for scaling tree-based classifiers and other partition-based classification methods to very large data sets and to data sets in very high dimensions. In practice, applying classification algorithms to high volume data streams is limited by the difficulty that current network protocols have in transporting high volume data flows over wide area networks with high bandwidth-delay products. To overcome this, the investigator develops versions of hyperbolic clustering and classification algorithms suitable for high volume data streams by layering these algorithms over a new application layer network protocol developed in his laboratory specifically for wide area networks with high bandwidth-delay products.
According to an Army report, the goal of actionable intelligence is to give commanders and soldiers the ability to conduct successful operations by providing them with a high level of situational understanding in a manner that is rapid, accurate, and timely. An important enabling technology for this is the ability to classify, integrate, and route, in real time, high volume data streams that originate anywhere in the world. Because of the volume of the data, today's data mining algorithms have trouble scaling to the data sets of interest. Most data mining today is done using Euclidean metrics, although there are many other metrics that might be used. The investigator develops new, extremely scalable data mining algorithms based upon hyperbolic metrics. Because of the limited capacities of many of today's networks, data mining applications have difficulty working with very large data sets over long distances. Building on recent work in his laboratory that produced very high performance network protocols, he develops data mining applications for high performance networks that remain effective even with very large data sets. He is preparing open source implementations of these algorithms so that they may be easily used by students and other interested parties. This award is supported jointly by the NSF and the Intelligence Community. The Approaches to Terrorism program in the Directorate for Mathematics and Physical Sciences supports new concepts in basic research and workforce development with the potential to contribute to national security.
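The abstract does not say which model of hyperbolic geometry or which distance formula the project uses. As a minimal sketch only, assuming the Poincaré ball model (points strictly inside the unit ball), the Python below contrasts a Euclidean distance with one common hyperbolic distance. The function names `euclidean_distance` and `poincare_distance` are hypothetical and are not taken from the award record or from any implementation by the investigator.

```python
import math


def euclidean_distance(u, v):
    """Standard Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def poincare_distance(u, v):
    """Hyperbolic distance in the Poincare ball model (an assumed model).

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    Both points must lie strictly inside the unit ball.
    """
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


if __name__ == "__main__":
    # Two pairs with the same Euclidean gap (0.1) but different positions.
    near_center = ((0.0, 0.0), (0.1, 0.0))
    near_boundary = ((0.8, 0.0), (0.9, 0.0))
    for p, q in (near_center, near_boundary):
        print(p, q, euclidean_distance(p, q), poincare_distance(p, q))
```

Under this assumed model, two pairs of points with the same Euclidean separation can be far apart or close together hyperbolically depending on how near they lie to the boundary of the ball, which is the kind of metric change the abstract proposes to exploit when partitioning feature space.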
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Robert Grossman
A graph model based study on regulatory impacts of transcription factors of Drosophila melanogaster and comparison across species
- DOI: 10.1016/j.bbrc.2009.06.055
- Publication date: 2009-09-04
- Journal:
- Impact factor:
- Authors: Feng Tian; Jia Chen; Suying Bao; Lin Shi; Xiangjun Liu; Robert Grossman
- Corresponding author: Robert Grossman
Autologous HER2 CMV bispecific CAR T cells are safe and demonstrate clinical benefit for glioblastoma in a Phase I trial.
- DOI: 10.1186/2051-1426-3-s2-o11
- Publication date: 2015-11-04
- Journal:
- Impact factor: 10.600
- Authors: Nabil Ahmed; Vita Brawley; Meenakshi Hegde; Kevin Bielamowicz; Amanda Wakefield; Alexia Ghazi; Aidin Ashoori; Oumar Diouf; Claudia Gerken; Daniel Landi; Mamta Kalra; Zhongzhen Yi; Cliona Rooney; Gianpietro Dotti; Adrian Gee; Helen Heslop; Stephen Gottschalk; Suzanne Powell; Robert Grossman; Winfried Wels; Yzonne Kew; David Baskin; Jonathan Zhang; Pamela New; John Hicks
- Corresponding author: John Hicks
Hopf-algebraic structure of combinatorial objects and differential operators
- DOI: 10.1007/bf02764614
- Publication date: 1990-02-01
- Journal:
- Impact factor: 0.800
- Authors: Robert Grossman; Richard G. Larson
- Corresponding author: Richard G. Larson
A comparison of intensity modulated conformal therapy with a conventional external beam stereotactic radiosurgery system for the treatment of single and multiple intracranial lesions.
- DOI:
- Publication date: 1996
- Journal:
- Impact factor: 0
- Authors: S. Woo; W. Grant; D. Bellezza; Robert Grossman; P. Gildenberg; L. Carpenter; M. Carol; E. Butler
- Corresponding author: E. Butler
Anti-HLA-DP antibodies may represent a significant barrier to successful kidney transplantation in re-grafted patients
- DOI: 10.1016/j.humimm.2005.10.009
- Publication date: 2005-08-01
- Journal:
- Impact factor:
- Authors: Malek Kamoun; Marty Sellers; Christa Whitney-Miller; Jane Kearns; Erin Pierce; John Tomaszewski; Alden Doyle; Robert Grossman; Roy Bloom; Ali Naji; James Markmann; Simin Goral
- Corresponding author: Simin Goral
Other grants held by Robert Grossman
Workshop on Translational Data Science (TDS 17)
- Award number: 1742814
- Fiscal year: 2017
- Amount: --
- Project type: Standard Grant
BIGDATA: Small: DCM: Open Flow Enabled Hadoop over Local and Wide Area Clusters
- Award number: 1251201
- Fiscal year: 2013
- Amount: --
- Project type: Standard Grant
SDCI Net: UD* - A UDT-Based Application Suite for High Performance Data Transport
- Award number: 1127316
- Fiscal year: 2011
- Amount: --
- Project type: Standard Grant
PIRE: Training and Workshops in Data Intensive Computing Using The Open Science Data Cloud
- Award number: 1129076
- Fiscal year: 2010
- Amount: --
- Project type: Continuing Grant
PIRE: Training and Workshops in Data Intensive Computing Using The Open Science Data Cloud
- Award number: 0968341
- Fiscal year: 2010
- Amount: --
- Project type: Continuing Grant
Web-based Interactive Organic Chemistry Homework
- Award number: 0816783
- Fiscal year: 2009
- Amount: --
- Project type: Standard Grant
Web-based Interactive Organic Chemistry Homework
- Award number: 0441201
- Fiscal year: 2005
- Amount: --
- Project type: Standard Grant
MRI: International Data Mining Grid Testbed for Research in High Performance Data Transport, Data Integration, and Data Exploration -- Instrument Development Proposal
- Award number: 0420847
- Fiscal year: 2004
- Amount: --
- Project type: Standard Grant
SCI: II: The TeraFlow Project: High Performance Flows for Mining Large Distributed Data Archives
- Award number: 0430781
- Fiscal year: 2004
- Amount: --
- Project type: Standard Grant
ITR: Collaborative Research: A Data Mining and Exploration Middleware for Grid and Distributed Computing
- Award number: 0325013
- Fiscal year: 2003
- Amount: --
- Project type: Continuing Grant
Similar international grants
Functional Regression and Classification for Data Supported on Complex Geometries
- Award number: 2210064
- Fiscal year: 2022
- Amount: --
- Project type: Standard Grant
Robust sparse partial least squares regression and classification
- Award number: 487299-2016
- Fiscal year: 2019
- Amount: --
- Project type: Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
- Award number: 487299-2016
- Fiscal year: 2018
- Amount: --
- Project type: Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
- Award number: 487299-2016
- Fiscal year: 2017
- Amount: --
- Project type: Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
- Award number: 487299-2016
- Fiscal year: 2016
- Amount: --
- Project type: Postgraduate Scholarships - Doctoral
RUI: Classification, regression, and density estimation with missing variables
- Award number: 1407400
- Fiscal year: 2014
- Amount: --
- Project type: Continuing Grant
Establishment of classification and regression tree model for assessing of a risk for future occurence of life-style related disease using a community-based cohort data.
- Award number: 25460768
- Fiscal year: 2013
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (C)
Classification d'événements en vidéos par la regression logistique multinomiale (Event classification in videos using multinomial logistic regression)
- Award number: 429633-2012
- Fiscal year: 2012
- Amount: --
- Project type: University Undergraduate Student Research Awards
Regression, classification, and bayesian networks
- Award number: 5172-2005
- Fiscal year: 2011
- Amount: --
- Project type: Discovery Grants Program - Individual
Building flexible tree models and ensemble tree models for statistical learning in classification, regression and failure time data analysis
- Award number: 311980-2008
- Fiscal year: 2010
- Amount: --
- Project type: Discovery Grants Program - Individual