CAREER: Scalable Algorithms for Knowledge Discovery in Scientific Data Sets
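Basic information
- Award number: 0133464
- Principal investigator:
- Amount: $320,700
- Host institution:
- Host institution country: United States
- Grant type: Continuing Grant
- Fiscal year: 2002
- Funding country: United States
- Duration: 2002-02-01 to 2008-01-31
- Project status: Completed
- Source:
- Keywords:
Project abstract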
Data mining is the process of automatically extracting useful information hidden in large data sets. This emerging discipline is becoming increasingly important as advances in data collection have led to explosive growth in the amount of available data. This project aims to develop a wide range of novel data mining algorithms suited to the characteristics of scientific data sets arising in genomics and fluid dynamics. Our research will focus on developing algorithms both for sequential data sets and for data sets that can be represented by directed labeled graphs. Graph-based modeling enables us to capture, in a single unified framework, many of the spatial, topological, geometric, and other relational characteristics present in scientific data sets. The specific research tasks that we plan to address are: (i) development of scalable algorithms for finding frequently occurring patterns in graph data sets, and of algorithms for finding patterns whose required frequency decreases as a function of the pattern length; (ii) development of scalable, high-quality clustering algorithms for sequence and graph data sets that operate directly in the native feature space; (iii) development of scalable and accurate classification algorithms based on automated sequential or relational feature extraction. These algorithms will be validated by analyzing data sets arising in genomics and turbulent fluid flow. The project integrates the data mining research with an educational plan that focuses on introducing undergraduate and graduate students to the computational and data analysis aspects of genomic research, and on developing a comprehensive bioinformatics curriculum whose goal is to foster multidisciplinary research and collaboration.
In addition, a comprehensive set of software tools will be developed and made available that can be used both to train students in using data mining techniques and to conduct novel research expanding the levels of understanding in various scientific disciplines.
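As a rough illustration of task (i) for sequential data, the sketch below counts contiguous patterns in a handful of short sequences and keeps those that meet a support threshold that shrinks as patterns get longer. It is not the project's algorithm: the threshold function `min_support`, its `base` and `floor` parameters, the restriction to contiguous substrings, and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def min_support(length, base=3, floor=2):
    """Hypothetical length-decreasing threshold.

    Longer patterns need fewer supporting sequences, down to `floor`.
    """
    return max(floor, base - (length - 1))


def frequent_substrings(sequences, max_len=4, support_fn=min_support):
    """Return {pattern: support} for contiguous patterns that meet the
    length-dependent threshold.

    Support is the number of sequences containing the pattern at least once.
    """
    result = {}
    for length in range(1, max_len + 1):
        counts = defaultdict(int)
        for seq in sequences:
            # Collect each distinct substring once per sequence.
            seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for pattern in seen:
                counts[pattern] += 1
        threshold = support_fn(length)
        for pattern, count in counts.items():
            if count >= threshold:
                result[pattern] = count
    return result


# Toy input (assumed for illustration only).
sequences = ["ACGT", "ACGA", "TCGT"]
patterns = frequent_substrings(sequences, max_len=3)
```

In this toy run, "A" appears in only two sequences and misses the length-1 threshold of 3, while "AC" also appears in two sequences and passes the length-2 threshold of 2. A frequent pattern can therefore have an infrequent sub-pattern, so Apriori-style pruning on sub-patterns does not apply directly under a length-decreasing threshold; the sketch sidesteps this by enumerating every length by brute force, which would not scale to real data.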
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by George Karypis
A knowledge graph of clinical trials (CTKG)
- DOI: 10.1038/s41598-022-08454-z
- Published: 2022-03-18
- Journal:
- Impact factor: 3.900
- Authors: Ziqi Chen; Bo Peng; Vassilis N. Ioannidis; Mufei Li; George Karypis; Xia Ning
- Corresponding author: Xia Ning
Predicting the Performance of Randomized Parallel Search: An Application to Robot Motion Planning
- DOI: 10.1023/a:1026283627113
- Published: 2003-09-01
- Journal:
- Impact factor: 2.800
- Authors: Daniel J. Challou; Maria Gini; Vipin Kumar; George Karypis
- Corresponding author: George Karypis
Out-of-core coherent closed quasi-clique mining from large dense graph databases
- DOI: 10.1145/1242524.1242530
- Published: 2007-06
- Journal:
- Impact factor: 0
- Authors: Jianyong Wang; Zhiping Zeng; George Karypis; Lizhu Zhou
- Corresponding author: Lizhu Zhou
Grade prediction with models specific to students and courses
- DOI: 10.1007/s41060-016-0024-z
- Published: 2016-09-22
- Journal:
- Impact factor: 2.800
- Authors: Agoritsa Polyzou; George Karypis
- Corresponding author: George Karypis
Efficient identification of Tanimoto nearest neighbors
- DOI: 10.1007/s41060-017-0064-z
- Published: 2017-08-02
- Journal:
- Impact factor: 2.800
- Authors: David C. Anastasiu; George Karypis
- Corresponding author: George Karypis
Other grants by George Karypis
REU Site: Computational Methods for Discovery Driven by Big Data
- Award number: 1757916
- Fiscal year: 2018
- Amount: $320,700
- Grant type: Standard Grant
III: Medium: High-Performance Factorization Tools for Constrained and Hidden Tensor Models
- Award number: 1704074
- Fiscal year: 2017
- Amount: $320,700
- Grant type: Continuing Grant
PFI:AIR - TT: Automated Out-of-Core Execution of Parallel Message-Passing Applications
- Award number: 1414153
- Fiscal year: 2014
- Amount: $320,700
- Grant type: Standard Grant
BIGDATA: IA: DKA: Collaborative Research: Learning Data Analytics: Providing Actionable Insights to Increase College Student Success
- Award number: 1447788
- Fiscal year: 2014
- Amount: $320,700
- Grant type: Continuing Grant
SI2-SSE: Software Infrastructure For Partitioning Sparse Graphs on Existing and Emerging Computer Architectures
- Award number: 1048018
- Fiscal year: 2010
- Amount: $320,700
- Grant type: Standard Grant
III: Medium: Collaborative Research: Computational Methods to Advance Chemical Genetics by Bridging Chemical and Biological Spaces
- Award number: 0905220
- Fiscal year: 2009
- Amount: $320,700
- Grant type: Continuing Grant
SEI: Virtual Screening Algorithms for Bioactive Compounds Based on Frequent Substructures
- Award number: 0431135
- Fiscal year: 2004
- Amount: $320,700
- Grant type: Standard Grant
ITR/NGS: Graph Partitioning Algorithms for Complex Problems & Architectures
- Award number: 0312828
- Fiscal year: 2003
- Amount: $320,700
- Grant type: Standard Grant
CISE Research Instrumentation: Cluster Computing for Knowledge Discovery in Diverse Data Sets
- Award number: 9986042
- Fiscal year: 2000
- Amount: $320,700
- Grant type: Standard Grant
Multi-Constraint, Multi-Objective Graph Partitioning
- Award number: 9972519
- Fiscal year: 1999
- Amount: $320,700
- Grant type: Standard Grant
Similar NSFC grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Amount (10,000 CNY):
- Grant type: Collaborative Innovation Research Team
Similar overseas grants
CAREER: Scalable algorithms for regularized and non-linear genetic models of gene expression
- Award number: 2336469
- Fiscal year: 2024
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Fast Scalable Graph Algorithms
- Award number: 2340048
- Fiscal year: 2024
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
- Award number: 2340586
- Fiscal year: 2024
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Learning Kernels in Operators from Data: Learning Theory, Scalable Algorithms and Applications
- Award number: 2238486
- Fiscal year: 2023
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Scalable Algorithms for Nonlinear, Large-Scale Inverse Problems Governed by Dynamical Systems
- Award number: 2145845
- Fiscal year: 2022
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Scalable binning algorithms for genome-resolved metagenomics
- Award number: 1845890
- Fiscal year: 2019
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Pushing the Theoretical Limits of Scalable Distributed Algorithms
- Award number: 1845146
- Fiscal year: 2019
- Amount: $320,700
- Grant type: Continuing Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
- Award number: 1925960
- Fiscal year: 2018
- Amount: $320,700
- Grant type: Standard Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
- Award number: 1651724
- Fiscal year: 2017
- Amount: $320,700
- Grant type: Standard Grant
CAREER: Scalable Algorithms for Spectral Analysis of Massive Networked Systems
- Award number: 1651433
- Fiscal year: 2017
- Amount: $320,700
- Grant type: Standard Grant