Collaborative Research: OAC: Approximate Nearest Neighbor Similarity Search for Large Polygonal and Trajectory Datasets

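Basic information

  • Award number:
    2313039
  • Principal investigator:
  • Amount:
    $365,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Project period:
    2023-08-01 to 2026-07-31
  • Project status:
    Ongoing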

Project abstract

Similarity search is a critical task in data mining. Nearest neighbor similarity search over geometric shapes, namely polygons and trajectories, is used in domains such as digital pathology, solar physics, and geospatial intelligence. In digital pathology for tumor diagnosis, tissues are represented as polygons, and Jaccard distance (one minus the ratio of the area of intersection to the area of union) is used for similarity comparisons. In solar physics for predicting solar flares, the query object and the dataset are both made up of polygons representing solar events. In geospatial intelligence, similarity search is used to geo-locate a shape or a contour in global reference datasets. The current literature, while rich in methods for textual and image datasets, is lacking for geometric datasets. This project will develop scalable similarity search systems for polygonal and trajectory datasets. It will produce benchmark datasets of polygonal queries and responses for the research community and inform data mining techniques that employ similarity primitives. It will help introduce student projects for courses on parallel, distributed, high-performance, and data-intensive computing, data mining, and spatial computing. It will also train PhD students, including those at a Hispanic-Serving Institution.
Given the ever-increasing size of datasets, exact nearest neighbor searches, which require a scan of the entire dataset, quickly become impractical, which motivates approximate nearest neighbor search. Traditional tree-based methods suffer from the curse of dimensionality. Approximate similarity search is required for scalability in processing large numbers of queries, for index construction over big spatial data, and for handling the dynamic nature of the data itself. This project will explore approximate similarity search algorithms based on product quantization and locality-sensitive hashing (LSH) techniques for datasets at the 10-100 billion scale. It will result in (i) new methods for creating robust signatures of geometric data, based on comprehensive exploration of the performance/accuracy tradeoffs among different encoding schemes, informed by spatial properties of the data and requirements of relevant distance metrics, (ii) scalable coarse quantization techniques to hierarchically organize the polygonal datasets into neighborhoods by preserving hyperspace locality properties, leading to product quantization based scalable systems, and (iii) LSH-based techniques focusing on designing LSH functions for Jaccard distance.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
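
To make the Jaccard distance and LSH ideas in the abstract concrete, here is a minimal sketch. It is not the project's method. It assumes polygons are approximated by the set of unit grid cells whose centers fall inside them, and it uses MinHash, a standard LSH family for Jaccard similarity, with salted BLAKE2b hashes standing in for random permutations. All function names and the two example squares are hypothetical.

```python
import hashlib
import math


def point_in_polygon(x, y, poly):
    """Even-odd ray casting test for a point against a simple polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly, cell=1.0):
    """Set of grid cells (i, j) whose centers fall inside the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    i_range = range(math.floor(min(xs) / cell), math.ceil(max(xs) / cell))
    j_range = range(math.floor(min(ys) / cell), math.ceil(max(ys) / cell))
    return {(i, j) for i in i_range for j in j_range
            if point_in_polygon((i + 0.5) * cell, (j + 0.5) * cell, poly)}


def jaccard_distance(a, b):
    """Exact Jaccard distance between two non-empty cell sets."""
    return 1.0 - len(a & b) / len(a | b)


def minhash_signature(cells, num_hashes=256):
    """For each salted hash function, keep the minimum hash over the set."""
    signature = []
    for h in range(num_hashes):
        salt = h.to_bytes(8, "little")  # one salt per hash function (assumption)
        signature.append(min(
            hashlib.blake2b(repr(c).encode(), digest_size=8, salt=salt).digest()
            for c in cells))
    return signature


def estimated_jaccard_distance(sig_a, sig_b):
    """One minus the fraction of signature positions that agree."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return 1.0 - matches / len(sig_a)


# Two hypothetical 4x4 squares that overlap on half of their area.
square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(2, 0), (6, 0), (6, 4), (2, 4)]
cells_a, cells_b = rasterize(square_a), rasterize(square_b)
exact = jaccard_distance(cells_a, cells_b)  # 1 - 8/24
approx = estimated_jaccard_distance(minhash_signature(cells_a),
                                    minhash_signature(cells_b))
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The signature has a fixed length regardless of polygon size, so comparing two shapes costs the same no matter how many cells they cover. A full LSH index would additionally group signature positions into bands to retrieve candidate neighbors without scanning the dataset; that step is omitted here.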

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Sushil Prasad

Molecular docking studies of dihydropyridazin-3(2H)-one derivatives as Antifungal, antibacterial and anti-helmintic agents
Comparative transcriptome analysis of bull X- and Y-spermatozoa
  • DOI:
    10.1038/s41598-025-99438-2
  • Published:
    2025-04-26
  • Journal:
  • Impact factor:
    3.900
  • Authors:
    Sofi Imran Ul Umar;Sushil Prasad;Soumen Naskar;Pranab Jyoti Das;Mridula Sharma;Arunava Pattanayak;Dhanu Kumar Murasing;Vijai Pal Bhadana;Sujay Rakshit
  • Corresponding author:
    Sujay Rakshit
Body weights and growth rates in indigenous chicken breeds of India
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Manish K. Singh;Shive Kumar;S. Singh;R. K. Sharma;Anand Krishnan Prakash;Sushil Prasad;Yujuvendra Singh;Deep Narayan Singh
  • Corresponding author:
    Deep Narayan Singh
Numerical Analysis of Williamson-Micropolar Ternary Nanofluid Flow Through Porous Rotatory Surface
  • DOI:
    10.1166/jon.2023.2092
  • Published:
    2023
  • Journal:
  • Impact factor:
    4.1
  • Authors:
    Diksha Sharma;Shilpa Sood;Archie Thakur;Sushil Prasad
  • Corresponding author:
    Sushil Prasad
Development and optimization of an efficient RNA isolation protocol from bovine (Bos indicus) spermatozoa
  • DOI:
    10.1016/j.bbrep.2024.101862
  • Published:
    2024-12-01
  • Journal:
  • Impact factor:
  • Authors:
    Sofi Imran Ul Umar;Sushil Prasad;Soumen Naskar;Pooja Chowdhury;Anju Rana;Pranab Jyoti Das
  • Corresponding author:
    Pranab Jyoti Das

Other grants awarded to Sushil Prasad

Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321015
  • Fiscal year:
    2023
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Broadening Adoption of Parallel and Distributed Computing in Undergraduate Computer Science and Engineering Curricula
  • Award number:
    2017590
  • Fiscal year:
    2020
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Conceptualization: Planning a Sustainable Ecosystem for Incorporating Parallel and Distributed Computing into Undergraduate Education
  • Award number:
    2002649
  • Fiscal year:
    2019
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Conceptualization: Planning a Sustainable Ecosystem for Incorporating Parallel and Distributed Computing into Undergraduate Education
  • Award number:
    1924272
  • Fiscal year:
    2019
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Early Adopters of Curriculum Initiative in Parallel and Distributed Computing at EduPar-12
  • Award number:
    1238003
  • Fiscal year:
    2012
  • Amount:
    $365,000
  • Project type:
    Standard Grant
A Curriculum Initiative on Parallel and Distributed Computing - Workshop on Parallel and Distributed Computing Education (EduPar-11) and Early Adopter Program
  • Award number:
    1135124
  • Fiscal year:
    2011
  • Amount:
    $365,000
  • Project type:
    Standard Grant
NSF/TCPP Student Travel Awards for IPDPS-2011
  • Award number:
    1138281
  • Fiscal year:
    2011
  • Amount:
    $365,000
  • Project type:
    Standard Grant
CiC: EAGER: CCollaborative: GIS Vector Data Overlay Processing on Azure Platform
  • Award number:
    1048200
  • Fiscal year:
    2010
  • Amount:
    $365,000
  • Project type:
    Standard Grant
A Curriculum Initiative on Parallel and Distributed Computing - Toward Core Topics for Undergraduates
  • Award number:
    1048711
  • Fiscal year:
    2010
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Technical Committee on Parallel Processing (TCPP) Student Travel Awards
  • Award number:
    1016907
  • Fiscal year:
    2010
  • Amount:
    $365,000
  • Project type:
    Standard Grant

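Similar grants from the National Natural Science Foundation of China

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Amount:
    CNY 0
  • Project type:
    Provincial or municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Amount:
    CNY 450,000
  • Project type:
    General Program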

Similar overseas grants

Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403312
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402947
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Award number:
    2403399
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403089
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant