CCRI: ENS: Collaborative Research: Supporting and Sustaining Apache AsterixDB for the CISE Research Community
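Basic Information
- Award number: 1924694
- Principal investigator:
- Amount: $860,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2024-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract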
Mass quantities of digital information are being generated today on a daily basis through social networks, blogs, online communities, news sources, and mobile applications as well as our increasingly sensed surroundings. Tremendous insight can be gained by storing and making such big data available for exploration in a wide variety of domains. Beneficiaries include business, social sciences, public health, national security, political science, public safety, medicine, and government policy. Researchers exploring these benefits need software to store, index, manage, and analyze big data, while researchers investigating new technical approaches for managing and analyzing big data can benefit tremendously from the availability of shared building blocks to use as a foundation for their efforts. Over the past ten years, the Apache AsterixDB scalable big data management system has been developed to address this need. Apache AsterixDB provides a repository for semi-structured data that cannot be organized in tables. In contrast to most other systems in its space, it supports a user-friendly query language that is more powerful than those of traditional database systems. In contrast to big data analytics offerings, it manages data and exploits knowledge of data layouts and indexes to process queries efficiently. AsterixDB is in use for teaching and research on big data platforms, semi-structured data, and social data analytics. Based on user feedback, this project will enhance AsterixDB to better meet community needs, including improved text handling, numerous query processing improvements, additional standard-based geospatial data support, user-defined functions for user-provided logic, and a variety of storage-level improvements covering storage efficiency, indexing, data ingestion, and integration with other systems.
The planned improvements will benefit the broader public by providing a general purpose foundation for extracting high-value insights from high-volume, low-value big data in areas such as public safety and health. In addition to enabling computer and information science and engineering research on big data management, Apache AsterixDB will train students nationwide in big data management and analysis; such training is crucial to addressing the information explosion due to social media, the mobile Web, and Internet of Things (IoT).
Apache AsterixDB is a highly scalable Big Data Management System (BDMS) that stores, indexes, and manages semi-structured data, much like MongoDB, but it supports a full query language with the expressiveness of SQL and more. Unlike analytics engines such as Apache Hive or Spark, it stores and manages data, so it can use knowledge of data partitioning and index availability to avoid scanning data sets to process queries. Core features of the system include: a NoSQL-style data model based on extending JavaScript Object Notation (JSON); a declarative query language (SQL++) for semi-structured data; a query execution engine, Apache Hyracks, for partitioned-parallel query execution; partitioned data storage and indexing for efficient ingestion of new data; support for querying external data as well as data stored in AsterixDB; a rich set of data types, including spatial, temporal, and textual data; indexing via B+trees, R-trees, and inverted keyword indexes; and transactional support akin to that of other NoSQL stores. AsterixDB began in 2009 as a large research project to combine the best ideas from the parallel database world, the Apache Hadoop world, and the semi-structured data world to create a next-generation BDMS; it was accepted into the Apache Software Foundation's incubator in February 2015, and it became a top-level Apache project in April 2016.
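The contrast drawn above, between scanning a data set and using knowledge of partitioning and indexes, can be illustrated with a small toy sketch. It is not AsterixDB code and does not use any AsterixDB API; the class `ToyStore`, the constant `NUM_PARTITIONS`, and the `records_touched` counter are all hypothetical names chosen for illustration, and a Python dict stands in for a per-partition primary-key index.

```python
# Illustrative toy only: NOT AsterixDB code and not its API.
# All names here (ToyStore, NUM_PARTITIONS, records_touched) are hypothetical.

NUM_PARTITIONS = 4  # assumed partition count for the sketch


def partition_of(key: int) -> int:
    """Assign a primary key to a partition (simple modulo hashing)."""
    return key % NUM_PARTITIONS


class ToyStore:
    """Hash-partitioned record store with a per-partition primary-key index."""

    def __init__(self) -> None:
        # Each partition maps primary key -> record; the dict stands in
        # for a real index structure such as a B+-tree.
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]
        self.records_touched = 0  # crude cost counter

    def insert(self, record: dict) -> None:
        self.partitions[partition_of(record["id"])][record["id"]] = record

    def lookup_by_scan(self, key: int):
        """Examine every record in every partition (no storage knowledge)."""
        found = None
        for part in self.partitions:
            for rec in part.values():
                self.records_touched += 1
                if rec["id"] == key:
                    found = rec
        return found

    def lookup_by_index(self, key: int):
        """Use the partitioning function to pick one partition, then probe its index."""
        part = self.partitions[partition_of(key)]
        self.records_touched += 1  # a single index probe
        return part.get(key)


store = ToyStore()
for i in range(1000):
    store.insert({"id": i, "text": f"message {i}"})

store.records_touched = 0
scan_hit = store.lookup_by_scan(42)
scan_cost = store.records_touched   # every record was examined

store.records_touched = 0
index_hit = store.lookup_by_index(42)
index_cost = store.records_touched  # one probe in one partition
```

Both lookups return the same record, but the scan touches all 1000 records while the index path touches one. A real system would make this choice inside its query optimizer rather than exposing two methods.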
AsterixDB has enjoyed use for teaching and research on big data platforms, semi-structured data, and social data analytics. Based on user feedback, we propose to enhance AsterixDB to better meet community needs by adding: (1) Improved text handling, including multiple tokenizers, stemming, and stop words. (2) Query optimization improvements, including statistics and cost-based decisions (e.g., join methods and index selection). (3) Query processing improvements, including dynamic range partitioning, fully parallel sorts, merge joins, and skew-handling. (4) Enriched, standardized spatial data support based on GeoJSON. (5) Support for user-defined functions in more languages, especially Python. (6) Support for parameterized queries and prepared statements. (7) Support for indexes on multi-valued fields. (8) Storage efficiency improvements. (9) Data ingestion improvements. (10) Additional formats for external data sets, including Parquet from Spark/Hive.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outputs
Journal articles (18)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluating computational geometry libraries for big spatial data exploration
- DOI: 10.1145/3403896.3403969
- Year published: 2020
- Journal:
- Impact factor: 0
- Authors: Zhang, Yaming; Eldawy, Ahmed
- Corresponding author: Eldawy, Ahmed
A brief introduction to geospatial big data analytics with Apache AsterixDB
- DOI: 10.1145/3486189.3490018
- Year published: 2021
- Journal:
- Impact factor: 0
- Authors: Sevim, Akil; Mahin, Mehnaz Tabassum; Vu, Tin; Maxon, Ian; Eldawy, Ahmed; Carey, Michael; Tsotras, Vassilis
- Corresponding author: Tsotras, Vassilis
SGPAC: Generalized Scalable Spatial GroupBy Aggregations over Complex Polygons
- DOI: 10.1007/s10707-023-00491-8
- Year published: 2023
- Journal:
- Impact factor: 2
- Authors: Abdelhafeez, Laila; Magdy, Amr; Tsotras, Vassilis J.
- Corresponding author: Tsotras, Vassilis J.
Less is More: How Fewer Results Improve Progressive Join Query Processing
- DOI: 10.1145/3603719.3603728
- Year published: 2023
- Journal:
- Impact factor: 0
- Authors: Zhang, Xin; Eldawy, Ahmed
- Corresponding author: Eldawy, Ahmed
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems
- DOI: 10.1145/3604437.3604460
- Year published: 2023
- Journal:
- Impact factor: 0
- Authors: Pavlopoulou, Christina; Carey, Michael J.; Tsotras, Vassilis J.
- Corresponding author: Tsotras, Vassilis J.
Other grants by Vassilis Tsotras
III: Small: Discovering Hidden Semantics from Spatio-temporal Sensed Data
- Award number: 1527984
- Fiscal year: 2015
- Funding amount: $860,000
- Award type: Standard Grant
BIGDATA: F: DKM: Collaborative Research: Making Big Data Active: From Petabytes to Megafolks in Milliseconds
- Award number: 1447826
- Fiscal year: 2014
- Funding amount: $860,000
- Award type: Standard Grant
CI-ADDO-NEW: ASTERIX: A Community Software Platform for Big Data Research, Analysis, and Management
- Award number: 1305253
- Fiscal year: 2013
- Funding amount: $860,000
- Award type: Standard Grant
III: EAGER: Accelerated Filtering of Spatiotemporal Archives Using Reconfigurable Hardware
- Award number: 1144158
- Fiscal year: 2011
- Funding amount: $860,000
- Award type: Standard Grant
III: Travel Support for U.S.-Based Graduate Students to Attend the 26th IEEE International Conference on Data Engineering (ICDE 2010)
- Award number: 0956600
- Fiscal year: 2009
- Funding amount: $860,000
- Award type: Standard Grant
DC: Large: Collaborative Research: ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis
- Award number: 0910859
- Fiscal year: 2009
- Funding amount: $860,000
- Award type: Standard Grant
III-COR: Collaborative Research: Graceful Evolution and Historical Queries in Information Systems -- a Unified Approach
- Award number: 0705916
- Fiscal year: 2007
- Funding amount: $860,000
- Award type: Standard Grant
NeTS-NOSS: Providing Flash Memory Support for Sensor Network Architectures
- Award number: 0627191
- Fiscal year: 2006
- Funding amount: $860,000
- Award type: Standard Grant
Query Processing Over GIS Objects With Functional Attributes
- Award number: 0534781
- Fiscal year: 2006
- Funding amount: $860,000
- Award type: Continuing Grant
SGER Collaborative Research: Support for Design of Evolving Information Systems
- Award number: 0339032
- Fiscal year: 2003
- Funding amount: $860,000
- Award type: Standard Grant
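Similar NSFC grants
Mechanism of electroacupuncture in treating functional dyspepsia via tryptophan-metabolism regulation of the ENS pathway
- Award number: JCZRLH202500075
- Year approved: 2025
- Funding amount: CNY 0
- Award type: Provincial/municipal project
Mechanism by which Baizhu Qiwu granules regulate the ENS-ICC-SMC network to treat qi-yin deficiency type STC, based on the GDNF/PI3K/AKT signaling pathway
- Award number: 2025JJ90111
- Year approved: 2025
- Funding amount: CNY 0
- Award type: Provincial/municipal project
Molecular mechanism by which the rice EnS150 gene regulates seed dormancy and germination
- Award number: 32301853
- Year approved: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund project
Fucosylation-modified MSCs mediate GDNF positive feedback to regulate enteric neuron pyroptosis and ENPC autophagy, promoting ENS reconstruction
- Award number: n/a
- Year approved: 2023
- Funding amount: CNY 0
- Award type: Provincial/municipal project
Mechanism by which Clostridium sporogenes regulates ENPC autophagy via the "IPA-AHR-mTOR" axis and participates in ENS reconstruction in diabetes
- Award number: 82300616
- Year approved: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund project
Mechanism by which lycopene improves intestinal motility through regulation of gut microbiota/5-HT/ENS
- Award number:
- Year approved: 2021
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund project
Role and mechanism of MSC extracellular vesicles regulating the SETD2/H3K36 axis of ENPCs in ENS reconstruction in diabetes
- Award number:
- Year approved: 2021
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund project
Mechanism by which active components of Arisaema (Tiannanxing) inhibit mitochondrial fission and promote M2 microglial polarization to improve ischemic stroke, based on lncRNA Ens6
- Award number: 82003976
- Year approved: 2020
- Funding amount: CNY 240,000
- Award type: Young Scientists Fund project
Role and mechanism of fucosylation in MSC-mediated ENS reconstruction
- Award number: 81974068
- Year approved: 2019
- Funding amount: CNY 550,000
- Award type: General Program project
Regulatory mechanism of active components of Pogostemon cablin (patchouli) on enteric neural homeostasis in IBS-D, examined through cross-talk between muscularis macrophages (MM) and the ENS
- Award number: 81973586
- Year approved: 2019
- Funding amount: CNY 550,000
- Award type: General Program project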
Similar overseas grants
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
- Award number: 2235160
- Fiscal year: 2023
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
- Award number: 2235157
- Fiscal year: 2023
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
- Award number: 2235158
- Fiscal year: 2023
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
- Award number: 2235159
- Fiscal year: 2023
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
- Award number: 2120448
- Fiscal year: 2021
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
- Award number: 2120386
- Fiscal year: 2021
- Funding amount: $860,000
- Award type: Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
- Award number: 2120345
- Fiscal year: 2021
- Funding amount: $860,000
- Award type: Standard Grant
CCRI: ENS: Collaborative Research: ns-3 Network Simulation for Next-Generation Wireless
- Award number: 2016379
- Fiscal year: 2020
- Funding amount: $860,000
- Award type: Standard Grant
CCRI: ENS: Collaborative Research: ns-3 Network Simulation for Next-Generation Wireless
- Award number: 2016381
- Fiscal year: 2020
- Funding amount: $860,000
- Award type: Standard Grant
CCRI: ENS: Collaborative Research: Enabling Automated Language Support for the srcML Infrastructure
- Award number: 2016452
- Fiscal year: 2020
- Funding amount: $860,000
- Award type: Standard Grant