CCRI: ENS: Collaborative Research: Supporting and Sustaining Apache AsterixDB for the CISE Research Community

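Basic Information

  • Award number:
    1925610
  • Principal investigator:
  • Amount:
    $1.14 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Period:
    2019-09-01 to 2023-08-31
  • Status:
    Completed

Project Abstract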

Mass quantities of digital information are generated daily through social networks, blogs, online communities, news sources, and mobile applications, as well as our increasingly sensed surroundings. Tremendous insight can be gained by storing such big data and making it available for exploration in a wide variety of domains. Beneficiaries include business, social sciences, public health, national security, political science, public safety, medicine, and government policy. Researchers exploring these benefits need software to store, index, manage, and analyze big data, while researchers investigating new technical approaches for managing and analyzing big data can benefit tremendously from the availability of shared building blocks to use as a foundation for their efforts. Over the past ten years the Apache AsterixDB scalable big data management system has been developed to address this need. Apache AsterixDB provides a repository for semi-structured data that cannot be organized in tables. In contrast to most other systems in its space, it supports a user-friendly query language that is more powerful than those of traditional database systems. In contrast to big data analytics offerings, it manages data and exploits knowledge of data layouts and indexes to process queries efficiently. AsterixDB is enjoying use for teaching and research on big data platforms, semi-structured data, and social data analytics. Based on user feedback, this project will enhance AsterixDB to better meet community needs, including improved text handling, numerous query processing improvements, additional standards-based geospatial data support, user-defined functions for user-provided logic, and a variety of storage-level enhancements to improve the system's storage, indexing, data ingestion, and integration with other systems.
The planned improvements will benefit the broader public by providing a general purpose foundation for extracting high-value insights from high-volume, low-value big data in areas such as public safety and health. In addition to enabling computer and information science and engineering research on big data management, Apache AsterixDB will train students nationwide in big data management and analysis; such training is crucial to addressing the information explosion due to social media, the mobile Web, and the Internet of Things (IoT). Apache AsterixDB is a highly scalable Big Data Management System (BDMS) that stores, indexes, and manages semi-structured data, much like MongoDB, but it supports a full query language with the expressiveness of SQL and more. Unlike analytics engines such as Apache Hive or Spark, it stores and manages data, so it can use knowledge of data partitioning and index availability to avoid scanning data sets to process queries. Core features of the system include: a NoSQL-style data model based on extending JavaScript Object Notation (JSON); a declarative query language (SQL++) for semi-structured data; a query execution engine, Apache Hyracks, for partitioned-parallel query execution; partitioned data storage and indexing for efficient ingestion of new data; support for querying external data as well as data stored in AsterixDB; a rich set of data types, including spatial, temporal, and textual data; indexing via B+trees, R-trees, and inverted keyword indexes; and transactional support akin to that of other NoSQL stores. AsterixDB began in 2009 as a large research project to combine the best ideas from the parallel database world, the Apache Hadoop world, and the semi-structured data world to create a next-generation BDMS; it was accepted into the Apache Software Foundation's incubator in February 2015, and it became a top-level Apache project in April 2016.
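The point about using partitioning and index knowledge to avoid scans can be illustrated with a toy example. The sketch below is not AsterixDB code; the class `PartitionedStore`, its methods, and the `records_examined` counter are hypothetical names invented for this illustration. It assumes records are hash-partitioned on an integer primary key and that each partition keeps a simple secondary index on one field.

```python
from collections import defaultdict


class PartitionedStore:
    """Toy hash-partitioned record store (hypothetical, for illustration only)."""

    def __init__(self, num_partitions, indexed_field):
        self.num_partitions = num_partitions
        self.indexed_field = indexed_field
        # One primary-key -> record map per partition.
        self.partitions = [dict() for _ in range(num_partitions)]
        # One secondary index (field value -> set of primary keys) per partition.
        self.indexes = [defaultdict(set) for _ in range(num_partitions)]
        # Counts how many records a query had to look at.
        self.records_examined = 0

    def _partition_of(self, key):
        # Hash partitioning on the primary key.
        return hash(key) % self.num_partitions

    def insert(self, key, record):
        p = self._partition_of(key)
        self.partitions[p][key] = record
        if self.indexed_field in record:
            self.indexes[p][record[self.indexed_field]].add(key)

    def find_by_scan(self, field, value):
        """Examine every record in every partition."""
        matches = []
        for partition in self.partitions:
            for record in partition.values():
                self.records_examined += 1
                if record.get(field) == value:
                    matches.append(record)
        return matches

    def find_by_index(self, value):
        """Consult each partition's secondary index and fetch only the matches."""
        matches = []
        for p, index in enumerate(self.indexes):
            for key in index.get(value, ()):
                self.records_examined += 1
                matches.append(self.partitions[p][key])
        return matches
```

In this sketch both lookups return the same records, but the index-based one touches only the matching records, while the scan touches all of them. A real system would also run the per-partition work in parallel.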
AsterixDB has enjoyed use for teaching and research on big data platforms, semi-structured data, and social data analytics. Based on user feedback, we propose to enhance AsterixDB to better meet community needs by adding: (1) Improved text handling, including multiple tokenizers, stemming, and stop words. (2) Query optimization improvements, including statistics and cost-based decisions (e.g., join methods and index selection). (3) Query processing improvements, including dynamic range partitioning, fully parallel sorts, merge joins, and skew-handling. (4) Enriched, standardized spatial data support based on GeoJSON. (5) Support for user-defined functions in more languages, especially Python. (6) Support for parameterized queries and prepared statements. (7) Support for indexes on multi-valued fields. (8) Storage efficiency improvements. (9) Data ingestion improvements. (10) Additional formats for external data sets, including Parquet from Spark/Hive. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
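As a rough illustration of item (1), the sketch below shows how tokenization, stop-word removal, and stemming typically feed an inverted keyword index. It is a minimal, generic example and not AsterixDB's implementation: the function names, the stop-word list, and the crude suffix-stripping stemmer are all assumptions made for this illustration.

```python
import re
from collections import defaultdict

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "on", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Crude suffix stripping; real stemmers (e.g. Porter) are far more careful."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because queries pass through the same `analyze` step as documents, a query for "indexes" matches documents that contain "indexing" or "index". Swapping in a different tokenizer or stemmer only requires changing `analyze`.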

Project Outcomes

Journal papers (37)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)
  • DOI:
    10.1016/j.is.2022.102144
  • Published:
    2023
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Galvizo, Glenn;Carey, Michael J.
  • Corresponding author:
    Carey, Michael J.
Breaking Down Memory Walls in LSM-based Storage Systems
A brief introduction to geospatial big data analytics with Apache AsterixDB
  • DOI:
    10.1145/3486189.3490018
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sevim, Akil;Mahin, Mehnaz Tabassum;Vu, Tin;Maxon, Ian;Eldawy, Ahmed;Carey, Michael;Tsotras, Vassilis
  • Corresponding author:
    Tsotras, Vassilis
Benchmarking HOAP for Scalable Document Data Management: A First Step
CH2: A Hybrid Operational/Analytical Processing Benchmark for NoSQL

Other publications by Michael Carey

Patients’ attitudes to bedside teaching after the COVID-19 pandemic
  • DOI:
    10.1007/s11845-023-03558-5
  • Published:
    2023-11-02
  • Journal:
  • Impact factor:
    1.600
  • Authors:
    Hayley Jackson;Claire MacBride;Laura Taylor;Michael Carey;Mary F. Higgins
  • Corresponding author:
    Mary F. Higgins
Undergraduate paramedic student competency assessment: A grounded theory study explaining how assessors in Australia and New Zealand determine student competency to practice
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anthony C. Smith;P. Andersen;Michael Carey
  • Corresponding author:
    Michael Carey
Ultimate doctor liability: A myth of ignorance or myth of control?
  • DOI:
    10.1016/j.colegn.2009.06.003
  • Published:
    2009-07-01
  • Journal:
  • Impact factor:
  • Authors:
    Andrew Cashin;Michael Carey;Ngaire Watson;Greg Clark;Claire Newman;Cheryl D. Waters
  • Corresponding author:
    Cheryl D. Waters
Staff Attitudes Regarding Permanent Expulsionary Punishment (PEP) from Australian Government Schools: Comparing Queensland with Other Jurisdictions
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Brian Higgins;Michael Carey;Peter Dunn
  • Corresponding author:
    Peter Dunn
307. Prostate Targeted TSTA Oncolytic Adenovirus
  • DOI:
    10.1016/j.ymthe.2006.08.362
  • Published:
    2006-01-01
  • Journal:
  • Impact factor:
  • Authors:
    Makoto Sato;Steve Huyn;Russell Powell;Michael Carey;Sanjiv S. Gambhir;Lily Wu
  • Corresponding author:
    Lily Wu

Other grants by Michael Carey

III: Medium: Collaborative Research: Supporting High-Value Analytics on Big Low-Value Data
  • Award number:
    1954962
  • Fiscal year:
    2020
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
  • Award number:
    1838248
  • Fiscal year:
    2019
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
BIGDATA: F: DKM: Collaborative Research: Making Big Data Active: From Petabytes to Megafolks in Milliseconds
  • Award number:
    1447720
  • Fiscal year:
    2014
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
CI-ADDO-NEW: ASTERIX: A Community Software Platform for Big Data Research, Analysis, and Management
  • Award number:
    1305430
  • Fiscal year:
    2013
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
DC: Large: Collaborative Research: ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis
  • Award number:
    0910989
  • Fiscal year:
    2009
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Presidential Young Investigator Award (Computer and Information Science)
  • Award number:
    8657323
  • Fiscal year:
    1987
  • Amount:
    $1.14 million
  • Award type:
    Continuing Grant
The Performance of Algorithms For Shared Relational Database Systems
  • Award number:
    8402818
  • Fiscal year:
    1984
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant

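Similar NSFC (National Natural Science Foundation of China) grants

Research on Active Learning Methods for Clinical Named Entity Recognition with Human-Machine Collaborative Annotation
  • Award number:
    n/a
  • Approval year:
    2023
  • Amount:
    CNY 0
  • Award type:
    Provincial/municipal project
Arsenic-Based High-Density Lipoprotein Nanodrugs Remodeling the Adenosine Axis for Multi-Effect Synergistic Therapy of Solid Tumors
  • Award number:
    32301188
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanisms by Which T Cells Carrying Neoantigens and LAG-3Ig Enhance TCR-T Cell Function and Synergistically Eliminate Heterogeneous Solid Tumors
  • Award number:
    82373261
  • Approval year:
    2023
  • Amount:
    CNY 480,000
  • Award type:
    General Program
In Vivo Lipid Nanoparticle-Mediated Chimeric Antigen Receptor M1 Macrophages Combined with TLR Agonists for Treating Solid Tumors
  • Award number:
    82304418
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanisms of Intelligent Synergistic Photo-Immunotherapy for Pediatric Solid Tumors
  • Award number:
    n/a
  • Approval year:
    2022
  • Amount:
    CNY 1,000,000
  • Award type:
    Provincial/municipal project
Entity-Relation Extraction Methods for Policy Text Data and the Co-Evolution of Multi-Domain Policies
  • Award number:
    72001191
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund
Mechanism of the Synergistic Action of CAR-T Cells and the Histone Deacetylase Inhibitor TMP195 Against Solid Tumors
  • Award number:
  • Approval year:
    2019
  • Amount:
    CNY 100,000
  • Award type:
    Provincial/municipal project
Simulation Study of Dissolution Corrosion and Embrittlement of Iron Grain Boundaries Under the Combined Action of Irradiation Damage and Liquid Lead
  • Award number:
    51671185
  • Approval year:
    2016
  • Amount:
    CNY 600,000
  • Award type:
    General Program
Personalized Text Recommendation Mechanisms Incorporating Knowledge Graphs
  • Award number:
    61672100
  • Approval year:
    2016
  • Amount:
    CNY 560,000
  • Award type:
    General Program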

Similar overseas grants

Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235160
  • Fiscal year:
    2023
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235157
  • Fiscal year:
    2023
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235158
  • Fiscal year:
    2023
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235159
  • Fiscal year:
    2023
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
  • Award number:
    2120448
  • Fiscal year:
    2021
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
  • Award number:
    2120386
  • Fiscal year:
    2021
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
Collaborative Research: CCRI: ENS: Boa 2.0: Enhancing Infrastructure for Studying Software and its Evolution at a Large Scale
  • Award number:
    2120345
  • Fiscal year:
    2021
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
CCRI: ENS: Collaborative Research: ns-3 Network Simulation for Next-Generation Wireless
  • Award number:
    2016379
  • Fiscal year:
    2020
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
CCRI: ENS: Collaborative Research: ns-3 Network Simulation for Next-Generation Wireless
  • Award number:
    2016381
  • Fiscal year:
    2020
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant
CCRI: ENS: Collaborative Research: Enabling Automated Language Support for the srcML Infrastructure
  • Award number:
    2016452
  • Fiscal year:
    2020
  • Amount:
    $1.14 million
  • Award type:
    Standard Grant