CRII: III: RUI: Adaptive Query Processing for Crowd-Powered Database Systems

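Basic Information

  • Award number:
    1657259
  • Principal investigator:
  • Amount:
    $175,000
  • Institution:
  • Institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Period:
    2017-06-01 to 2020-05-31
  • Status:
    Completed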

Project Abstract

Database systems let users ask questions, or queries, about the collections of data they store (e.g., find employees who have worked at the company for at least two years) and answer those queries quickly. Because of their real-world experience and perception, people are better equipped than computers to tackle problems that require judgment or data interpretation. A crowd-powered database system helps answer users' queries by recruiting a group of people, called "the crowd," to process data using criteria that are subjective or that require visual or semantic interpretation. For example, a user may want to find faculty job postings whose job description discusses a commitment to diversity and whose school is in a safe location; interpreting each job description and researching crime statistics are tasks well suited to people. The system can coordinate crowd workers to process data more efficiently than the user could alone, which is advantageous when there are more than a handful of data items to process. Although queries processed by the crowd may take hours or days to complete, crowd-powered database systems make it possible to process complex queries, such as determining which research articles about a certain medical device contain experimental results comparing it with other devices, or finding out which of a set of jewelers use only ethically sourced metals and stones and also ship to Alaska. Database systems are designed to process each user's queries efficiently. A query often involves multiple parts; for the job-postings query, these are (1) filter out jobs that do not describe a commitment to diversity and (2) filter out jobs at schools in unsafe locations. A job that fails the first criterion does not need to be checked against the second, and vice versa.
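To make the ordering point concrete, here is a minimal Python sketch of the expected per-item crowd cost of a filter ordering. It assumes the filters are independent and that each filter's per-item cost and selectivity are known in advance, which is what a traditional optimizer assumes and a crowd-powered one lacks. The filter names and numbers are hypothetical. `best_order_brute_force` tries every ordering, and `rank_order` applies the well-known rule for independent filters of sorting by cost / (1 - selectivity).

```python
from itertools import permutations

# Hypothetical per-item statistics for the two filters in the job-postings
# example; the numbers are illustrative, not measurements.
#   cost: crowd effort to evaluate one item (e.g., worker-minutes)
#   selectivity: fraction of items that pass the filter
FILTERS = {
    "diversity_commitment": {"cost": 2.0, "selectivity": 0.3},
    "safe_location": {"cost": 5.0, "selectivity": 0.8},
}


def expected_cost(order, filters):
    """Expected per-item cost of running filters in `order`, assuming the
    filters are independent and an item stops at the first filter it fails."""
    total = 0.0
    reach = 1.0  # probability that an item reaches the current filter
    for name in order:
        total += reach * filters[name]["cost"]
        reach *= filters[name]["selectivity"]
    return total


def best_order_brute_force(filters):
    """Try every ordering; only practical for a handful of filters."""
    return min(permutations(filters), key=lambda o: expected_cost(o, filters))


def rank_order(filters):
    """Sort by cost / (1 - selectivity), the classic rule for independent filters."""
    def rank(name):
        f = filters[name]
        if f["selectivity"] >= 1.0:
            return float("inf")  # a filter that rejects nothing should run last
        return f["cost"] / (1.0 - f["selectivity"])
    return tuple(sorted(filters, key=rank))


if __name__ == "__main__":
    for order in permutations(FILTERS):
        print(order, expected_cost(order, FILTERS))
    print(best_order_brute_force(FILTERS), rank_order(FILTERS))
```

With these made-up numbers, checking the diversity criterion first costs 3.5 units per item versus about 6.6 for the reverse order, because only 30% of postings survive to the more expensive location check.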
The order in which the parts of a query are processed affects how much work is needed and how long the query takes. Traditional database systems have estimates of how long each part of a query will take and how likely items are to satisfy each filter, and they use this information to choose an efficient processing order. Crowd-powered database systems, however, do not have this information. The usefulness of an optimizer for a crowd-powered database system therefore hinges on its ability to find an efficient way to process a user's query when this information is unknown before the query runs. This project aims to tackle this challenge by developing a system that processes queries involving multiple filtering criteria, observes the execution environment, and adjusts its processing strategy as the query executes. The project will have broad impact by yielding a query processing system that empowers users to ask more interesting questions about data, by advancing research on allocating human computation resources in dynamic environments, and by training a group of undergraduate students both in research and in the principles of systems design.
The goal of this research is to build a cost-based query optimizer for crowd-powered filter queries in which important optimization statistics are unknown at query time. These statistics include traditional metrics such as filter selectivity, as well as new contributors to query cost, such as the time crowd workers take to complete a unit of work and the number of workers needed to reach consensus on a subjective evaluation. The project takes an adaptive approach to query processing: while the query runs, the system observes cost and selectivity information and periodically reorders the query plan's operators to reduce overall query cost.
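One of the new cost contributors named above is the number of workers needed to reach consensus on a subjective evaluation. The Python sketch below uses a hypothetical stopping rule, not necessarily the one the project uses: keep asking workers for yes/no votes until one answer leads by `margin` votes, up to `max_votes` votes. Worker answers are simulated as independent draws that say "yes" with probability `p_yes`, an assumption made for illustration; `reach_consensus` and `average_votes` are hypothetical names.

```python
import random
from typing import Callable, Tuple


def reach_consensus(next_vote: Callable[[], bool], margin: int = 2,
                    max_votes: int = 9) -> Tuple[bool, int]:
    """Ask workers one at a time until one answer leads by `margin` votes or
    `max_votes` votes are in. Returns (decision, number_of_votes_used)."""
    lead = 0   # (#yes votes) - (#no votes)
    votes = 0
    while votes < max_votes and abs(lead) < margin:
        lead += 1 if next_vote() else -1
        votes += 1
    # If the cap is hit, take the majority; a tie is treated as "no"
    # (a choice made for this sketch).
    return lead > 0, votes


def average_votes(p_yes: float, trials: int = 2000, seed: int = 0) -> float:
    """Simulate items on which each worker independently says "yes" with
    probability p_yes, and report the mean number of votes per item."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, used = reach_consensus(lambda: rng.random() < p_yes)
        total += used
    return total / trials


if __name__ == "__main__":
    print("clear-cut item (p=0.95):", average_votes(0.95))
    print("subjective item (p=0.6):", average_votes(0.6))
```

Ignoring the cap, each pair of votes either agrees (ending the vote) with probability p² + (1 - p)² or returns the lead to zero, so the expected number of votes is 2 / (p² + (1 - p)²): about 2.2 for a clear-cut item with p = 0.95 and about 3.8 for a contentious one with p = 0.6. This is why subjectivity, like selectivity, is a per-filter cost the optimizer has to learn.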
The researchers will demonstrate that their query optimizer yields query costs that are comparable to costs from the optimal crowd-based query plan for which selectivity and subjectivity information is known a priori. Source code, papers, and presentations are available on the project web site (https://www.cs.hmc.edu/~beth/adaptivecrowd.shtml).
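The adaptive approach described above (observe cost and selectivity while the query runs, then periodically reorder the filters) can be sketched in Python as follows. This is not the project's optimizer: it assumes independent filters, reuses the cost / (1 - selectivity) ranking rule with running estimates in place of known statistics, seeds each estimate with a single pseudo-observation, and simulates the crowd with hidden, made-up pass rates and costs. `FilterStats`, `run_adaptive`, `simulate`, and the reorder interval of 20 items are hypothetical names and choices.

```python
import random
from typing import Callable, Dict, List, Tuple

Item = Dict[str, bool]
AskCrowd = Callable[[str, Item], Tuple[bool, float]]


class FilterStats:
    """Running cost and selectivity estimates for one crowd filter."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One pseudo-observation each, so estimates exist before any data:
        # an average cost of 1.0 and a pass rate of 1/2.
        self.cost_sum, self.evals = 1.0, 1
        self.passed, self.seen = 1, 2

    def record(self, cost: float, passed: bool) -> None:
        self.cost_sum += cost
        self.evals += 1
        self.passed += int(passed)
        self.seen += 1

    def rank(self) -> float:
        avg_cost = self.cost_sum / self.evals
        selectivity = self.passed / self.seen
        # Cheap filters that reject many items get small ranks and run first.
        return avg_cost / max(1e-9, 1.0 - selectivity)


def run_adaptive(items: List[Item], filters: List[FilterStats],
                 ask_crowd: AskCrowd, reorder_every: int = 20):
    """Evaluate every item through the filters, re-sorting the filter order by
    the current rank estimates every `reorder_every` items."""
    order = list(filters)
    kept: List[Item] = []
    total_cost = 0.0
    for i, item in enumerate(items):
        if i % reorder_every == 0:
            order.sort(key=lambda f: f.rank())
        for f in order:
            passed, cost = ask_crowd(f.name, item)
            f.record(cost, passed)
            total_cost += cost
            if not passed:
                break  # rejected items skip the remaining filters
        else:
            kept.append(item)  # passed every filter
    return kept, total_cost, [f.name for f in order]


def simulate(seed: int = 0, n_items: int = 500):
    rng = random.Random(seed)
    # Hidden "true" statistics that the optimizer never sees (made-up values).
    true_pass = {"diversity_commitment": 0.3, "safe_location": 0.8}
    true_cost = {"diversity_commitment": 2.0, "safe_location": 5.0}
    items = [{n: rng.random() < p for n, p in true_pass.items()}
             for _ in range(n_items)]

    def ask_crowd(name: str, item: Item) -> Tuple[bool, float]:
        # Stand-in for posting a task to crowd workers: the answer is the
        # item's ground truth and the cost is noisy around the true mean.
        return item[name], rng.uniform(0.5, 1.5) * true_cost[name]

    # Start with the more expensive order so the optimizer has to correct it.
    filters = [FilterStats("safe_location"), FilterStats("diversity_commitment")]
    return run_adaptive(items, filters, ask_crowd)


if __name__ == "__main__":
    kept, cost, final_order = simulate()
    print(final_order, len(kept), round(cost, 1))
```

Starting from the more expensive order, the observed statistics should quickly push the cheap, selective diversity check to the front, so the final order matches the plan an optimizer with a priori statistics would pick. Reordering more often reacts faster to changing conditions but acts on noisier estimates.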

Project Outputs

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Dynamic Filter: Adaptive Query Processing with the Crowd

Other Publications by Katherine Trushkowsky

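Similar NSFC Grants

Catalytic Mechanism of V(II)/V(III) Electrochemical Redox at the Negative Electrode of All-Vanadium Redox Flow Batteries
  • Grant number:
    2025JJ50094
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Optical Mode Control and Coupling Mechanisms of Silicon-Based III-V Submicron-Wire Lasers
  • Grant number:
    JCZRQN202501004
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
A New Mechanism of Zone III Liver Injury in Pyrrolizidine Alkaloid-Induced Hepatic Sinusoidal Obstruction Syndrome: Local Ammonia Metabolism Disorder
  • Grant number:
    JCZRYB202500652
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism of Arsenic(III) Oxidation and Migration in Water Regulated by the Microdomain Layer Interface of MXene/nZVI@FH Materials
  • Grant number:
    2025JJ50319
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
HOXC8/OPN/CD44/EGFR Axis-Mediated Oxaliplatin Resistance in the Progression of Drug Resistance in Stage III Right-Sided Colon Cancer
  • Grant number:
    2025JJ50694
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
AI Combined with Raw Ultrasound Radiofrequency Signals to Assess Capsular and Vascular Invasion in Bethesda Category III/IV Thyroid Tumors
  • Grant number:
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Arsenic Sulfide Targeting VPS4B-ESCRT-III to Regulate the Autophagy-Lysosome Pathway and Reverse Cisplatin Resistance in Triple-Negative Breast Cancer
  • Grant number:
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project
ASPGR and MRC2 Dual-Receptor-Mediated Iridium(III) Complex Liposomes Against Liver Tumors
  • Grant number:
  • Year approved:
    2025
  • Amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
A New Screening Method for Uric Acid-Lowering Drugs Based on Ap-Exo III Combined with Pattern Recognition
  • Grant number:
  • Year approved:
    2025
  • Amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Clinical Effectiveness of Transfacet Osteotomy for Correcting Stage III Kummell Disease
  • Grant number:
  • Year approved:
    2025
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project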

Similar Overseas Grants

III: Small: RUI: Designing Structure-Phenotype Query-Retrieval and Analysis Systems for Microscopy-Based Whole Organism Studies
  • Award number:
    2401096
  • Fiscal year:
    2023
  • Amount:
    $175,000
  • Award type:
    Standard Grant
III: Small: RUI: A Fairness Auditing Framework for Predictive Mobility Models
  • Award number:
    2304213
  • Fiscal year:
    2023
  • Amount:
    $175,000
  • Award type:
    Standard Grant
III: Small: RUI: Finding Best Representative Phylogenetic Tree Reconciliations
  • Award number:
    2231150
  • Fiscal year:
    2022
  • Amount:
    $175,000
  • Award type:
    Standard Grant
CRII: III: RUI: Multiphysics Modeling of Slope Stability in Post-Wildfire Environment
  • Award number:
    2153370
  • Fiscal year:
    2022
  • Amount:
    $175,000
  • Award type:
    Standard Grant
III: Small: RUI: Collaborative Research: Modeling Pre- and Post- Conditions for Understanding Events
  • Award number:
    2007128
  • Fiscal year:
    2020
  • Amount:
    $175,000
  • Award type:
    Interagency Agreement
III: Small: RUI: Investigating Fragmentation Rules and Improving Metabolite Identification Using Graph Grammar and Statistical Methods
  • Award number:
    2053286
  • Fiscal year:
    2020
  • Amount:
    $175,000
  • Award type:
    Standard Grant
EAGER: III: Collaborative Research: RUI: In silico Algorithm for Assessing the Effects of Amino Acid Insertion and Deletion Mutations
  • Award number:
    2031283
  • Fiscal year:
    2020
  • Amount:
    $175,000
  • Award type:
    Standard Grant
III: Small: RUI: Scalable and Iterative Statistical Testing of Multiple Hypotheses on Massive Datasets
  • Award number:
    2006765
  • Fiscal year:
    2020
  • Amount:
    $175,000
  • Award type:
    Standard Grant
CRII: III: RUI: Association Testing and Inversion Detection without Reference Genomes
  • Award number:
    1947257
  • Fiscal year:
    2020
  • Amount:
    $175,000
  • Award type:
    Standard Grant
CRII: III: RUI: Effective Protein Characterization via Fast Exact Open Modification Searching
  • Award number:
    1850557
  • Fiscal year:
    2019
  • Amount:
    $175,000
  • Award type:
    Continuing Grant