III: Medium: Collaborative Research: Database-As-A-Service for Long Tail Science

Project Abstract

Scientific applications now generate tremendous amounts of data, making database management a critical issue, yet database technology is not keeping pace. This problem is especially acute in the long tail of science: the large number of relatively small labs and individual researchers who collectively produce the majority of scientific results. These researchers lack the IT staff and specialized skills to deploy technology at scale, yet they now routinely access hundreds of files and potentially terabytes of data to answer a scientific question. This project develops the architecture for a database-as-a-service platform for science. It explores techniques to remove the remaining barriers to use through automation: ingesting data from native sources and automatically bootstrapping an initial set of queries and visualizations, in part by aggressively mining a shared corpus of data, queries, and user activity. It investigates methods to extract global knowledge and patterns while offering scientists access control over their data and some formal privacy guarantees. The Intellectual Merit of this proposal consists of automating non-trivial cognitive tasks associated with data work: information extraction from unstructured data sources, data cleaning, logical schema design, privacy control, visualization, and application-building. As Broader Impacts, the project helps scientists reduce the proportion of time spent "handling data" rather than "doing science." All software resulting from this project is open source, and all findings are disseminated broadly through publications and workshops. Sustainable support for science users of the software is coordinated through the University of Washington eScience Institute. The research is incorporated into both undergraduate and graduate computer science courses, and the software is also incorporated into domain science courses. The project's outreach activities include advising students through special programs geared toward under-represented groups, such as the CRA-W DREU. More information about this project is available at http://escience.washington.edu/dbaas.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Michael Cafarella

MDCR: A Dataset for Multi-Document Conditional Reasoning
  • Year published:
    2024
  • Impact factor:
    0
  • Authors:
    Peter Baile Chen; Yi Zhang; Chunwei Liu; Sejal Gupta; Yoon Kim; Michael Cafarella
  • Corresponding author:
    Michael Cafarella
Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools
  • Year published:
    2023
  • Impact factor:
    0
  • Authors:
    Matthew Perron; Raul Castro Fernandez; David DeWitt; Michael Cafarella; Samuel Madden
  • Corresponding author:
    Samuel Madden
A Declarative System for Optimizing AI Workloads
  • Year published:
    2024
  • Impact factor:
    0
  • Authors:
    Chunwei Liu; Matthew Russo; Michael Cafarella; Lei Cao; Peter Baile Chen; Zui Chen; Michael Franklin; T. Kraska; Samuel Madden; Gerardo Vitagliano
  • Corresponding author:
    Gerardo Vitagliano

Other Grants Held by Michael Cafarella

A1: Knowledge Network Development Infrastructure with Application to COVID-19 Science and Economics
  • Award number:
    2132318
  • Fiscal year:
    2021
  • Award amount:
    $232,000
  • Project type:
    Cooperative Agreement
RAPID: Rich and Accurate Auxiliary Databases for Supporting Virus Data Efforts
  • Award number:
    2029556
  • Fiscal year:
    2020
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
A1: Knowledge Network Development Infrastructure with Application to COVID-19 Science and Economics
  • Award number:
    2033558
  • Fiscal year:
    2020
  • Award amount:
    $232,000
  • Project type:
    Cooperative Agreement
Convergence Accelerator Phase I (RAISE): Simultaneous Knowledge Network Programming and Extraction
  • Award number:
    1936940
  • Fiscal year:
    2019
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
I-Corps: Explanation-Based Auditing: Improving the Security of Electronic Medical Records
  • Award number:
    1340372
  • Fiscal year:
    2013
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
CAREER: Building and Searching a Structured Web Database
  • Award number:
    1054913
  • Fiscal year:
    2011
  • Award amount:
    $232,000
  • Project type:
    Continuing Grant

Similar Overseas Grants

III: Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2420691
  • Fiscal year:
    2024
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: Medium: Designing AI Systems with Steerable Long-Term Dynamics
  • Award number:
    2312865
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312932
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Award number:
    2348169
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Continuing Grant
Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data
  • Award number:
    2415562
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: Medium: New Machine Learning Empowered Nanoinformatics System for Advancing Nanomaterial Design
  • Award number:
    2347592
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: Medium: Knowledge discovery from highly heterogeneous, sparse and private data in biomedical informatics
  • Award number:
    2312862
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312930
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
  • Award number:
    2312501
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
  • Award number:
    2310113
  • Fiscal year:
    2023
  • Award amount:
    $232,000
  • Project type:
    Standard Grant