III: Medium: Collaborative Research: DataHub - A Collaborative Dataset Management Platform for Data Science

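Basic information

  • Award number:
    1513443
  • Principal investigator:
  • Amount:
    $333,300
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-09-01 to 2019-08-31
  • Project status:
    Completed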

Project abstract

The rise of the Internet, smart phones, and wireless sensors has resulted in a vast trove of data about all aspects of our lives, from our social interactions to our personal preferences to our vital signs and medical records. Increasingly, "data science" teams want to collaboratively analyze these datasets, to understand trends and to extract actionable business, scientific, or social insights. Unfortunately, while tools exist to support data analysis, much-needed underlying infrastructure and data management capabilities are missing. To this end, "DataHub", a collaborative platform for cleaning, storing, understanding, sharing, and publishing datasets, will be developed. DataHub will be a publicly accessible platform that will host private user datasets as well as public datasets retrieved from online sources. DataHub will serve as the common substrate for data science, freeing end users from tedious dataset book-keeping tasks and instead supporting them in their search for useful insights. DataHub will be deployed on a large scale at MIT; partnerships with organizations and groups from a variety of sectors will be leveraged to show benefits for real data scientists and to ensure that the proposed techniques meet real-world big data challenges. The curriculum development part of this project will lead to the training of new data scientists, and the project will also provide opportunities for graduate and undergraduate students to participate in research and learn how to do collaborative research.
Unlike most systems, which focus on improving performance or on supporting ever more sophisticated analyses, DataHub will instead focus on simplifying and automating many fundamental book-keeping operations that are a prerequisite to data science. Key features of DataHub will include: (1) a flexible, source code control-like versioning system for data that efficiently branches, merges, and diffs datasets; (2) new data ingest, cleaning, and wrangling tools designed to automate the data cleaning process; (3) the ability to search for "related" tables and to integrate them into the analysis process; and (4) the ability to selectively share and collaborate on datasets across users and teams. Overall, DataHub will significantly reduce the effort data scientists spend on preparing, analyzing, sharing, and managing data.
For more information, see the project website at: http://data-hub.org

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Samuel Madden

MEDLINE/PubMed
  • DOI:
    10.1007/978-0-387-39940-9_3039
  • Published:
    2004
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Cornelia Caragea;V. Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;Bhavani Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;Ophir Frieder;Christian S. Jensen;N. Tradisauskas;Ethan V. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;Sara Bouchenak;Noël de Palma;Wei Tang;Christoph Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;Ioana Manolescu;Jeffrey Xu Yu;Emmanuel Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;Amol Deshpande;Carlos Guestrin;Samuel Madden;Carson Kai;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;Jeffrey Xu Yu;Paolo Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;Maria De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;Christelle Vangenot;Esteban Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;Matthias Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
  • Corresponding author:
    W. Cellary
Cabernet: A Content Delivery Network for Moving Vehicles
  • DOI:
  • Published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jakob Eriksson;H. Balakrishnan;Samuel Madden
  • Corresponding author:
    Samuel Madden
Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Matthew Perron;Raul Castro Fernandez;David DeWitt;Michael Cafarella;Samuel Madden
  • Corresponding author:
    Samuel Madden
Performant almost-latch-free data structures using epoch protection in more depth
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tianyu Li;Badrish Chandramouli;Samuel Madden
  • Corresponding author:
    Samuel Madden
Research contributions of Mike Stonebraker: an overview
  • DOI:
    10.1145/3226595.3226612
  • Published:
    2018-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Samuel Madden
  • Corresponding author:
    Samuel Madden

Other grants held by Samuel Madden

Collaborative Research: Elements: A Self-tuning Anomaly Detection Service
  • Award number:
    2103799
  • Fiscal year:
    2021
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Medium: Massively Parallel Data Analytics on Heterogeneous Architectures
  • Award number:
    1763434
  • Fiscal year:
    2018
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
  • Award number:
    1636766
  • Fiscal year:
    2016
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
ACM SIGMOD 2012 Student Programming Contest: A Multidimensional Indexing System
  • Award number:
    1235666
  • Fiscal year:
    2012
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Medium: Scalable and Secure Database as a Service
  • Award number:
    1065219
  • Fiscal year:
    2011
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
SIGMOD 2011 Programming Contest
  • Award number:
    1129526
  • Fiscal year:
    2011
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Large: Collaborative Research: SciDB - An Array Oriented Data Management System for Massive Scale Scientific Data
  • Award number:
    1111371
  • Fiscal year:
    2011
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
2010 SIGMOD Programming Contest
  • Award number:
    1037986
  • Fiscal year:
    2010
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: A Comparative Study of Approaches to Cluster-Based Large Scale Data Analysis
  • Award number:
    0844013
  • Fiscal year:
    2009
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
2009 SIGMOD Programming Contest
  • Award number:
    0848727
  • Fiscal year:
    2008
  • Award amount:
    $333,300
  • Award type:
    Standard Grant

Similar overseas grants

III: Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2420691
  • Fiscal year:
    2024
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Designing AI Systems with Steerable Long-Term Dynamics
  • Award number:
    2312865
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312932
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Award number:
    2348169
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data
  • Award number:
    2415562
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: New Machine Learning Empowered Nanoinformatics System for Advancing Nanomaterial Design
  • Award number:
    2347592
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Knowledge discovery from highly heterogeneous, sparse and private data in biomedical informatics
  • Award number:
    2312862
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312930
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
  • Award number:
    2312501
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
  • Award number:
    2310113
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant