III: Medium: Collaborative Research: DataHub - A Collaborative Dataset Management Platform for Data Science

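Basic information

  • Award number:
    1513972
  • Principal investigator:
  • Amount:
    $333,300
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-09-01 to 2019-08-31
  • Project status:
    Completed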

Project abstract

The rise of the Internet, smart phones, and wireless sensors has resulted in a vast trove of data about all aspects of our lives, from our social interactions to our personal preferences to our vital signs and medical records. Increasingly, "data science" teams want to collaboratively analyze these datasets, to understand trends and to extract actionable business, scientific, or social insights. Unfortunately, while there exist tools to support data analysis, much-needed underlying infrastructure and data management capabilities are missing. To this end, "DataHub", a collaborative platform for cleaning, storing, understanding, sharing, and publishing datasets, will be developed. DataHub will be a publicly accessible platform that will host private user datasets as well as public datasets retrieved from online sources. DataHub will serve as the common substrate for data science, freeing end users from tedious dataset book-keeping tasks and instead supporting them in their search for useful insights. DataHub will be deployed on a large scale at MIT; partnerships with organizations and groups from a variety of sectors will be leveraged to show benefits for real data scientists and to ensure that the proposed techniques meet real-world big data challenges. The curriculum development part of this project will lead to the training of new data scientists, and the project will also provide opportunities for graduate and undergraduate students to participate in research and learn how to do collaborative research.
Unlike most systems, which focus on improving performance or on supporting ever more sophisticated analyses, DataHub will instead focus on simplifying and automating many fundamental book-keeping operations that are a prerequisite to data science. Key features of DataHub will include: (1) a flexible, source code control-like versioning system for data that efficiently branches, merges, and computes differences between datasets; (2) new data ingest, cleaning, and wrangling tools designed to automate the data cleaning process; (3) the ability to search for "related" tables and to integrate them into the analysis process; and (4) the ability to selectively share and collaborate on datasets across users and teams. Overall, DataHub will significantly reduce the effort required of data scientists for preparing, analyzing, sharing, and managing data.
For more information, see the project website at: http://data-hub.org

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Amol Deshpande

MEDLINE/PubMed
  • DOI:
    10.1007/978-0-387-39940-9_3039
  • Published:
    2004
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Cornelia Caragea;V. Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;Bhavani Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;Ophir Frieder;Christian S. Jensen;N. Tradisauskas;Ethan V. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;Sara Bouchenak;Noël de Palma;Wei Tang;Christoph Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;Ioana Manolescu;Jeffrey Xu Yu;Emmanuel Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;Amol Deshpande;Carlos Guestrin;Samuel Madden;Carson Kai;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;Jeffrey Xu Yu;Paolo Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;Maria De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;Christelle Vangenot;Esteban Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;Matthias Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
  • Corresponding author:
    W. Cellary
To Store or Not to Store: a graph theoretical approach for Dataset Versioning
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anxin Guo;Jingwei Li;Pattara Sukprasert;Samir Khuller;Amol Deshpande;Koyel Mukherjee
  • Corresponding author:
    Koyel Mukherjee
Moment
  • DOI:
  • Published:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cornelia Caragea;V. Honavar;P. Boncz;Per;Suzanne W. Dietrich;Gonzalo Navarro;B. Thuraisingham;Yan Luo;Ouri Wolfson;S. Beitzel;Eric C. Jensen;O. Frieder;C. S. Jensen;N. Tradisauskas;E. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;S. Bouchenak;Noël de Palma;Wei Tang;C. Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;Weiyi Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;I. Manolescu;Jeffrey Xu Yu;E. Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;P. Chrysanthis;Amol Deshpande;Carlos Guestrin;S. Madden;C. Leung;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;Jeffrey Xu Yu;P. Ciaccia;K. Candan;M. Sapino;C. Meghini;Fabrizio Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;Maria De Marsico;K. Voruganti;Christine Parent;S. Spaccapietra;C. Vangenot;E. Zimányi;Prasan Roy;S. Sudarshan;Enrico Puppo;Peer Kröger;M. Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
  • Corresponding author:
    W. Cellary
Application of Packed Bed Chemical Looping (Unmixed) Combustion for water heating: Modelling and CFD simulation for Reduction cycle
  • DOI:
    10.1016/j.cep.2023.109569
  • Published:
    2023-12-01
  • Journal:
  • Impact factor:
  • Authors:
    Amina Faizal;Amol Deshpande
  • Corresponding author:
    Amol Deshpande
108 – The Prevalence and Use of Cannabis by Patients with Inflammatory Bowel Disease
  • DOI:
    10.1016/s0016-5085(19)36842-8
  • Published:
    2019-05-01
  • Journal:
  • Impact factor:
  • Authors:
    Lillian Du;Amol Deshpande;Laura Yang;Shlomit Boguslavsky;Kenneth Croitoru;Zane Gallinger;Vivian Huang;Mark S. Silverberg;Adam V. Weizman;Geoffrey C. Nguyen;A. Hillary Steinhart
  • Corresponding author:
    A. Hillary Steinhart

Other grants by Amol Deshpande

EAGER: Lifecycle Management of Collaborative Analysis Workflows through Provenance Capture and Analysis
  • Award number:
    1650755
  • Fiscal year:
    2016
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Small: Enabling Declarative Querying and Analytics over Large Dynamic Information Networks
  • Award number:
    1319432
  • Fiscal year:
    2013
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
III: Small: Collaborative Proposal: Towards Robust Uncertain Data Management
  • Award number:
    1218367
  • Fiscal year:
    2012
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
III: Small: Managing Large-scale Uncertain Data Repositories
  • Award number:
    0916736
  • Fiscal year:
    2009
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
CAREER: MauveDB: Model-Based User Views over Sensor Data
  • Award number:
    0546136
  • Fiscal year:
    2006
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
CSR-EHS: Collaborative Research: A General, Efficient and Robust Platform for Enabling Control Applications in Sensor Networks
  • Award number:
    0509220
  • Fiscal year:
    2005
  • Award amount:
    $333,300
  • Award type:
    Standard Grant

Similar overseas grants

III: Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2420691
  • Fiscal year:
    2024
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Designing AI Systems with Steerable Long-Term Dynamics
  • Award number:
    2312865
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312932
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Award number:
    2348169
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Continuing Grant
Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data
  • Award number:
    2415562
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: New Machine Learning Empowered Nanoinformatics System for Advancing Nanomaterial Design
  • Award number:
    2347592
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Knowledge discovery from highly heterogeneous, sparse and private data in biomedical informatics
  • Award number:
    2312862
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312930
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
  • Award number:
    2312501
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
  • Award number:
    2310113
  • Fiscal year:
    2023
  • Award amount:
    $333,300
  • Award type:
    Standard Grant