DataPrep: Human-in-the-Loop Data Preparation
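Basic information
- Grant number: RGPIN-2021-03995
- Principal investigator:
- Amount: $35,000
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract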
Data preparation is the process of collecting, exploring, cleaning, transforming, and integrating data into a form suitable for downstream analysis and modeling. It is widely regarded as the most time-consuming part of the data-science lifecycle. Although some effort has been devoted to this problem, a survey released by Anaconda in 2020 shows that it is still the case that "Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction". In recent years, Python has become the most popular programming language among data scientists, and a wide range of Python libraries is available to simplify different stages of the data science pipeline. Scikit-learn is one example: with its help, data scientists can spend much less time on the machine learning stage. Inspired by the success of Scikit-learn, our proposed research program aims to build DataPrep (http://dataprep.ai), a human-in-the-loop data preparation system in Python. In the short term, we will work on three specific modules in DataPrep: i) DataPrep.EDA: a task-centric exploratory data analysis module; ii) DataPrep.Connector: a unified API wrapper to simplify web data collection; iii) DataPrep.Match: an automated model development module for entity matching. The long-term goal is to build an all-in-one data preparation system that provides the easiest way for data scientists to prepare data in Python. Data preparation is a critical stage of successful data science projects in many scientific fields, and the market for data preparation is estimated to exceed USD 18 billion by 2027. Because DataPrep can greatly simplify data preparation, we expect it to have a substantial impact on the field of data science from both academic and industrial perspectives.
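To make the "human-in-the-loop" idea behind entity matching concrete, the following is a minimal sketch using only the Python standard library. It is not DataPrep's API and not the project's method: the function names (`similarity`, `triage_pairs`), the two thresholds, and the company names are all hypothetical, chosen for illustration. Pairs with a high similarity score are accepted automatically, and uncertain pairs are set aside for a person to review.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage_pairs(left, right, accept=0.9, reject=0.7):
    """Split candidate pairs into automatic matches and pairs for human review.

    The thresholds are illustrative assumptions, not values from the project.
    """
    matches, review = [], []
    for a, b in product(left, right):
        score = similarity(a, b)
        if score >= accept:
            matches.append((a, b, score))   # confident: accept automatically
        elif score >= reject:
            review.append((a, b, score))    # uncertain: ask a person
        # below `reject`: treated as a non-match and dropped
    return matches, review


# Invented sample records from two hypothetical sources.
left = ["Acme Corp", "Globex Corp."]
right = ["acme corp", "Globex Corporation", "Initech LLC"]

matches, review = triage_pairs(left, right)
print("auto-matched:", matches)
print("needs review:", review)
```

In this sketch the human only sees the middle band of scores, which is one simple way to spend reviewer time where an automatic decision is least reliable. A real matcher would typically compare several attributes and learn the decision rule rather than fix thresholds by hand.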
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Wang, Jiannan
Analysis of factors influencing overheating failure of the molten-salt receiver in solar power tower plants
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Wang, Jiannan; Li, Xin; Chang, Chun
- Corresponding author: Chang, Chun
Optimization of Material for Key Components and Parameters of Peanut Sheller Based on Hertz Theory and Box-Behnken Design
- DOI: 10.3390/agriculture12020146
- Published: 2022-02-01
- Journal:
- Impact factor: 3.6
- Authors: Wang, Jiannan; Xie, Huanxiong; Ma, Chenbin
- Corresponding author: Ma, Chenbin
Motility and function of smooth muscle cells in a silk small-caliber tubular scaffold after replacement of rabbit common carotid artery
- DOI: 10.1016/j.msec.2020.110977
- Published: 2020-09-01
- Journal:
- Impact factor: 7.9
- Authors: Li, Helei; Song, Guangzhou; Wang, Jiannan
- Corresponding author: Wang, Jiannan
Steady-State Behavior and Endothelialization of a Silk-Based Small-Caliber Scaffold In Vivo Transplantation
- DOI: 10.3390/polym11081303
- Published: 2019-08-01
- Journal:
- Impact factor: 5
- Authors: Li, Helei; Wang, Yining; Wang, Jiannan
- Corresponding author: Wang, Jiannan
Cytocompatibility of a silk fibroin tubular scaffold
- DOI: 10.1016/j.msec.2013.09.039
- Published: 2014-01-01
- Journal:
- Impact factor: 7.9
- Authors: Wang, Jiannan; Wei, Yali; Zhao, Huanrong
- Corresponding author: Zhao, Huanrong
Other grants held by Wang, Jiannan
DataPrep: Human-in-the-Loop Data Preparation
- Grant number: RGPIN-2021-03995
- Fiscal year: 2021
- Amount: $35,000
- Program: Discovery Grants Program - Individual
Crowdsourced Data Cleaning
- Grant number: RGPIN-2016-05555
- Fiscal year: 2020
- Amount: $35,000
- Program: Discovery Grants Program - Individual
Entity augmentation and data cleaning for machine learning
- Grant number: 508081-2016
- Fiscal year: 2019
- Amount: $35,000
- Program: Collaborative Research and Development Grants
Crowdsourced Data Cleaning
- Grant number: RGPIN-2016-05555
- Fiscal year: 2019
- Amount: $35,000
- Program: Discovery Grants Program - Individual
Entity augmentation and data cleaning for machine learning
- Grant number: 508081-2016
- Fiscal year: 2018
- Amount: $35,000
- Program: Collaborative Research and Development Grants
Crowdsourced Data Cleaning
- Grant number: RGPIN-2016-05555
- Fiscal year: 2018
- Amount: $35,000
- Program: Discovery Grants Program - Individual
Approximate Query Processing over Secure Key/Value Stores
- Grant number: 517430-2017
- Fiscal year: 2017
- Amount: $35,000
- Program: Engage Grants Program
Crowdsourced Data Cleaning
- Grant number: RGPIN-2016-05555
- Fiscal year: 2017
- Amount: $35,000
- Program: Discovery Grants Program - Individual
Entity augmentation and data cleaning for machine learning
- Grant number: 508081-2016
- Fiscal year: 2017
- Amount: $35,000
- Program: Collaborative Research and Development Grants
A unified access server for SQL-on-Hadoop systems
- Grant number: 501015-2016
- Fiscal year: 2016
- Amount: $35,000
- Program: Engage Grants Program
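Similar grants from the National Natural Science Foundation of China (NSFC)
Screening and efficacy evaluation of hypoglycemic small-molecule compounds targeting human ZAG protein
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Program: Provincial or municipal project
Molecular mechanism of the HBV S-human ESPL1 fusion gene in the progression of chronic hepatitis B
- Grant number: 81960115
- Approval year: 2019
- Amount: CNY 340,000
- Program: Regional Science Fund Project
"Human-in-Loop" control of a lower-limb rehabilitation robot based on an adaptive surface electromyography model
- Grant number: 61005070
- Approval year: 2010
- Amount: CNY 200,000
- Program: Young Scientists Fund Project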
Similar overseas grants
CAREER: Psychology-aware Human-in-the-Loop Cyber-Physical-System (HCPS): Methodologies, Algorithms, and Deployment
- Grant number: 2339266
- Fiscal year: 2024
- Amount: $35,000
- Program: Continuing Grant
Early Detection of Pancreatic Cancer with Human-in-the-Loop Deep Learning
- Grant number: 10592060
- Fiscal year: 2023
- Amount: $35,000
- Program:
SBIR Phase I: Brave Virtual Worlds Human Movement Artificial Intelligence (AI) Engine and Biofeedback Loop
- Grant number: 2326586
- Fiscal year: 2023
- Amount: $35,000
- Program: Standard Grant
CAREER: Closing the Loop of Human-Machine Interactions via Skin-Like Multimodal Haptic Interfaces
- Grant number: 2238363
- Fiscal year: 2023
- Amount: $35,000
- Program: Continuing Grant
Feasibility of delivering and demonstrating a human-in-the-loop digital twin in the construction and maintenance of GCRE (Athena)
- Grant number: 10063263
- Fiscal year: 2023
- Amount: $35,000
- Program: Collaborative R&D
Artificial Intelligence with Human In The Loop for Automated Medical Image Contouring in Precision Oncology
- Grant number: 2887158
- Fiscal year: 2023
- Amount: $35,000
- Program: Studentship
Mapping ankle-foot stiffness to socket comfort and pressure using a robotic emulator platform to personalize prosthesis function via human-in-the-loop optimization
- Grant number: 10584383
- Fiscal year: 2023
- Amount: $35,000
- Program:
A closed-loop human–agent learning framework to enhance decision making
- Grant number: DE220100265
- Fiscal year: 2022
- Amount: $35,000
- Program: Discovery Early Career Researcher Award
CPS: Small: Human-in-the-Loop Learning of Complex Events in Uncontrolled Environments
- Grant number: 2227002
- Fiscal year: 2022
- Amount: $35,000
- Program: Standard Grant
EAGER: DCL: SaTC: Enabling Interdisciplinary Collaboration: Efficient Human-in-the-Loop Redaction of Language Development Corpora
- Grant number: 2210193
- Fiscal year: 2022
- Amount: $35,000
- Program: Standard Grant