III: Medium: Enabling Technologies for 21st Century Entity Matching Applications
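Basic Information
- Award number: 1564282
- Principal investigator: AnHai Doan
- Amount: $1,085,900
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-09-01 to 2021-08-31
- Status: Completed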
Project Abstract
Entity matching (EM) decides whether different data instances (such as "UW-Madison" and "Univ of Wisc Madison") refer to the same real-world entity. This problem is critical for numerous Big Data and data science applications. This project will develop EM solutions and tools that will significantly advance the state of the art. Compared to current solutions, the proposed solutions will consider the entire EM pipeline, will be usable by lay users (such as domain scientists and journalists), will scale to large amounts of data, and will exploit crowdsourcing to maximize matching accuracy. Technologies developed in this project will be evaluated in three domains: managing scientific data for limnology (the study of lakes and other bodies of freshwater), product matching for e-commerce at WalmartLabs, and developing the Internet of Buildings at Johnson Controls Inc. As such, the project will facilitate the widespread deployment of EM tools, resulting in more effective information management and access for society. Through its release of open-source software, it will help educate next-generation workers and researchers. The research will help domain scientists in limnology and can potentially impact hundreds of thousands of buildings and millions of users via collaboration with Johnson Controls and WalmartLabs. Finally, a planned textbook and open-source system artifacts from the project will be disseminated broadly in the research community, to significantly enhance the data management infrastructure for research and education.

The project will introduce both conceptual and technical novelties. Conceptually, instead of focusing on just the matching step (and studying how to match accurately and scalably, as much current work has done), the project advocates developing solutions for the entire raw-data-to-matches EM pipeline. Further, it advocates using the matching step to guide the execution of the remaining steps in the EM pipeline.
Technically, the project will develop novel solutions for the non-matching steps in the EM pipeline, and do so in a matching-driven fashion. It also introduces important new problems, such as EM debugging. Finally, it develops novel solutions to scale up the entire EM pipeline and to exploit crowdsourcing. As described, the project takes the next logical step in EM research and can significantly advance the state of the art. Further, many problems underlying this research have commonalities with other data management scenarios, so the research has the potential to contribute to those areas as well. For more information, see the project's homepage at https://sites.google.com/site/anhaidgroup/projects/nsf-em-project-2016.
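The abstract's central claim is that matching accuracy depends on the whole raw-data-to-matches pipeline, not on the matcher alone. The Python sketch that follows illustrates that point on the abstract's own example pair. It is a toy under stated assumptions: the three stages (cleaning, blocking, matching), the hypothetical `ABBREVIATIONS` table, the token-set Jaccard measure, and the 0.8 threshold are illustrative choices, not the project's techniques or the API of any tool it released.

```python
import re

# Hypothetical abbreviation table, for illustration only. A real system
# would curate or learn such mappings for its own domain.
ABBREVIATIONS = {
    "uw": ["university", "wisconsin"],
    "univ": ["university"],
    "wisc": ["wisconsin"],
}
STOPWORDS = {"of", "the", "at"}


def tokenize(s: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", s.lower()) if t]


def normalize(s: str) -> frozenset[str]:
    """Cleaning step: drop stopwords and expand known abbreviations."""
    out: set[str] = set()
    for tok in tokenize(s):
        if tok in STOPWORDS:
            continue
        out.update(ABBREVIATIONS.get(tok, [tok]))
    return frozenset(out)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Token-set Jaccard similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def block(left: list[frozenset[str]], right: list[frozenset[str]]) -> list[tuple[int, int]]:
    """Blocking step: keep only index pairs that share at least one token,
    instead of comparing all len(left) * len(right) pairs."""
    index: dict[str, list[int]] = {}
    for j, toks in enumerate(right):
        for tok in toks:
            index.setdefault(tok, []).append(j)
    pairs = {(i, j) for i, toks in enumerate(left) for tok in toks for j in index.get(tok, [])}
    return sorted(pairs)


def match(left: list[str], right: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Matching step: return candidate pairs whose cleaned token sets
    reach the (assumed) similarity threshold."""
    norm_l = [normalize(x) for x in left]
    norm_r = [normalize(x) for x in right]
    return [
        (left[i], right[j])
        for i, j in block(norm_l, norm_r)
        if jaccard(norm_l[i], norm_r[j]) >= threshold
    ]


if __name__ == "__main__":
    a, b = "UW-Madison", "Univ of Wisc Madison"
    print(jaccard(frozenset(tokenize(a)), frozenset(tokenize(b))))  # 0.2 on raw tokens
    print(jaccard(normalize(a), normalize(b)))                      # 1.0 after cleaning
    print(match([a, "MIT"], [b, "Stanford University"]))
    # [('UW-Madison', 'Univ of Wisc Madison')]
```

On raw tokens the two strings share only "madison" (Jaccard 0.2), so no sensible threshold would match them; after cleaning they become identical token sets (1.0). In this toy, the matcher's failure is what reveals which cleaning step is needed, a small instance of the matching-driven idea. Blocking also cuts the four possible pairs down to two candidates, the kind of saving that matters when scaling to large tables.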
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by AnHai Doan
EAGER: Discovering Emerging Events in Social Media
- Award number: 1143807
- Fiscal year: 2011
- Award amount: $1,085,900
- Award type: Continuing Grant
III: Small: Enabling Technology for Best-Effort Data Integration Systems
- Award number: 1018792
- Fiscal year: 2010
- Award amount: $1,085,900
- Award type: Continuing Grant
CAREER: Evolving and Self-Managing Data Integration Systems
- Award number: 0712836
- Fiscal year: 2006
- Award amount: $1,085,900
- Award type: Continuing Grant
CAREER: Evolving and Self-Managing Data Integration Systems
- Award number: 0347903
- Fiscal year: 2004
- Award amount: $1,085,900
- Award type: Continuing Grant
Similar International Grants
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402804
- Fiscal year: 2024
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402806
- Fiscal year: 2024
- Award amount: $1,085,900
- Award type: Standard Grant
CPS: Medium: GOALI: Enabling Safe Innovation for Autonomy: Making Publish/Subscribe Really Real-Time
- Award number: 2333120
- Fiscal year: 2024
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402805
- Fiscal year: 2024
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: CPS: Medium: Enabling Data-Driven Security and Safety Analyses for Cyber-Physical Systems
- Award number: 2414176
- Fiscal year: 2023
- Award amount: $1,085,900
- Award type: Standard Grant
Enabling affordable and sustainable cultivated meat with a first-in-class growth medium
- Award number: 10061737
- Fiscal year: 2023
- Award amount: $1,085,900
- Award type: EU-Funded
Collaborative Research: CPS: Medium: Enabling Data-Driven Security and Safety Analyses for Cyber-Physical Systems
- Award number: 2132285
- Fiscal year: 2022
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: CPS: Medium: Enabling Autonomous, Persistent, and Adaptive Mobile Observational Networks Through Energy-Aware Dynamic Coverage
- Award number: 2223844
- Fiscal year: 2022
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: CPS: Medium: Enabling Autonomous, Persistent, and Adaptive Mobile Observational Networks Through Energy-Aware Dynamic Coverage
- Award number: 2223845
- Fiscal year: 2022
- Award amount: $1,085,900
- Award type: Standard Grant
Collaborative Research: SaTC: CORE: Medium: Enabling Practically Secure Cellular Infrastructure
- Award number: 2055014
- Fiscal year: 2022
- Award amount: $1,085,900
- Award type: Standard Grant