EAGER: End-to-End Learning of Paradoxes and Interpretations for Data Storytelling

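Basic Information

Award number:
2331065
Principal investigator:
Jian Pei
Amount:
$125,000
Host institution:
Duke University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-10-01 to 2024-09-30
Project status:
Completed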

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2331065&HistoricalAwards=false
Keywords:
EAGER End Learning Paradoxes Interpretations

Project Abstract

Can we use AI and machine learning to automatically create compelling narratives from extensive data sets, like business data, demographics, and economic statistics? This project aims to answer this question in the affirmative and make data storytelling accessible to everyone, even those without advanced big data skills. Data storytelling is a powerful technique for understanding and extracting valuable insights from large data sets. However, it can be daunting for individuals who lack the technical expertise to navigate and interpret big data effectively. In this project, we propose an innovative approach to automatic data storytelling by learning from paradoxes found within a large data set. Paradoxes are unexpected contradictions that exist within data, and they hold the potential to uncover critical and surprising information. By harnessing these paradoxes, we can craft engaging narratives that highlight the most interesting and important aspects of the data, especially for decision-makers and data consumers. The ultimate goal of this project is to develop open-source software that facilitates creating compelling data stories. The outcomes and findings will be made publicly available, ensuring that individuals and organizations can benefit from this project. Additionally, the insights gained from this research will be incorporated into data science courses, empowering future data storytellers with the necessary skills to communicate complex information effectively. By democratizing data storytelling through the use of paradoxes and making it accessible to a wider audience, we can unlock the hidden potential of extensive datasets and strengthen data-driven decision-making in various domains.

This project will address the core challenges of paradox identification through two main thrusts. First, we will focus on investigating concise and non-redundant representations of statistical relationships among variables in data. Our goal is to formulate representations that are both concise and minimal, ensuring efficiency in conveying information. To demonstrate the feasibility and potential of our approach, we will specifically examine Simpson's paradox in this pilot project. Second, building on the established model, we will develop efficient and scalable algorithms to find data paradoxes and their interpretations. Our focus will be on exploring the efficiency, completeness, and non-redundancy of the learning process, using real-world datasets for evaluation. The knowledge and insights gained from this research will be integrated into data science education and training programs, enhancing the skills and capabilities of future data scientists. This project's transformative nature lies in its recognition of data storytelling as a fundamental approach within the realm of data science. The algorithms and tools we develop will substantially enhance the abilities of data scientists, statisticians, and business intelligence analysts to explore data, uncover new knowledge, and deliver valuable insights. As a result, these advancements will have wide-ranging applications and be of significant value to diverse communities. By addressing the challenges of paradox learning and advancing the field of data storytelling, this project will facilitate the exploration and interpretation of complex data, ultimately enabling more informed decision-making for a wider set of people and driving innovation across various sectors.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
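
The abstract singles out Simpson's paradox, where a comparison that holds in every subgroup reverses once the subgroups are pooled. The sketch below is only an illustration of that definition, not the project's algorithm: the function name `is_simpson_reversal`, the data layout, and the counts (patterned after the well-known kidney-stone textbook example) are all assumptions introduced here.

```python
from fractions import Fraction


def is_simpson_reversal(strata, a, b):
    """Check for a Simpson-style reversal between arms `a` and `b`.

    `strata` maps subgroup name -> {arm: (successes, trials)}.
    Returns True when one arm has the strictly higher success rate in
    every subgroup but the strictly lower rate after pooling.
    (Hypothetical helper for illustration only.)
    """
    signs = set()
    totals = {a: [0, 0], b: [0, 0]}
    for arms in strata.values():
        rate_a = Fraction(*arms[a])
        rate_b = Fraction(*arms[b])
        # +1 if a beats b in this subgroup, -1 if b beats a, 0 on a tie.
        signs.add((rate_a > rate_b) - (rate_a < rate_b))
        for arm in (a, b):
            totals[arm][0] += arms[arm][0]
            totals[arm][1] += arms[arm][1]
    # The within-subgroup direction must be the same everywhere, with no ties.
    if len(signs) != 1 or 0 in signs:
        return False
    within = signs.pop()
    pooled_a = Fraction(*totals[a])
    pooled_b = Fraction(*totals[b])
    pooled = (pooled_a > pooled_b) - (pooled_a < pooled_b)
    return pooled == -within


# Illustrative counts: arm "A" wins in both subgroups but loses overall.
strata = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}
print(is_simpson_reversal(strata, "A", "B"))
```

Exact fractions are used instead of floats so that near-equal rates are never misordered by rounding. A system of the kind the abstract describes would additionally have to search over which variables to condition on, which this sketch does not attempt.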

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Jian Pei

Odd Even Effect of Thiophene Chain Lengths on Excited State Properties in Oligo(thienyl ethynylene)-Cored Chromophores

DOI：
10.1021/acs.jpcc.7b00203
Year published:
2017
Journal:
Journal of Physical Chemistry C
Impact factor:
3.7
Authors:
Xian Wang;Guiying He;Yang Li;Zhuoran Kuang;Qianjin Guo;Jin-Liang Wang;Jian Pei;Andong Xia
Corresponding author:
Andong Xia