Foundations of Data Science Institute

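Basic information

  • Award number:
    2022448
  • Principal investigator:
  • Amount:
    $5.4903 million
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-09-01 to 2025-08-31
  • Project status:
    Not yet concluded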

Project abstract

The Foundations of Data Science Institute (FODSI) brings together a large and diverse team of researchers and educators from UC Berkeley, MIT, Boston University, Bryn Mawr College, Harvard University, Howard University, and Northeastern University, with the aim of advancing the theoretical foundations for the field of data science. Data science has emerged as a central science for the 21st century, a widespread approach to science and technology that exploits the explosion in the availability of data to allow empirical investigations at unprecedented scale and scope. It now plays a central role in diverse domains across all of science, commerce and industry. The development of theoretical foundations for principled approaches to data science is particularly challenging because it requires progress across the full breadth of scientific issues that arise in the rich and complex processes by which data can be used to make decisions. These issues include the specification of the goals of data analysis, the development of models that aim to capture the way in which data may have arisen, the crafting of algorithms that are responsive to the models and goals, an understanding of the impact of misspecifications of these models and goals, an understanding of the effects of interactions, interventions and feedback mechanisms that affect the data and the interpretation of the results, concern about the uncertainty of these results, an understanding of the impact of other decision-makers with competing goals, and concern about the economic, social, and ethical implications of automated data analysis and decision-making. To address these challenges, FODSI brings together experts from many cognate academic disciplines, including computer science, statistics, mathematics, electrical engineering, and economics. 
Institute research outcomes have strong potential to directly impact the many application domains for data science in industry, commerce, science and society, facilitated by mechanisms that directly involve a stream of institute-trained personnel in industrial partners' projects, and by public activities designed to nurture substantive interactions between foundational and use-inspired research communities in data science. The institute also aims to educate and mentor future leaders in data science, through the further development of a pioneering undergraduate program in data science, and by training a diverse cohort of graduate students and postdocs with an innovative approach that emphasizes strong mentorship, flexibility, and breadth of collaboration opportunities. In addition, the institute plans to host an annual summer school that will deliver core curriculum and a taste of foundational research to a diverse group of advanced undergraduates, graduate students, and postdocs. It aims to broaden participation and increase diversity in the data science workforce, bringing the excitement of data science to under-represented groups at the high school level, and targeting diverse participation in the institute's public activities. It will also act as a nexus for research and education in the foundations of data science, by convening public events, such as summer schools, research workshops, and other collaborative research opportunities, and by providing models for education, human resource development, and broadening participation. The scientific focus of the institute will encompass the full range of issues that arise in data science (modeling issues, inferential issues, computational issues, and societal issues) and the challenges that emerge from the conflicts between their competing requirements. Its research agenda is organized around eight themes.
Three of these themes focus on key challenges arising from the rich variety of interactions between a decision maker and its environment: not only the classical view of data that is processed in a batch or a stream, but also sequential interactions with feedback (the control perspective), experimental interactions designed to answer "what if" questions (the causality perspective), and strategic interactions involving other actors with conflicting goals (the economic perspective). The other research themes focus on opportunities for major impacts across disciplinary boundaries: on elucidating the algorithmic landscape of statistical problems, and in particular the computational complexity of statistical estimation problems; on sketching, sampling, and sub-linear time algorithms designed to address issues of scalability in data science problems; on exploiting statistical methodology in the service of algorithms; and on using breakthroughs in applied mathematics to address computational and inferential challenges. Intellectual contributions to societal issues in data science will feature throughout this set of themes. The institute will exploit strong connections with its scientific and industrial partners to ensure that these research directions enjoy a rich engagement with a broad range of commercial, technological and scientific application domains. Its sequence of research workshops and a collaborative research program will serve the broader research community by nurturing additional research in these key challenge areas. The institute will be led by a steering committee that will seek the help of an external advisory board to prioritize its research themes and activities throughout its lifetime.
Its educational programs will include curriculum development from K-12 through undergraduate, a graduate-level visit program, and a postdoc training model, aimed at empowering the next generation of leaders to work fluidly across conventional disciplinary boundaries while being mindful of the full set of scientific issues. The institute will undertake a multi-pronged effort to recruit, engage and support the full range of groups traditionally under-represented in mathematics, computer science and statistics. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (66)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Bures-Wasserstein Barycenters and Low-Rank Matrix Recovery
  • DOI:
  • Published:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tyler Maunu; Thibaut Le Gouic; P. Rigollet
  • Corresponding authors:
    Tyler Maunu; Thibaut Le Gouic; P. Rigollet
Variational inference via Wasserstein gradient flows
  • DOI:
    10.48550/arxiv.2205.15902
  • Published:
    2022-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Marc Lambert; Sinho Chewi; F. Bach; S. Bonnabel; P. Rigollet
  • Corresponding authors:
    Marc Lambert; Sinho Chewi; F. Bach; S. Bonnabel; P. Rigollet
Online Page Migration with ML Advice
  • DOI:
  • Published:
    2020-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    P. Indyk; Frederik Mallmann-Trenn; Slobodan Mitrović; R. Rubinfeld
  • Corresponding authors:
    P. Indyk; Frederik Mallmann-Trenn; Slobodan Mitrović; R. Rubinfeld
Private High-Dimensional Hypothesis Testing
Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
  • DOI:
    10.48550/arxiv.2212.14511
  • Published:
    2022-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yi Tian; K. Zhang; Russ Tedrake; S. Sra
  • Corresponding authors:
    Yi Tian; K. Zhang; Russ Tedrake; S. Sra

Other publications by Piotr Indyk

Differentially Private Approximate Near Neighbor Counting in High Dimensions
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alexandr Andoni; Piotr Indyk; S. Mahabadi; Shyam Narayanan
  • Corresponding author:
    Shyam Narayanan
Dimension-Accuracy Tradeoffs in Contrastive Embeddings for Triplets, Terminals & Top-k Nearest Neighbors
  • DOI:
    10.48550/arxiv.2312.13490
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Vaggos Chatziafratis; Piotr Indyk
  • Corresponding author:
    Piotr Indyk

Other grants held by Piotr Indyk

Travel: SODA 2024 Conference Student and Postdoc Travel Support
  • Award number:
    2343779
  • Fiscal year:
    2023
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Conference: SODA 2023 Conference Student and Postdoc Travel Support
  • Award number:
    2232958
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Collaborative Research: AF: Small: Fine-Grained Complexity of Approximate Problems
  • Award number:
    2006798
  • Fiscal year:
    2020
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
TRIPODS: Institute for Foundations of Data Science (IFDS)
  • Award number:
    1740751
  • Fiscal year:
    2017
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant
AitF: FULL: Sparse Fourier Transform: From Theory to Practice
  • Award number:
    1535851
  • Fiscal year:
    2015
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
  • Award number:
    1447476
  • Fiscal year:
    2015
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
AF: Large: Collaborative Research: Compact Representations and Efficient Algorithms for Distributed Geometric Data
  • Award number:
    1012042
  • Fiscal year:
    2010
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Fast Approximate Algorithms for Wireless Sensor Networks
  • Award number:
    0728645
  • Fiscal year:
    2007
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
CAREER: Approximate Algorithms for High-dimensional Geometric Problems
  • Award number:
    0133849
  • Fiscal year:
    2002
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

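Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
  • Project type:
    Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
  • Project type:
    Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
  • Grant number:
  • Approval year:
    2020
  • Funding amount:
    CNY 400,000
  • Project type:
Key Technologies for Semantic Interoperability of Web Services Based on Linked Open Data
  • Grant number:
    61373035
  • Approval year:
    2013
  • Funding amount:
    CNY 770,000
  • Project type:
    General Program
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
  • Grant number:
    31070748
  • Approval year:
    2010
  • Funding amount:
    CNY 340,000
  • Project type:
    General Program
Functional Data Analysis Methods for High-Dimensional Data
  • Grant number:
    11001084
  • Approval year:
    2010
  • Funding amount:
    CNY 160,000
  • Project type:
    Young Scientists Fund
The Role of datA, a Negative Regulator of Chromosome Replication, in the Cell Cycle
  • Grant number:
    31060015
  • Approval year:
    2010
  • Funding amount:
    CNY 250,000
  • Project type:
    Fund for Less Developed Regions
Computational Methods for Analyzing Toponome Data
  • Grant number:
    60601030
  • Approval year:
    2006
  • Funding amount:
    CNY 170,000
  • Project type:
    Young Scientists Fund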

Similar overseas grants

Conference: Statistical Foundations of Data Science and their Applications
  • Award number:
    2304646
  • Fiscal year:
    2023
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Natural Science Transfer Scholars: Natural Science Foundations for Innovation in the Data-Driven Economy
  • Award number:
    2221177
  • Fiscal year:
    2023
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Collaborative Research: AF: Small: RUI: Data Science from Economic Foundations
  • Award number:
    2218814
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Collaborative Research: Fostering Virtual Learning of Data Science Foundations with Mathematical Logic for Rural High School Students
  • Award number:
    2201394
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant
Collaborative Research: Fostering Virtual Learning of Data Science Foundations with Mathematical Logic for Rural High School Students
  • Award number:
    2201393
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant
CIF: Small: Foundations of Decentralized Data Science: Optimizing Utility, Privacy and Communication Efficiency
  • Award number:
    2213223
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
Collaborative Research: AF: Small: RUI: Data Science from Economic Foundations
  • Award number:
    2218813
  • Fiscal year:
    2022
  • Funding amount:
    $5.4903 million
  • Project type:
    Standard Grant
TRIPODS: Institute for Foundations of Data Science
  • Award number:
    2023109
  • Fiscal year:
    2020
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant
TRIPODS: Institute for Foundations of Data Science
  • Award number:
    2023239
  • Fiscal year:
    2020
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant
TRIPODS: Institute for Foundations of Data Science
  • Award number:
    2023495
  • Fiscal year:
    2020
  • Funding amount:
    $5.4903 million
  • Project type:
    Continuing Grant