HDR TRIPODS: Institute for Integrated Data Science: A Transdisciplinary Approach to Understanding Fundamental Trade-offs and Theoretical Foundations


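Basic Information

  • Award number:
    1934846
  • Principal investigator:
  • Amount:
    $1.5 million
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-10-01 to 2024-09-30
  • Project status:
    Completed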

Project Abstract

Many areas of science, engineering, and industry are already being revolutionized by the adoption of tools and techniques from data science. However, a rigorous analysis of existing approaches together with the development of new ideas is necessary to a) ensure the optimal use of available computational and statistical resources and b) develop a principled and systematic approach to the relevant problems rather than relying on a collection of ad hoc solutions. In particular, there are many interrelated questions that arise in a typical data science project. First is the acquisition of relevant data: Can data be collected interactively and might this reduce the costs of data acquisition? Is the data noisy and how might this impact the results? Second is the processing of data: If the data cannot fit in the memory of a single machine, how can we minimize the communication costs within a cluster of machines? When are approximate answers sufficient and how does the required accuracy trade off with the computational resources available? Third is the prediction value of the available data: Can the uncertainty of the final results be quantified? How can the modeling assumptions used by our algorithms be efficiently evaluated? This award supports a data science institute with the main goal of developing an understanding of the fundamental mathematical and computational issues underlying the aforementioned questions. Ultimately, this will enable practitioners to make more informed decisions when investing time and money across the life cycle of their data science project. Achieving this goal necessitates a transdisciplinary approach and the team of investigators includes experts in theoretical computer science; applied and computational mathematics; machine learning and statistics; and coding and information theory. In addition to pursuing the above research goals, the institute will coordinate education and training activities and develop resources for the research community.
Specific research goals explored in this project include: 1) Understanding the trade-off between rounds of interactive data acquisition and statistical and computational efficiency. 2) Minimizing query complexity in interactive unsupervised learning problems. 3) Understanding space/sample complexity trade-offs when processing stochastic data. 4) Developing fine-grained approximation algorithms relevant to core data science tasks. 5) Using coding theory to enable communication-efficient distributed machine learning. 6) Designing variational inference methods with statistical guarantees given limited resources. 7) Developing a principled approach to exploiting trade-offs between bias, model complexity, and computational budget. Specific institute activities include: 1) Technical workshops and training activities for researchers in domain sciences. 2) A virtual speaker series. 3) Education initiatives including the development of new courses that will teach foundational topics in data science and resources that can be used across different institutions. The grant will also train postdoctoral scholars and undergraduate researchers.
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (54)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Model-based identification of conditionally-essential genes from transposon-insertion sequencing data.
  • DOI:
    10.1371/journal.pcbi.1009273
  • Publication date:
    2022-03
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Sarsani V;Aldikacti B;He S;Zeinert R;Chien P;Flaherty P
  • Corresponding author:
    Flaherty P
How Compression and Approximation Affect Efficiency in String Distance Measures
Graph Reconstruction from Random Subgraphs
  • DOI:
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    McGregor, Andrew;Sengupta, Rik
  • Corresponding author:
    Sengupta, Rik
Reliable Distributed Clustering with Redundant Data Assignment
  • DOI:
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Gandikota, Venkata;Mazumdar, Arya;Rawat, Ankit Singh
  • Corresponding author:
    Rawat, Ankit Singh
PredictRoute: A Network Path Prediction Toolkit
  • DOI:
    10.1145/3543516.3460107
  • Publication date:
    2021-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rachee Singh;D. Tench;Phillipa Gill;A. McGregor
  • Corresponding author:
    Rachee Singh;D. Tench;Phillipa Gill;A. McGregor

Other publications by Andrew McGregor

Graph Reconstruction from Noisy Random Subgraphs
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrew McGregor;Rik Sengupta
  • Corresponding author:
    Rik Sengupta
Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams
  • DOI:
    10.48550/arxiv.2403.14087
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Amit Chakrabarti;Andrew McGregor;Anthony Wirth
  • Corresponding author:
    Anthony Wirth
Producing knowledge about the sustainability and nutritional values of plant and animal-based beef: Funding, metrics, geographies and gaps
  • DOI:
    10.1016/j.jclepro.2024.140900
  • Publication date:
    2024-02-15
  • Journal:
  • Impact factor:
    10.000
  • Authors:
    Andrew McGregor;Milena Bojovic;Nadine Ghammachi;Seema Mihrshahi
  • Corresponding author:
    Seema Mihrshahi
Historical Agrarian Change and its Connections to Contemporary Agricultural Extension in Northwest Cambodia
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Brian R. Cook;Paula Satizábal;Van Touch;Andrew McGregor;J. Diepart;Ariane Utomo;Nicholas Harrigan;Katharine McKinnon;Pao Srean;T. Tran;Andrea Babon
  • Corresponding author:
    Andrea Babon
Disease Characteristics and Outcomes of Non-Melanoma Skin Cancers in Myeloproliferative Neoplasm (MPN) Patients Treated with Ruxolitinib
  • DOI:
    10.1182/blood-2022-162417
  • Publication date:
    2022-11-15
  • Journal:
  • Impact factor:
  • Authors:
    Alexandros Rampotas;Luke Carter-Brzezinski;Tim C.P Somervaille;James Forryan;Bethan Psaila;Adam J Mead;Mamta Garg;Heather Laing;Louise Wallis;Nauman M Butt;Conal McConville;Ali Sahra;Andrew McGregor;Hannah Cowan;Andrew J. Innes;Joanne Ewing;Matthew Carter;Peter Dyer;Chun Huat Teh;Sebastian Francis
  • Corresponding author:
    Sebastian Francis

Other grants by Andrew McGregor

AF: Small: Collaborative Research: New Challenges in Graph Stream Algorithms and Related Communication Games
  • Award number:
    1908849
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
AitF: Efficient Memory Management via Randomized, Streaming, and Online Algorithms
  • Award number:
    1637536
  • Fiscal year:
    2016
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
BIGDATA: Small: DA: Collaborative Research: From Data To Users: Providing Interpretable and Verifiable Explanations in Data Mining
  • Award number:
    1251110
  • Fiscal year:
    2013
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
AF: Small: Massive Graph Analysis via Linear Measurements: Towards a Theory of Homomorphic Co
  • Award number:
    1320719
  • Fiscal year:
    2013
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
CAREER: New Directions for Sketching and Stream Computation
  • Award number:
    0953754
  • Fiscal year:
    2010
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant

Similar overseas grants

HDR TRIPODS: Collaborative Research: Institute for Data, Econometrics, Algorithms and Learning
  • Award number:
    1934813
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
HDR TRIPODS: UIC Foundations of Data Science Institute
  • Award number:
    1934915
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: Collaborative Research: Institute for Data, Econometrics, Algorithms and Learning
  • Award number:
    1934931
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Standard Grant
HDR TRIPODS: UT Austin Institute on the Foundations of Data Science
  • Award number:
    1934932
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: UC Davis TETRAPODS Institute of Data Science
  • Award number:
    1934568
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: Collaborative Research: Institute for Data, Econometrics, Algorithms and Learning
  • Award number:
    1934843
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: D4 (Dependable Data-Driven Discovery) Institute
  • Award number:
    1934884
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: Illinois Institute for Data Science and Dynamical Systems (iDS2)
  • Award number:
    1934986
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR TRIPODS: Penn Institute for Foundations of Data Science
  • Award number:
    1934876
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant
HDR Tripods: Texas A&M Research Institute for Foundations of Interdisciplinary Data Science (FIDS)
  • Award number:
    1934904
  • Fiscal year:
    2019
  • Award amount:
    $1.5 million
  • Project type:
    Continuing Grant