EAGER: Scaling Up Machine Learning with Virtual Memory

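Basic information

  • Award number:
    1551614
  • Principal investigator:
  • Amount:
    $184,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Start and end dates:
    2015-10-01 to 2017-09-30
  • Project status:
    Completed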

Project abstract

Large datasets of terabytes or petabytes are increasingly common, calling for new kinds of scalable machine learning approaches. While state-of-the-art techniques often use complex designs and specialized methods to store and work with large datasets, this project proposes a minimalist approach that forgoes such complexities by leveraging the fundamental virtual memory capability found on all modern operating systems to load into the virtual memory space datasets that are otherwise too large to fit in the computer's main memory. This main idea will allow developers to easily work with large datasets as if they were in-memory data, enabling them to create machine learning software that is significantly easier to develop and maintain, yet faster and more scalable. Developers will achieve higher work efficiency and make fewer programming errors; companies will reduce operating costs; and researchers will innovate methodology without getting bogged down by implementation details and scalability concerns. The proposed ideas could make a far-reaching impact on industry and academia, in science, education, and technology, as they face increasing challenges in applying machine learning on large datasets. The proposed ideas will also help train the next generation of scientists and engineers by allowing students to learn to work with large datasets in significantly simpler ways. As virtual memory is universally available on modern devices and operating systems, the proposed ideas will also work on mobile, low-power devices, enabling them to perform computation at unprecedented scales and speed.
This project investigates a fundamental, radical way to scale up machine learning algorithms based on virtual memory, one that may be easier to code and maintain but is currently under-utilized by both single-machine and multi-machine distributed approaches. This research aims to develop a deep understanding of this radical idea, its benefits and limitations, and to what extent these results apply in various settings, with respect to datasets, memory sizes, page sizes (e.g., from the default 4KB to the jumbo 2MB pages that enable terabytes of virtual memory space), and architectures (e.g., testing on distributed shared memory file systems like Lustre that support paging and virtual memory over large computer clusters). The researchers will build on their preliminary work on graph algorithms that already demonstrates significant speed-up over state-of-the-art approaches; they will extend their approach to a wide range of machine learning and data mining algorithms. They will also develop mathematical models and systematic approaches to profile and predict algorithm performance and energy usage based on extensive evaluation across platforms, datasets, and languages. For further information, see the project web site at: http://poloclub.gatech.edu/mmap/.
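The core idea in the abstract, treating an on-disk dataset as if it were in-memory data by mapping it into virtual memory, can be illustrated with a minimal sketch. This is not the project's own code: the file layout (packed little-endian float64 values) and the helper names `write_dataset` and `mapped_sum` are assumptions made purely for illustration. It uses only Python's standard `mmap` and `struct` modules.

```python
import mmap
import os
import struct
import tempfile

# Assumed on-disk layout for this sketch: packed little-endian float64 values.
RECORD = struct.Struct("<d")


def write_dataset(path, values):
    """Write the values to disk as consecutive float64 records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def mapped_sum(path):
    """Sum every record by reading through a memory map of the file.

    The operating system pages data in on demand, so the file can be
    larger than physical RAM; the code reads it like an in-memory buffer.
    """
    if os.path.getsize(path) == 0:
        return 0.0  # mmap cannot map an empty file
    total = 0.0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // RECORD.size
            for i in range(count):
                (value,) = RECORD.unpack_from(mm, i * RECORD.size)
                total += value
    return total


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    write_dataset(demo_path, [1.0, 2.0, 3.5])
    print(mapped_sum(demo_path))  # prints 6.5
```

The algorithm code never issues explicit reads or manages buffers; which pages are resident is left to the operating system, which is the simplification the abstract argues for. Real workloads would more likely use array libraries over the mapping than a per-record Python loop.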

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Duen Horng Chau

TgrApp: Anomaly Detection and Visualization of Large-Scale Call Graphs
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Cazzolato; Saranya Vijayakumar; Xinyi Zheng; Namyong Park; Meng; Duen Horng Chau; Pedro Fidalgo; Bruno Lages; A. Traina; C. Faloutsos
  • Corresponding author:
    C. Faloutsos
Visual Exploration of Literature with Argo Scholar
Mining large graphs: Algorithms, inference, and discoveries
STEPS: A Spatio-temporal Electric Power Systems Visualization
  • DOI:
  • Published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Robert S. Pienta; Leilei Xiong; S. Grijalva; Duen Horng Chau; Minsuk Kahng
  • Corresponding author:
    Minsuk Kahng
TopicScape: Semantic Navigation of Document Collections
  • DOI:
  • Published:
    2011
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jacob Eisenstein; Duen Horng Chau; A. Kittur; E. Xing
  • Corresponding author:
    E. Xing

Other grants by Duen Horng Chau

SaTC: CORE: Medium: Understanding and Fortifying Machine Learning Based Security Analytics
  • Award number:
    1704701
  • Fiscal year:
    2017
  • Award amount:
    $184,900
  • Project type:
    Continuing Grant
EAGER: SSDIM: Leveraging Point Processes and Mean Field Games Theory for Simulating Data on Interdependent Critical Infrastructures
  • Award number:
    1745382
  • Fiscal year:
    2017
  • Award amount:
    $184,900
  • Project type:
    Standard Grant
EAGER: Asynchronous Event Models for State-Topology Co-Evolution of Temporal Networks
  • Award number:
    1639792
  • Fiscal year:
    2016
  • Award amount:
    $184,900
  • Project type:
    Standard Grant
III: Medium: Collaborative Research: Human-Computer Graph Exploration and Tele-Discovery
  • Award number:
    1563816
  • Fiscal year:
    2016
  • Award amount:
    $184,900
  • Project type:
    Continuing Grant
TWC: Small: Collaborative: Cracking Down Online Deception Ecosystems
  • Award number:
    1526254
  • Fiscal year:
    2015
  • Award amount:
    $184,900
  • Project type:
    Standard Grant

Similar overseas grants

Scaling-Up plant based Nanocarriers for BIOpharmaceuticals (SUNBIO)
  • Award number:
    EP/Z53304X/1
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Research Grant
Scaling-up co-designed adolescent mental health interventions
  • Award number:
    MR/Y020286/1
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Fellowship
Scaling up plant-protein based coatings for food packaging
  • Award number:
    10109386
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Launchpad
Scaling Up Point Of Care Testing And Linkages To Care For Syphilis And HIV In Rural, Remote, And Indigenous Populations In Central Alberta
  • Award number:
    502790
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Directed Grant
URBAN RETROFIT UK: Scaling up place-based adaptations to the built environment through planning and development systems
  • Award number:
    ES/Z502728/1
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Research Grant
Postdoctoral Fellowship: OCE-PRF: Scaling up herbivore holobiont physiology from genes to populations across a temperate upwelling gradient
  • Award number:
    2308398
  • Fiscal year:
    2024
  • Award amount:
    $184,900
  • Project type:
    Standard Grant
Scaling Up our Well-bean Machine
  • Award number:
    10053959
  • Fiscal year:
    2023
  • Award amount:
    $184,900
  • Project type:
    Small Business Research Initiative
Up-scaling solar hydrogen production
  • Award number:
    EP/W033216/1
  • Fiscal year:
    2023
  • Award amount:
    $184,900
  • Project type:
    Research Grant
Scaling up Treekind(R) - a truly sustainable vegan leather alternative, completely free of plastic polyurethane
  • Award number:
    10081776
  • Fiscal year:
    2023
  • Award amount:
    $184,900
  • Project type:
    Collaborative R&D
Scaling up a novel low-emission fungal fermentation-based production system to commercialise ultra-realistic meat whole-cuts alternatives
  • Award number:
    10076671
  • Fiscal year:
    2023
  • Award amount:
    $184,900
  • Project type:
    Collaborative R&D