Model-Parallel Collaborative Filtering in Apache Spark

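Basic Information

  • Award number:
    1555772
  • Principal investigator:
  • Amount:
    $68,800
  • Institution:
  • Institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-09-01 to 2016-08-31
  • Project status:
    Completed

Project Abstract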

With data rapidly growing in size and complexity, many organizations are eager to train collaborative filtering methods on massive datasets using distributed computing environments. For instance, Netflix has hundreds of thousands of online programs to recommend to its millions of users, and Facebook has millions of users who could potentially form new links between one another. However, leading methods introduce significant algorithmic challenges in the distributed setting. The PI proposes to study a novel algorithm designed to be efficient for large-scale data science applications. Preliminary studies demonstrate the promise of this method, and the PI proposes to formally characterize the algorithm's behavior, perform an extensive empirical evaluation, and incorporate ideas inspired by this proposal into an upcoming online course the PI will be teaching.

Collaborative filtering, and in particular matrix factorization, is a widely used method for devising recommender systems. However, the size of these models grows linearly with the number of users and items, and leading methods for matrix factorization introduce significant challenges in the distributed setting due to their high communication costs. The PI proposes to study a novel model-parallel algorithm designed for Apache Spark that leverages the sparsity of the underlying data to drastically reduce this communication burden. Preliminary studies demonstrate the promise of this method, and the PI proposes to formally characterize the algorithm's behavior, perform an extensive empirical evaluation, and explore the paradigm of model-parallelism in Spark more generally for other learning settings. The PI will also incorporate ideas related to model-parallelism inspired by this proposal into an upcoming MOOC that will be taught on the edX platform.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ameet Talwalkar

AutoML Decathlon: Diverse Tasks, Modern Methods, and Efficiency at Scale
  • DOI:
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nicholas Roberts;Samuel Guo;Cong Xu;Ameet Talwalkar;David Lander;Lvfang Tao;Linhang Cai;Shuaicheng Niu;Jianyu Heng;Hongyang Qin;Minwen Deng;Johannes Hog;Alexander Pfefferle;Sushil Ammanaghatta Shivakumar;Arjun Krishnakumar;Yubo Wang;R. Sukthanker;Frank Hutter;Euxhen Hasanaj;Tien;M. Khodak;Yuriy Nevmyvaka;Kashif Rasul;Frederic Sala;Anderson Schneider;Junhong Shen;Evan R. Sparks
  • Corresponding author:
    Evan R. Sparks
NAS-Bench-360: Benchmarking Diverse Tasks for Neural Architecture Search
  • DOI:
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Renbo Tu;M. Khodak;Nicholas Roberts;Ameet Talwalkar
  • Corresponding author:
    Ameet Talwalkar
On the support recovery of marginal regression.
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    S. J. Kazemitabar;A. Amini;Ameet Talwalkar
  • Corresponding author:
    Ameet Talwalkar
Applying interpretable machine learning in computational biology: pitfalls, recommendations and opportunities for new developments
  • DOI:
    10.1038/s41592-024-02359-7
  • Date published:
    2024-08-09
  • Journal:
  • Impact factor:
    32.100
  • Authors:
    Valerie Chen;Muyu Yang;Wenbo Cui;Joon Sik Kim;Ameet Talwalkar;Jian Ma
  • Corresponding author:
    Jian Ma
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
  • DOI:
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jeffrey Li;Vaishnavh Nagarajan;Gregory Plumb;Ameet Talwalkar
  • Corresponding author:
    Ameet Talwalkar

Other grants held by Ameet Talwalkar

Travel: NSF Student Travel Grant for the Sixth Conference on Machine Learning and Systems (MLSys 2023)
  • Award number:
    2325547
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
CAREER: Foundations of Next-Generation Neural Architecture Search
  • Award number:
    2046613
  • Fiscal year:
    2021
  • Award amount:
    $68,800
  • Project type:
    Continuing Grant
BIGDATA: F: Optimization in Federated Networks of Devices
  • Award number:
    1838017
  • Fiscal year:
    2019
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
SIFTER: A Systems Biology Platform for Protein Function Prediction
  • Award number:
    1122732
  • Fiscal year:
    2011
  • Award amount:
    $68,800
  • Project type:
    Fellowship Award

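Similar NSFC (National Natural Science Foundation of China) grants

Parallel PIC/MCC algorithm and implementation for the beam-loss mechanism in high-intensity, low-energy accelerators
  • Award number:
    11805229
  • Year approved:
    2018
  • Award amount:
    CNY 270,000
  • Project type:
    Young Scientists Fund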

Similar overseas grants

Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321017
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321020
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
ExpandQISE: Track 1: Collaborative Optimization and Management for Iterative and Parallel Quantum Computing
  • Award number:
    2329020
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321016
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321019
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Award number:
    2243706
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: FMitF: Track I: Automating and Synthesizing Parallel Zero-Knowledge Protocols
  • Award number:
    2318975
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: FMitF: Track I: Automating and Synthesizing Parallel Zero-Knowledge Protocols
  • Award number:
    2318974
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Award number:
    2243704
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321015
  • Fiscal year:
    2023
  • Award amount:
    $68,800
  • Project type:
    Standard Grant