SGER: Scaling up unsupervised grammar induction
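Basic information
- Award number: 0836431
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2008
- Funding country: United States
- Duration: 2008-07-01 to 2009-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract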
This SGER project seeks to determine the scalability of computationally intensive, iterative statistical learning algorithms on a MapReduce architecture. Such algorithms underlie much research in natural language processing, yet their scalability to even moderately large training datasets (text corpora) has been under-explored. On the surface, scaling to more data appears to be a good fit for the MapReduce paradigm, and this exploratory project aims to identify whether such algorithms benefit from more data and more complex data than used in prior work. A special emphasis is given to unsupervised learning algorithms, such as the Expectation-Maximization algorithm, which have been widely studied on small problems and rarely studied on large ones. The technique is applicable to many other methods, as well.
At the same time, the project seeks to explore how to leverage supercomputers and MapReduce to make these learning algorithms faster, permitting a faster research cycle. Concretely, the "E step" (or its analogue) is the most computationally demanding part of an iteration, but the standard assumption that the training data are independently and identically distributed permits parallelization. To the extent that this parallelization is affected by network and input-output overhead, each iteration of training may be made faster, perhaps reducing training time from days or weeks to hours. This project explores this tradeoff and others like it.
This work leverages a resource donated by Yahoo for use by the PI's research group: a 4,000-node supercomputer running Hadoop (an open-source implementation of MapReduce).
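The parallel "E step" described in the abstract can be illustrated with a minimal single-machine sketch. It is not the project's code and does not use the Hadoop API. It assumes a toy model (an equal-weight mixture of biased coins standing in for a grammar model), and all function names (`e_step_map`, `reduce_stats`, `m_step`, `em_iteration`) are hypothetical. Because the training examples are assumed i.i.d., each shard's expected sufficient statistics can be computed independently (map) and then summed (reduce) before the M step.

```python
from functools import reduce


def e_step_map(shard, theta):
    """Map: expected sufficient statistics for one shard of i.i.d. examples.

    Each example is a (heads, tails) pair; theta[k] is the heads
    probability of mixture component k (toy model, assumed here).
    """
    stats = [[0.0, 0.0] for _ in theta]  # per component: [exp. heads, exp. tails]
    for heads, tails in shard:
        # Unnormalised posterior of each component under a uniform prior.
        weights = [t ** heads * (1.0 - t) ** tails for t in theta]
        total = sum(weights)
        for k, w in enumerate(weights):
            resp = w / total  # responsibility of component k for this example
            stats[k][0] += resp * heads
            stats[k][1] += resp * tails
    return stats


def reduce_stats(a, b):
    """Reduce: element-wise sum of two partial statistics tables."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def m_step(stats):
    """M step: re-estimate each component's heads probability."""
    return [h / (h + t) for h, t in stats]


def em_iteration(shards, theta):
    """One EM iteration; on a cluster the map calls would run in parallel."""
    partials = (e_step_map(shard, theta) for shard in shards)
    return m_step(reduce(reduce_stats, partials))


if __name__ == "__main__":
    data = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
    shards = [data[:2], data[2:]]  # how the data is split does not change the result
    print(em_iteration(shards, [0.6, 0.5]))
```

With this toy data and starting point, one iteration moves the two estimates to about 0.71 and 0.58. In a real MapReduce job, the cost of shipping the current parameters to every mapper and the partial statistics back to the reducer is the network and input-output overhead the abstract refers to.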
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Noah Smith
Buying health: assessing the impact of a consumer-side vegetable subsidy on purchasing, consumption and waste
- DOI:
- Publication date: 2015
- Journal:
- Impact factor: 3.2
- Authors: Noah Smith
- Corresponding author: Noah Smith
Implications for cumulative and prolonged clinical improvement induced by cross-linked hyaluronic acid: An in vivo biochemical/microscopic study in humans.
- DOI: 10.1111/exd.14998
- Publication date: 2024
- Journal:
- Impact factor: 3.6
- Authors: Frank Wang; T. Do; Noah Smith; J. Orringer; Sewon Kang; John J Voorhees; Gary J. Fisher
- Corresponding author: Gary J. Fisher
THE NORTH ATLANTIC TREATY ORGANIZATION AND UNITED STATES RELATIONSHIP: A STUDY OF ITS DEVELOPMENT AND POSSIBLE FUTURE
- DOI:
- Publication date: 2015
- Journal:
- Impact factor: 0
- Authors: Noah Smith
- Corresponding author: Noah Smith
Constructions of locally recoverable codes with large availability
- DOI: 10.1007/s10623-025-01624-w
- Publication date: 2025-04-05
- Journal:
- Impact factor: 1.200
- Authors: Giacomo Micheli; Vincenzo Pallozzi Lavorante; Abhi Shukul; Noah Smith
- Corresponding author: Noah Smith
Biopsy of Suspected Melanoma
- DOI: 10.1007/978-3-319-46029-1_10-1
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Noah Smith; T. Johnson; J. Kelly; A. Sober; C. Bichakjian
- Corresponding author: C. Bichakjian
Other grants by Noah Smith
NSF-BSF: RI: Small: Efficient Transformers via Formal and Empirical Analysis
- Award number: 2113530
- Fiscal year: 2021
- Funding amount: --
- Project type: Standard Grant
RI/SES: Conference Proposal: Doctoral Consortium on Text as Data
- Award number: 1830158
- Fiscal year: 2018
- Funding amount: --
- Project type: Standard Grant
NSF-BSF: RI: Small: Collaborative Research: Modeling Crosslinguistic Influences Between Language Varieties
- Award number: 1813153
- Fiscal year: 2018
- Funding amount: --
- Project type: Continuing Grant
RI: Medium: Broad-Coverage Semantic Parsing: Linguistic Representation Learning from Crowd-Scale Data
- Award number: 1562364
- Fiscal year: 2016
- Funding amount: --
- Project type: Continuing Grant
Workshop: Support for a workshop on scientific research applications of natural language technologies
- Award number: 1433108
- Fiscal year: 2014
- Funding amount: --
- Project type: Standard Grant
BIGDATA: Small: DA: Big Multilinguality for Data-Driven Lexical Semantics
- Award number: 1251131
- Fiscal year: 2013
- Funding amount: --
- Project type: Standard Grant
EAGER: PARTIAL: An Exploratory Study on Practical Approaches for Robust NLP Tools with Integrated Annotation Languages
- Award number: 1352440
- Fiscal year: 2013
- Funding amount: --
- Project type: Standard Grant
SoCS: Collaborative Research: Data-Driven, Computational Models for Discovery and Analysis of Framing
- Award number: 1211277
- Fiscal year: 2012
- Funding amount: --
- Project type: Standard Grant
CAREER: Flexible Learning for Natural Language Processing
- Award number: 1054319
- Fiscal year: 2011
- Funding amount: --
- Project type: Continuing Grant
RI-Small: Probabilistic Models for Structure Discovery in Text
- Award number: 0915187
- Fiscal year: 2009
- Funding amount: --
- Project type: Continuing Grant
Similar overseas grants
Scaling-Up plant based Nanocarriers for BIOpharmaceuticals (SUNBIO)
- Award number: EP/Z53304X/1
- Fiscal year: 2024
- Funding amount: --
- Project type: Research Grant
Scaling-up co-designed adolescent mental health interventions
- Award number: MR/Y020286/1
- Fiscal year: 2024
- Funding amount: --
- Project type: Fellowship
Scaling up plant-protein based coatings for food packaging
- Award number: 10109386
- Fiscal year: 2024
- Funding amount: --
- Project type: Launchpad
Scaling Up Point Of Care Testing And Linkages To Care For Syphilis And HIV In Rural, Remote, And Indigenous Populations In Central Alberta
- Award number: 502790
- Fiscal year: 2024
- Funding amount: --
- Project type: Directed Grant
URBAN RETROFIT UK: Scaling up place-based adaptations to the built environment through planning and development systems
- Award number: ES/Z502728/1
- Fiscal year: 2024
- Funding amount: --
- Project type: Research Grant
Postdoctoral Fellowship: OCE-PRF: Scaling up herbivore holobiont physiology from genes to populations across a temperate upwelling gradient
- Award number: 2308398
- Fiscal year: 2024
- Funding amount: --
- Project type: Standard Grant
Scaling Up our Well-bean Machine
- Award number: 10053959
- Fiscal year: 2023
- Funding amount: --
- Project type: Small Business Research Initiative
Scaling up a novel low-emission fungal fermentation-based production system to commercialise ultra-realistic meat whole-cuts alternatives
- Award number: 10076671
- Fiscal year: 2023
- Funding amount: --
- Project type: Collaborative R&D
Scaling up Treekind(R) - a truly sustainable vegan leather alternative, completely free of plastic polyurethane
- Award number: 10081776
- Fiscal year: 2023
- Funding amount: --
- Project type: Collaborative R&D
OTTERS - Social Transformation for Water Stewardship through Scaling Up Citizen Science
- Award number: 10069021
- Fiscal year: 2023
- Funding amount: --
- Project type: EU-Funded