CI-ADDO-EN: Flexible Machine Learning for Natural Language in the MALLET Toolkit

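Basic Information

  • Award number:
    0958392
  • Principal investigator:
  • Amount:
    $650,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2010
  • Funding country:
    United States
  • Duration:
    2010-06-01 to 2016-05-31
  • Project status:
    Completed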

Project Abstract

Natural language processing, information extraction, information integration and other text processing solutions are central components of computer science, and key tools for addressing the ever-increasing problems of information overload. Issues of information overload are not only personal problems, but are critical for business productivity, national defense, and, increasingly, government decision-making and transparency.

State-of-the-art natural language processing is increasingly based on machine learning. However, the methodologies can be complex, and the software infrastructure necessary for such systems is generally difficult to develop from scratch. To address this need we have created MALLET (MAchine Learning for LanguagE) and FACTORIE (Factor graphs, Imperative, Extensible), open-source software toolkits that run in the Java virtual machine. They provide many modern state-of-the-art machine learning methods, specially tuned to be scalable for the idiosyncrasies of natural language data, while also applying well to many other discrete non-language tasks.

The project will fill three critical gaps: (1) broadening these toolkits' applicability to new data and tasks (with better end-user interfaces for labeling, training and diagnostics), (2) greatly enhancing their research-support capabilities (with infrastructure for flexibly specifying model structures), and (3) improving their understandability and support (with new documentation, examples, and online community support).

The project will have a direct positive impact on NLP and other machine learning research, on teaching, and on collaborative research activities. Well-designed toolkits not only help researchers avoid duplicate implementation effort, but (a) they encourage sharing of algorithms and code, and thus also cultivate increased collaboration and intellectual flow of ideas; (b) they foster clear, detailed communication of algorithms and scientific reproducibility; (c) they help "level the playing field" by providing state-of-the-art implementations of foundational building blocks and recent methods to top-tier and small institutions alike; (d) they supply a teaching tool by making it easy for students to experiment with the supplied research methodologies. Furthermore, multiple ready-to-use systems will give non-programmers access to modern, scalable implementations of text processing tools, spreading knowledge and use of these techniques across fields, to the social sciences, humanities, and bio-medical fields.

For further information see the project web site at the URL: http://www.cs.umass.edu/~mccallum/nsf-mallet

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Andrew McCallum

An Interoperable Multimedia Catalog System for Electronic Commerce.
  • DOI:
  • Publication year:
    2000
  • Journal:
  • Impact factor:
    0
  • Authors:
    William W. Cohen;Andrew McCallum;D. Quass
  • Corresponding author:
    D. Quass
Scaling Within Document Coreference to Long Texts
  • DOI:
  • Publication year:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Raghuveer Thirukovalluru;Nicholas Monath;K. Shridhar;M. Zaheer;Mrinmaya Sachan;Andrew McCallum
  • Corresponding author:
    Andrew McCallum
ezCoref : A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
  • DOI:
  • Publication year:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
  • Corresponding author:
    Sameer Singh
PaRaDe: Passage Ranking using Demonstrations with Large Language Models
  • DOI:
    10.48550/arxiv.2310.14408
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrew Drozdov;Honglei Zhuang;Zhuyun Dai;Zhen Qin;Razieh Rahimi;Xuanhui Wang;Dana Alon;Mohit Iyyer;Andrew McCallum;Donald Metzler;Kai Hui
  • Corresponding author:
    Kai Hui
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
  • DOI:
  • Publication year:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Qi Cheng;Michael Boratko;Pranay Kumar Yelugam;T. O’Gorman;Nalini Singh;Andrew McCallum;X. Li
  • Corresponding author:
    X. Li

Other grants by Andrew McCallum

Collaborative Research: SOS-DCI / HNDS-R: Advancing Semantic Network Analysis to Better Understand How Evaluative Exchanges Shape Scientific Arguments
  • Award number:
    2244805
  • Fiscal year:
    2023
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
RI: Medium: Probabilistic Box Embeddings
  • Award number:
    2106391
  • Fiscal year:
    2021
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
DMREF: Collaborative Research: The Synthesis Genome: Data Mining for Synthesis of New Materials
  • Award number:
    1922090
  • Fiscal year:
    2019
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
RI: Medium: Extreme Clustering
  • Award number:
    1763618
  • Fiscal year:
    2018
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
DMREF: Collaborative Research: The Synthesis Genome: Data Mining for Synthesis of New Materials
  • Award number:
    1534431
  • Fiscal year:
    2015
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
III: Medium: Constructing Knowledge Bases by Extracting Entity-Relations and Meanings from Natural Language via "Universal Schema"
  • Award number:
    1514053
  • Fiscal year:
    2015
  • Award amount:
    $650,000
  • Project type:
    Continuing Grant
The Fourth Northeast Student Colloquium on Artificial Intelligence
  • Award number:
    1036017
  • Fiscal year:
    2010
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
RI-Medium: Collaborative Research: Dynamically-Structured Conditional Random Fields for Complex, Natural Domains
  • Award number:
    0803847
  • Fiscal year:
    2008
  • Award amount:
    $650,000
  • Project type:
    Continuing Grant
CRI: Collaborative Research: Improving Experimental Computer Science with a Searchable Web Portal for Data Sets
  • Award number:
    0551597
  • Fiscal year:
    2006
  • Award amount:
    $650,000
  • Project type:
    Continuing Grant
ITR: Collaborative Research: (ACS+NHS)-(dmc+soc): Machine Learning for Sequences and Structured Data: Tools for Non-Experts
  • Award number:
    0427594
  • Fiscal year:
    2004
  • Award amount:
    $650,000
  • Project type:
    Standard Grant

Similar overseas grants

Collaborative Research: CI-ADDO-EN: Research Repository for Model-Driven Software Development (REMODD)
  • Award number:
    1305381
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
Collaborative Research: CI-ADDO-EN: Making Internet Routing Data Accessible To All
  • Award number:
    1305404
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
CI-ADDO-EN: Collaborative Research: Enhancing the srcML Infrastructure: A Mixed-Language Exploration, Analysis, and Manipulation Framework to Support Software Evolution
  • Award number:
    1305292
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
Collaborative Research: CI-ADDO-EN: Making Internet Routing Data Accessible To All
  • Award number:
    1305218
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
CI-ADDO-EN: Smart Home in a Box: Creating a Large Scale, Long Term Repository for Smart Environment Technologies
  • Award number:
    1262814
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
CI-ADDO-EN: Infrastructure for the RF-Powered Computing Community
  • Award number:
    1305072
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
CRI-CI-ADDO-EN: National File System Trace Repository
  • Award number:
    1305360
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
Collaborative Research: CI-ADDO-EN: Research Repository for Model-Driven Software Development (REMODD)
  • Award number:
    1305358
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
CI-ADDO-EN: Collaborative Research: Enhancing the srcML Infrastructure: A Mixed-Language Exploration, Analysis, and Manipulation Framework to Support Software Evolution
  • Award number:
    1305217
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant
Collaborative Research: CI-ADDO-EN: Making Internet Routing Data Accessible To All
  • Award number:
    1305346
  • Fiscal year:
    2013
  • Award amount:
    $650,000
  • Project type:
    Standard Grant