CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
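Basic information
- Award number: 1822986
- Principal investigator:
- Amount: $376,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-09-01 to 2023-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract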
For more than a decade, the software, datasets, and online services developed and provided by the Lemur Project have supported and enabled a large body of academic and commercial research on search engines, information retrieval, and other areas of computer science that analyze and process human language. This project makes critical enhancements to Lemur Project infrastructure, operates the infrastructure for another three years, and positions it for long-term sustainability. As part of the enhancements, the Galago search engine is enhanced to provide stronger integration of neural networks and other machine learning methods. A new dataset, ClueWeb2020, is developed to replace the widely used ClueWeb09 and ClueWeb12 datasets. These investments will support advanced research for the next decade. The advanced search capabilities developed for the project's open-source Indri and Galago search engines, which are widely used for research, are added to the open-source Lucene search engine, which is widely used by industry. New software applications are developed to simplify migration between Lemur Project search engines and Lucene. These investments improve the state of the art of software important to industry and enable researchers to migrate research to more widely used software. The Lemur Project's research infrastructure attracted a substantial research user community because it easily enables leading-edge research. These enhancements enable researchers in information retrieval and related areas to carry out a much broader range of experiments and to share their results. Research and industry development supported by the new Lemur Project software will create a new generation of more capable search engines for a variety of tasks.
The project is organized around three types of activities: sustaining software, sustaining datasets, and operation.
The project achieves long-term software sustainability by adding support for Indri and Galago functionality and creating integration and migration paths with the open-source Lucene search engine, which has large user and volunteer-developer communities. Research done with Galago or Indri will thus be reproducible in Lucene and more accessible to Lucene's industry users. The project also extends the Galago Application Programming Interface to support the newest developments in neural network (deep learning) document ranking technologies, which are now widely studied and expected in a state-of-the-art research system. It broadens the utility of Ranklib by supporting neural algorithms for better comparison with high-quality learning-to-rank approaches, and broadens the utility of the Sifaka text mining application with support for additional document and machine learning formats. The older ClueWeb09 and ClueWeb12 datasets are superseded by a new ClueWeb2020 dataset that is designed to last a decade and support research on newer learning-to-rank and neural network (deep learning) ranking algorithms. The project maintains and operates the existing infrastructure, in the form of software maintenance and support; dataset licensing and distribution; and operation of online search services. The new Lemur Project infrastructure supports a broad range of information retrieval research, for example, research on retrieval models; how to train learned rankers; use of semi-structured knowledge bases; result diversification; query optimization; and distributed search. In particular, it greatly improves support for research on learned and neural (deep learning) ranking algorithms, which have become important research topics in recent years. The ClueWeb datasets are used by a broad human language technologies research community.
This project makes enhancements that sustain this infrastructure for the research community for at least the next decade.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by James Allan
A Single Nucleotide Resolution Model for Large-Scale Simulations of Double Stranded DNA
- DOI: 10.1101/069310
- Year: 2016
- Journal:
- Impact factor: 0
- Authors: Y. G. Fosado; D. Michieletto; James Allan; C. Brackley; O. Henrich; D. Marenduzzo
- Corresponding author: D. Marenduzzo
Introduction to topic detection and tracking
- DOI: 10.1007/978-1-4615-0933-2_1
- Year: 2002
- Journal:
- Impact factor: 0
- Authors: James Allan
- Corresponding author: James Allan
Using CrowdLogger for in situ information retrieval system evaluation
- DOI: 10.1145/2513150.2513164
- Year: 2013
- Journal:
- Impact factor: 0
- Authors: H. Feild; James Allan
- Corresponding author: James Allan
A semantic data framework to support data-driven demand forecasting
- DOI: 10.1088/1742-6596/2600/2/022001
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: James Allan; Francesca Mangili; Marco Derboni; Luis Gisler; A. Hainoun; A. Rizzoli; Luca Ventriglia; M. Sulzer
- Corresponding author: M. Sulzer
Supercoiling in DNA and Chromatin
- DOI:
- Year:
- Journal:
- Impact factor: 0
- Authors: Nick Gilbert; James Allan
- Corresponding author: James
Other grants by James Allan
CondensabLe AeRosol from non Ideal Stove Emissions (CLARISE)
- Award number: NE/X000923/1
- Fiscal year: 2023
- Amount: $376,700
- Project type: Research Grant
III: Medium: Collaborative Research: Athena: Learning-oriented Search with Personalized Learning Flows
- Award number: 2106282
- Fiscal year: 2021
- Amount: $376,700
- Project type: Continuing Grant
EAGER: Dynamic Contextual Explanation of Search Results
- Award number: 2039449
- Fiscal year: 2020
- Amount: $376,700
- Project type: Standard Grant
Soot Aerodynamic Size Selection for Optical properties (SASSO)
- Award number: NE/S00212X/1
- Fiscal year: 2018
- Amount: $376,700
- Project type: Research Grant
III: Small: Mirador: Explainable Computational Models for Recognizing and Understanding Controversial Topics Encountered Online
- Award number: 1813662
- Fiscal year: 2018
- Amount: $376,700
- Project type: Standard Grant
I-Corps: Probabilistically Detecting Controversy
- Award number: 1721069
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant
Megacity Delhi atmospheric emission quantification, assessment and impacts (DelhiFlux) - Manchester
- Award number: NE/P016472/1
- Fiscal year: 2016
- Amount: $376,700
- Project type: Research Grant
Sources and Emissions of Air Pollutants in Beijing (Manchester)
- Award number: NE/N007123/1
- Fiscal year: 2016
- Amount: $376,700
- Project type: Research Grant
III: Small: Interactive Construction of Complex Query Models
- Award number: 1617408
- Fiscal year: 2016
- Amount: $376,700
- Project type: Standard Grant
III: Small: Topical Positioning System (TPS) for Informed Reading of Web Pages
- Award number: 1217281
- Fiscal year: 2012
- Amount: $376,700
- Project type: Standard Grant
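Similar National Natural Science Foundation of China (NSFC) grants
Xingnaojing multi-target regulation of the PI3K/Akt pathway to inhibit oxidative stress in CI/RI: a study based on network pharmacology and in vivo and in vitro experiments
- Award number: 2025JJ90117
- Award year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project
Anti-inflammatory mechanism by which eye acupuncture activates MC to target H3R and regulate "immune surveillance" in CI/RI rats, explored through the "immune-neural" network
- Award number: 82374375
- Award year: 2023
- Amount: CNY 510,000
- Project type: General Program
Mechanism by which ci-Eln promotes hypoxic pulmonary artery smooth muscle cell proliferation mediated by its parental gene Eln
- Award number:
- Award year: 2021
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila through single-cell transcriptome sequencing
- Award number: 32170497
- Award year: 2021
- Amount: CNY 580,000
- Project type: General Program
Spatiotemporal variation of vertically stratified forest LAI and CI, with LiDAR remote sensing retrieval and validation
- Award number:
- Award year: 2021
- Amount: CNY 590,000
- Project type: General Program
CI 994 for the treatment of SLC25A46-related mitochondrial disease and its mechanism
- Award number: 82001449
- Award year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Observational study of the [CI] line as a new molecular gas mass tracer in nearby galaxies
- Award number: 12003070
- Award year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Role and mechanism of the lncRNA343/miR-509-3p/STC1 axis in the imbalance of mitochondrial quality control in renal tubular epithelial cells in CI-AKI
- Award number: 81873607
- Award year: 2018
- Amount: CNY 570,000
- Project type: General Program
Mechanism of α2-adrenergic receptor activation promoting ESCRT-III membrane aggregation in lung necroptosis induced by renal CI/RI
- Award number: 81801900
- Award year: 2018
- Amount: CNY 210,000
- Project type: Young Scientists Fund
Molecular mechanism of endosymbiont-induced cytoplasmic incompatibility (CI) in cotton spider mites
- Award number: 31860508
- Award year: 2018
- Amount: CNY 390,000
- Project type: Regional Science Fund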
Similar overseas grants
CRI: CI-SUSTAIN: Racket on Alternative Platforms
- Award number: 1823244
- Fiscal year: 2018
- Amount: $376,700
- Project type: Continuing Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1823288
- Fiscal year: 2018
- Amount: $376,700
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1853919
- Fiscal year: 2018
- Amount: $376,700
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1823292
- Fiscal year: 2018
- Amount: $376,700
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
- Award number: 1822975
- Fiscal year: 2018
- Amount: $376,700
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: StarExec: Cross-Community Infrastructure for Logic Solving
- Award number: 1730419
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant
CI-SUSTAIN: Stan for the Long Run
- Award number: 1730414
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant
CI-SUSTAIN: Sustainable Tools for Analysis and Research on Darknet Unsolicited Traffic (STARDUST).
- Award number: 1730661
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: National File System Trace Repository
- Award number: 1730726
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: National File System Trace Repository
- Award number: 1729939
- Fiscal year: 2017
- Amount: $376,700
- Project type: Standard Grant