Frameworks: arXiv as an accessible large-scale open research platform
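Basic information
- Award number: 2311521
- Principal investigator:
- Amount: $4.9665 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-01-01 to 2028-12-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract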
arXiv is an open-access repository that has played a leading role in disciplines such as computer science, mathematics and physics for over 30 years. It hosts more than 2 million scientific papers and has a large user community. Each month there are approximately 5 million active users and 100 million web accesses. Despite its size and usage, arXiv has very limited search and recommendation functionality. In order to better serve the arXiv community, this project is building a new generation of search and recommendation functionality and simultaneously creating a research sandbox to reduce reliance on third-party, commercial services. To make arXiv's trove of scientific content accessible to the visually impaired, support is being added for well-structured HTML as well as PDF. Improved discovery of research results provides broad multidisciplinary benefits across areas of science. These include less researcher time wasted browsing through large amounts of irrelevant papers, revelation of "unknown unknowns," and accelerating research across different subject areas through unexpected synergies. Improved recommendation tools, which can provide unbiased and diverse sources of relevant research results and techniques, are urgently needed to break silos. arXiv will provide improved mechanisms for scientists to find out about important advances, both in their own field of expertise and in adjacent fields.
This project includes 4 major focus areas: Open A/B Testing, Neural Representations of Scientific Text, arXiv Dynamics, and Security & Privacy.
(1) Open A/B Testing enables arXiv to become a platform for A/B testing of search and recommendation algorithms. In addition to online A/B testing, offline A/B testing is provided using historical data along with counterfactual estimators for policy rewards.
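The offline A/B testing described in (1) relies on counterfactual estimators computed from historical logs. As a minimal sketch only, the Python below shows one standard estimator of this kind, inverse propensity scoring (IPS). The abstract does not say which estimators the project uses, and the `LoggedEvent` record, its field names, the `always_a` policy, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedEvent:
    """One historical interaction (hypothetical schema, for illustration only)."""
    context: str       # e.g. the user's query
    action: str        # the item the logging policy actually showed
    reward: float      # observed feedback, e.g. 1.0 for a click
    propensity: float  # probability the logging policy assigned to `action`


def ips_estimate(
    logs: Sequence[LoggedEvent],
    target_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward a target policy would have earned.

    Each logged reward is reweighted by the ratio between the target policy's
    probability of the logged action and the logging policy's probability.
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for event in logs:
        if event.propensity <= 0.0:
            raise ValueError("logged propensities must be positive")
        weight = target_prob(event.context, event.action) / event.propensity
        total += weight * event.reward
    return total / len(logs)


# Toy logs: the logging policy showed item "a" or "b" with probability 0.5 each.
logs = [
    LoggedEvent("q1", "a", 1.0, 0.5),
    LoggedEvent("q1", "b", 0.0, 0.5),
    LoggedEvent("q2", "a", 0.0, 0.5),
    LoggedEvent("q2", "b", 1.0, 0.5),
]


def always_a(context: str, action: str) -> float:
    """Hypothetical target policy that always shows item "a"."""
    return 1.0 if action == "a" else 0.0


print(ips_estimate(logs, always_a))
```

On these toy logs only the two events where "a" was shown receive a non-zero weight, so the estimate equals the average reward observed for "a". IPS is unbiased only when the logging policy gave every action the target policy might take a positive probability, and its variance grows when the logged propensities are small.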
(2) Neural Representation of Scientific Text provides a vector-based representation of scientific texts (documents, paragraphs, and sentences) appropriate for multiple tasks, including citation, author, title, and keyword prediction. Differentiable search indices are investigated due to their potential to provide additional search performance improvements without requiring incremental re-training. Finally, this representation supports the construction of a scientific question-answering system which can also be used as a context-sensitive "chat-bot," enabling researchers to converse with it and get a list of recent publications relevant to their interests.
(3) The arXiv Dynamics project investigates how scientific fields grow, shrink, and transform over time. A "trending and emerging arXiv topics" pattern recognition system is being created to predict how interesting current and historical articles are to researchers. Research is investigating methods to remove the "rich-get-richer" effect from this model, to correct the model for the effects of the users' historical interactions with the system, and to track performance and solicit user feedback as these models change over time.
(4) Under Security & Privacy, arXiv's privacy policy is updated so that users are aware of how their (meta-)data may be used and of the protections that will be deployed to protect their privacy. A "Layer 1" API allows researchers to make coarse-grained queries on anonymized arXiv weblogs, and a "Layer 2" API allows researchers to securely experiment on arXiv metadata and weblogs. Privacy is preserved by a combination of query restrictions and researcher usage agreements.
A machine-learning API layer that supports differential privacy is being developed; it allows researchers to investigate the utility of these tools for novel ML-based applications, such as free-form question answering about scientific texts and neural recommender systems.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Division of Physics within the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
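The differential-privacy support mentioned for the machine-learning API layer can be illustrated with a textbook technique, the Laplace mechanism applied to a counting query. This is a sketch under stated assumptions, not the project's design: the abstract does not name a mechanism, and `private_count`, its parameters, and the example numbers are hypothetical.

```python
import random


def private_count(
    true_count: int,
    epsilon: float,
    rng: random.Random,
    sensitivity: float = 1.0,
) -> float:
    """Return `true_count` plus Laplace noise with scale sensitivity / epsilon.

    A counting query changes by at most 1 when one record is added or removed,
    so sensitivity defaults to 1 (an assumption about the query being asked).
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
noisy = private_count(true_count=100, epsilon=1.0, rng=rng)
print(f"noisy count: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy and noisier answers. A real deployment would also need to track the total privacy budget spent across queries, which this sketch leaves out.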
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Ramin Zabih
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Award amount: $4.9665 million
- Project type: Standard Grant
RI: Medium: Collaborative Research: Graph Cut Algorithms for Domain-specific Higher Order Priors
- Award number: 1161860
- Fiscal year: 2012
- Award amount: $4.9665 million
- Project type: Continuing Grant
RI-Medium: Collaborative Research: Graph Cut Algorithms for Linear Inverse Systems
- Award number: 0803705
- Fiscal year: 2008
- Award amount: $4.9665 million
- Project type: Standard Grant
Dynamic Contextual Recognition of Moving Objects
- Award number: 9900115
- Fiscal year: 1999
- Award amount: $4.9665 million
- Project type: Standard Grant
Similar overseas grants
Entwicklung eines Modells zur gemeinschaftlichen Finanzierung der Open Access-Plattform arXiv
(Development of a model for the joint funding of the open-access platform arXiv)
- Award number: 194934317
- Fiscal year:
- Award amount: $4.9665 million
- Project type: Science Communication, Research Data, eResearch (Scientific Library Services and Information Systems)













