CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
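Basic Information
- Award number: 1823292
- Principal investigator:
- Amount: $230,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-08-01 to 2018-10-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract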
Access to the scientific and scholarly literature has changed radically in recent decades. Increasingly, researchers and scholars make their publications freely available on the Web. Taking advantage of this opportunity, new scientific search engines have been developed, such as Google Scholar, Semantic Scholar, and CiteSeer (now CiteSeerX). CiteSeerX has become one of the most comprehensive and widely used online public resources for the Computer and Information Science and Engineering (CISE) research community. Millions of CiteSeerX Portable Document Format (PDF) documents are indexed by Google. CiteSeerX is unique among digital library search engines. It is open access, almost all of its documents are harvested from the public Web, and users have full-text access to all documents searchable on its website. Moreover, it provides all automatically extracted metadata and citation context via an Open Archives Initiative (OAI) metadata service interface and bulk downloads on a public cloud, all under a Creative Commons license. This service is usually not available from other scholarly search engines. CiteSeerX performs automatic extraction and indexing of tables (in production), figures (developed), and algorithms (developed), capabilities rarely seen in other scholarly search engines. CiteSeerX provides its open source software and architecture on GitHub. At this time, none of the other above-mentioned systems release their digital library software. Utilizing the established CiteSeerX infrastructure, this proposal aims to create a sustainable CiteSeerX system with new data resources and a much larger data collection. We will develop a new system that runs with low operational overhead, without a single point of failure, and that provides quality and enriched data and metadata in portable formats that will be available through accessible user interfaces. We will ingest all freely accessible scientific documents on the Web, currently estimated to be 30 million.
CiteSeerX will make available high-quality metadata through an accessible Web User Interface, Application Programming Interface, and data dumps. SeerSuite, the platform on which CiteSeerX is built, will be refactored so as to be an easily deployable and configurable scholarly digital library framework. It will be built on commercial-grade open source software. In addition, we will provide searchable semantic metadata, such as key phrases and disambiguated author names, and non-textual content such as data from figures, tables, algorithms, and equations. For long-term sustainability we will explore different monetization models. The result will be a refactored digital library search engine that provides stable, usable, and reliable data services on multiple types of scientific documents, built on a portable, maintainable, and self-contained framework that can be deployed for other research document digital collections. Source code will be hosted at https://github.com/SeerLabs. System development and related research will be published in relevant venues and be made publicly available. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Cornelia Caragea
Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Seoyeon Park;Cornelia Caragea
- Corresponding author: Cornelia Caragea
Metadata Repository
- DOI: 10.1007/978-0-387-39940-9_3058
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Cornelia Caragea;Vasant G Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;B. Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;O. Frieder;C. Jensen;N. Tradisauskas;E. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;S. Bouchenak;Noël de Palma;Wei Tang;C. Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;I. Manolescu;Jeffrey Xu Yu;E. Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;A. Deshpande;Carlos Guestrin;S. Madden;C. Leung;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;J. Yu;P. Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;M. De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;C. Vangenot;E. Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;M. Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
- Corresponding author: W. Cellary
Semantic Tokenizer for Enhanced Natural Language Processing
- DOI: 10.48550/arxiv.2304.12404
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Sandeep Mehta;Darpan Shah;Ravindra Kulkarni;Cornelia Caragea
- Corresponding author: Cornelia Caragea
A Group-Based Personalized Model for Image Privacy Classification and Labeling
- DOI: 10.24963/ijcai.2017/552
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Haoti Zhong;A. Squicciarini;David J. Miller;Cornelia Caragea
- Corresponding author: Cornelia Caragea
MEDLINE/PubMed
- DOI: 10.1007/978-0-387-39940-9_3039
- Published: 2004
- Journal:
- Impact factor: 3.8
- Authors: Cornelia Caragea;V. Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;Bhavani Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;Ophir Frieder;Christian S. Jensen;N. Tradisauskas;Ethan V. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;Sara Bouchenak;Noël de Palma;Wei Tang;Christoph Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;Ioana Manolescu;Jeffrey Xu Yu;Emmanuel Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;Amol Deshpande;Carlos Guestrin;Samuel Madden;Carson Kai;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;Jeffrey Xu Yu;Paolo Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;Maria De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;Christelle Vangenot;Esteban Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;Matthias Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
- Corresponding author: W. Cellary
Other grants by Cornelia Caragea
CHS: Small: Collaborative Research: Automating Relevance and Trust Detection in Social Media Data for Emergency Response
- Award number: 1903963
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
TWC: Small: Collaborative: Towards Privacy Preserving Online Image Sharing
- Award number: 1903714
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1853919
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
- Award number: 1741353
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
CAREER: From Data to Knowledge: Extracting and Utilizing Concept Graphs in Online Environments
- Award number: 1802358
- Fiscal year: 2017
- Amount: $230,000
- Project type: Continuing Grant
CAREER: From Data to Knowledge: Extracting and Utilizing Concept Graphs in Online Environments
- Award number: 1652674
- Fiscal year: 2017
- Amount: $230,000
- Project type: Continuing Grant
III: Small: Collaborative Research: Keyphrase Extraction in Document Networks
- Award number: 1813571
- Fiscal year: 2017
- Amount: $230,000
- Project type: Continuing Grant
BIGDATA: IA: Collaborative Research: Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
- Award number: 1802284
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
TWC: Small: Collaborative: Towards Privacy Preserving Online Image Sharing
- Award number: 1814255
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
CHS: Small: Collaborative Research: Automating Relevance and Trust Detection in Social Media Data for Emergency Response
- Award number: 1814271
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
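Similar NSFC (National Natural Science Foundation of China) grants
Xingnaojing multi-target regulation of the PI3K/Akt pathway to inhibit oxidative stress in CI/RI: a study based on network pharmacology and in vivo and in vitro experiments
- Award number: 2025JJ90117
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project
Anti-inflammatory mechanism by which eye acupuncture activates MC in CI/RI rats to target H3R and regulate "immune surveillance", explored through the "immune-neural" network
- Award number: 82374375
- Approval year: 2023
- Amount: CNY 510,000
- Project type: General Program
Mechanism by which ci-Eln promotes hypoxic pulmonary artery smooth muscle cell proliferation mediated by its parent gene Eln
- Award number:
- Approval year: 2021
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila through single-cell transcriptome sequencing
- Award number: 32170497
- Approval year: 2021
- Amount: CNY 580,000
- Project type: General Program
Spatiotemporal variation of vertically stratified forest LAI and CI, with LiDAR remote-sensing retrieval and validation
- Award number:
- Approval year: 2021
- Amount: CNY 590,000
- Project type: General Program
Treatment of SLC25A46-related mitochondrial disease with CI 994 and its mechanism
- Award number: 82001449
- Approval year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Observational study of the [CI] line as a new molecular gas mass tracer in nearby galaxies
- Award number: 12003070
- Approval year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Role and mechanism of the lncRNA343/miR-509-3p/STC1 axis in the imbalance of mitochondrial quality control in renal tubular epithelial cells in CI-AKI
- Award number: 81873607
- Approval year: 2018
- Amount: CNY 570,000
- Project type: General Program
Mechanism by which α2-adrenergic receptor activation promotes ESCRT-III membrane aggregation in lung necroptosis caused by renal CI/RI
- Award number: 81801900
- Approval year: 2018
- Amount: CNY 210,000
- Project type: Young Scientists Fund
Molecular mechanism of endosymbiont-induced cytoplasmic incompatibility (CI) in cotton spider mites
- Award number: 31860508
- Approval year: 2018
- Amount: CNY 390,000
- Project type: Regional Science Fund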
Similar overseas grants
CRI: CI-SUSTAIN: Racket on Alternative Platforms
- Award number: 1823244
- Fiscal year: 2018
- Amount: $230,000
- Project type: Continuing Grant
CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
- Award number: 1822986
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1823288
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1853919
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
- Award number: 1822975
- Fiscal year: 2018
- Amount: $230,000
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: StarExec: Cross-Community Infrastructure for Logic Solving
- Award number: 1730419
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
CI-SUSTAIN: Sustainable Tools for Analysis and Research on Darknet Unsolicited Traffic (STARDUST).
- Award number: 1730661
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: National File System Trace Repository
- Award number: 1730726
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant
Collaborative Research: CI-SUSTAIN: National File System Trace Repository
- Award number: 1729939
- Fiscal year: 2017
- Amount: $230,000
- Project type: Standard Grant