Large-Scale Web Research Testbed
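Basic Information
- Award number: 0322975
- Principal investigator:
- Amount: $439,800
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2003
- Funding country: United States
- Start and end dates: 2003-09-15 to 2006-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract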
This project establishes a very large repository of current and historical Web content in support of two group research efforts: management of Web content, and analysis and mining of that content. The current facilities have been instrumental in examining many aspects of the World Wide Web (WWW), including experimentation toward understanding, optimally utilizing, and improving the Web. The facility has enabled researchers to try out various hypotheses and techniques for indexing and modeling WWW information. The system's highly configurable crawlers collect a large number of Web pages and store them locally so that novel algorithms, such as ranking, filtering, or Web linkage mapping, can be tested on the collection. The current WebBase is underpowered: crawling speeds are limited by CPU performance (retrieved pages are compressed before being stored) and often by virtual memory space. Removing these two bottlenecks will make it possible to sustain a higher Web sampling rate and to cover larger areas of the Web. An upgraded testbed, built by scaling up the size and processing speed of the existing WebBase hardware, will be used to study and evaluate different techniques for Web crawling, archive refreshing, data compression, storage, and indexing. The project also investigates problems related to data extraction, semantic search, searching for non-text objects, access control, cross-temporal analysis, and mining patterns or relationships between entities.
Problems to be addressed include how to:
- collect an ever-growing amount of Web data and keep it up to date;
- provide improved search capabilities over such data, better exploiting the semantics of the data and of user requests;
- efficiently process high-volume real-time data streams;
- organize a Web archive that captures the "history" of the Web; and
- deal with new types of sources (e.g., the hidden web or chat rooms) and new types of data (e.g., images).
In addition, the new WebBase facility will support teaching at various universities by providing a testbed where students can develop new searching, indexing, and user presentation ideas. WebBase draws together faculty in data mining, security, natural language processing, and database systems, so that these areas enhance each other. The infrastructure will thus support:
- experimental research in a critical area: management and exploration of Web information;
- researchers at institutions that do not have sufficient facilities for large-scale Web crawling; and
- teaching of courses on information retrieval and data mining.
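The abstract notes that retrieved pages are compressed before being stored and that the archive should capture the "history" of the Web. The following is a minimal sketch of that idea, not WebBase's actual design: the `PageStore` class, its method names, and the use of `zlib` with ISO-date strings as crawl times are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Optional, Tuple


class PageStore:
    """Hypothetical in-memory page repository (illustrative only)."""

    def __init__(self) -> None:
        # url -> list of (crawl_time, compressed_page) in insertion order
        self._versions: Dict[str, List[Tuple[str, bytes]]] = {}

    def put(self, url: str, crawl_time: str, html: str, level: int = 6) -> int:
        """Compress a fetched page, store it as a new version, return stored size in bytes."""
        # Compression is the CPU-bound step the abstract describes.
        blob = zlib.compress(html.encode("utf-8"), level)
        self._versions.setdefault(url, []).append((crawl_time, blob))
        return len(blob)

    def get(self, url: str, as_of: Optional[str] = None) -> Optional[str]:
        """Return the newest version crawled at or before `as_of` (ISO date), or None."""
        candidates = [
            (t, b) for t, b in self._versions.get(url, [])
            if as_of is None or t <= as_of  # ISO dates sort lexicographically
        ]
        if not candidates:
            return None
        _, blob = max(candidates, key=lambda tb: tb[0])
        return zlib.decompress(blob).decode("utf-8")


store = PageStore()
url = "http://example.org/"
old_page = "<html>" + "hello web " * 200 + "</html>"
stored_size = store.put(url, "2003-09-15", old_page)
store.put(url, "2004-01-01", "<html>updated</html>")
snapshot = store.get(url, as_of="2003-12-31")  # the 2003 version
latest = store.get(url)                        # the 2004 version
```

Keeping every crawled version keyed by crawl time is one simple way to support cross-temporal queries; a higher `level` trades more CPU per page for smaller stored size, which is the kind of bottleneck the abstract describes.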
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Hector Garcia-Molina
A sound and complete algorithm for distributed commerce transactions
- DOI: 10.1007/s004460050052
- Published: 1999-03-01
- Journal:
- Impact factor: 2.100
- Authors: Steven P. Ketchpel; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
InfoMonitor: unobtrusively archiving a World Wide Web server
- DOI: 10.1007/s00799-003-0052-x
- Published: 2005-04-01
- Journal:
- Impact factor: 1.700
- Authors: Brian F. Cooper; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
Assigning textual names to sets of geographic coordinates
- DOI: 10.1016/j.compenvurbsys.2006.02.001
- Published: 2006-07-01
- Journal:
- Impact factor:
- Authors: Mor Naaman; Yee Jiun Song; Andreas Paepcke; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
Maximizing remote work in flooding-based peer-to-peer systems
- DOI: 10.1016/j.comnet.2005.09.024
- Published: 2006-07-14
- Journal:
- Impact factor:
- Authors: Qixiang Sun; Neil Daswani; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
HyperFile: A data and query model for documents
- DOI: 10.1007/bf01232472
- Published: 1995-01-01
- Journal:
- Impact factor: 3.800
- Authors: Chris Clifton; Hector Garcia-Molina; David Bloom
- Corresponding author: David Bloom
Other grants held by Hector Garcia-Molina
III: Large: Collaborative Research: Web Archive Cooperative
- Award number: 1009916
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Standard Grant
EAGER: InfoCalc, a Spreadsheet Interface to Web Archive Analysis
- Award number: 0941727
- Fiscal year: 2009
- Funding amount: $439,800
- Project type: Standard Grant
SGER, year II: A Web Sociologist's Workbench
- Award number: 0735129
- Fiscal year: 2007
- Funding amount: $439,800
- Project type: Standard Grant
CRI: CRD Analysis Toolbenches for Web Archives
- Award number: 0707464
- Fiscal year: 2007
- Funding amount: $439,800
- Project type: Standard Grant
SGER: A Web Sociologist's Workbench
- Award number: 0624725
- Fiscal year: 2006
- Funding amount: $439,800
- Project type: Standard Grant
SEI(BIO): Computing Support for Acquisition, Collaborative Curation, and Dissemination in Biodiversity Research
- Award number: 0430448
- Fiscal year: 2004
- Funding amount: $439,800
- Project type: Continuing Grant
ITR: DataMotion - Dealing With Fast-Moving Data
- Award number: 0324431
- Fiscal year: 2003
- Funding amount: $439,800
- Project type: Continuing Grant
ITR: From the Web to the Global InfoBase
- Award number: 0085896
- Fiscal year: 2000
- Funding amount: $439,800
- Project type: Standard Grant
DLI-Phase 2: Stanford InterLib Technologies
- Award number: 9817799
- Fiscal year: 1999
- Funding amount: $439,800
- Project type: Cooperative Agreement
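Similar grants from the National Natural Science Foundation of China
Scale-down mechanism and regulation of traditional solid-state fermentation processes based on heat transfer
- Award number: 22108101
- Year approved: 2021
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Study of transient flow and cavitation mechanisms in axial-flow blood pumps based on a multi-scale model
- Award number: 31600794
- Year approved: 2016
- Funding amount: CNY 220,000
- Project type: Young Scientists Fund
Research on compact routing for scale-free networks
- Award number: 60673168
- Year approved: 2006
- Funding amount: CNY 250,000
- Project type: General Program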
Similar overseas grants
Cognitive mechanisms underlying synesthetic metaphors and synesthesia: A large-scale web experiment
- Award number: 21H00960
- Fiscal year: 2021
- Funding amount: $439,800
- Project type: Grant-in-Aid for Scientific Research (B)
EAGER: Using Large-scale Web Data for Online Attention Models and Identification of Reading Disabilities
- Award number: 1840751
- Fiscal year: 2018
- Funding amount: $439,800
- Project type: Standard Grant
Applicability of Web-based dietary assessment system for a large-scale epidemiological study
- Award number: 15H02906
- Fiscal year: 2015
- Funding amount: $439,800
- Project type: Grant-in-Aid for Scientific Research (B)
Analyses of social contagion and construction of false rumor's control methods using large scale web data
- Award number: 25750130
- Fiscal year: 2013
- Funding amount: $439,800
- Project type: Grant-in-Aid for Young Scientists (B)
CSR: Small: Large-Scale Web Crawling and Spam Avoidance in Search-Engine Applications
- Award number: 1017766
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Standard Grant
Developing a web-based rating system for Japanese Oral Proficiency Test as large scale test
- Award number: 22520528
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Grant-in-Aid for Scientific Research (C)
A Development of Efficient and Large-scale Resource Allocation Method based on 'Yuzuriai' and Votings via the Web
- Award number: 22700142
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Grant-in-Aid for Young Scientists (B)
Real-time physical world analysis with large-scale web streaming data
- Award number: 22650017
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Grant-in-Aid for Challenging Exploratory Research
Large-Scale Specific Object Recognition and Its Application to Real-World Oriented Web
- Award number: 22300062
- Fiscal year: 2010
- Funding amount: $439,800
- Project type: Grant-in-Aid for Scientific Research (B)
Design-assistance system of large scale information systems based on formal method and web ontology
- Award number: 20500045
- Fiscal year: 2008
- Funding amount: $439,800
- Project type: Grant-in-Aid for Scientific Research (C)


















