III: SpatioTextual Extraction of Document on the Web for Digital Government Applications

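Basic Information

  • Award number:
    0713501
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2007
  • Funding country:
    United States
  • Period:
    2007-09-01 to 2011-08-31
  • Status:
    Completed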

Project Abstract

Search technology today is dominated by search engines such as the one provided by Google, where, for a given query string s, a set D of documents is retrieved with the aid of an algorithm that ranks each element of D by how many other documents link to it. This research will investigate the issues involved in developing a search engine that supports geographic location retrieval, and in deploying it in digital government applications, where it is also desirable to retrieve documents on the basis of spatial proximity.

Intellectual Merit: (1) Identifying geographic references in documents is a challenging problem and is necessary for advanced search applications. This is especially true for users who want to browse large collections of documents that are not necessarily on the web, and to explore and discover spatial relationships either within a single document or across a collection of documents. (2) Increasingly, users are looking for documents that contain spatially proximate content, so the traditional method of ranking by the link structure of the web is not appropriate. Determining the geographic focus of a document is difficult but necessary in applications such as those dealing with documents on the hidden web: a set of documents, usually proprietary, that is intended for an organization's internal use and is often not available on the Internet. Such documents have few, if any, links pointing to them, so popular Internet search strategies do not apply. (3) Treating the spatial content of documents as a first-class citizen, in the sense that a geographic scope is reported for each retrieved document regardless of whether the query has a spatial component, is difficult because it requires resolving aliasing (recognizing that "Los Angeles" and "LA" are the same place) and ambiguity (different interpretations of "London"). (4) Developing query optimization and execution strategies for queries that involve both a textual and a spatial component. (5) Developing effective techniques for measuring spatial similarity other than proximity, as well as techniques for measuring combinations of spatial and textual similarity. This includes adapting the skyline operator.

Broad Impacts: The ability to retrieve documents on the basis of spatial proximity makes for a better search experience and will lead to more relevant results. The tools to be developed will also extend the reach of search engines beyond documents on the Internet to documents that reside on the hidden web. Deploying these tools on government web sites, through collaboration with the grant's digital government partners, empowers citizens to find out what their government is doing, leading to a better-informed citizenry.
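Item (3) of the abstract hinges on resolving aliases and ambiguous place names before a geographic scope can be reported. The following is a minimal sketch of that step under loudly stated assumptions: a tiny hand-made gazetteer (the hypothetical `GAZETTEER` and `ALIASES` tables, with rough illustrative coordinates and populations) and a simple baseline heuristic that prefers candidates in a country already mentioned unambiguously in the same document, falling back to the most populous candidate. `Place`, `candidates`, and `resolve` are hypothetical names; this is not the method developed under this grant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    name: str
    country: str  # ISO 3166-1 alpha-2 code
    lat: float
    lon: float
    population: int


# Hypothetical toy gazetteer: normalized name -> candidate places.
# Coordinates and populations are rough, illustrative values.
GAZETTEER = {
    "los angeles": [Place("Los Angeles", "US", 34.05, -118.24, 3_900_000)],
    "london": [
        Place("London", "GB", 51.51, -0.13, 8_900_000),
        Place("London", "CA", 42.98, -81.25, 400_000),
    ],
    "toronto": [Place("Toronto", "CA", 43.65, -79.38, 2_800_000)],
}

# Aliasing: alternative surface forms mapped to a canonical gazetteer key.
ALIASES = {"la": "los angeles", "l.a.": "los angeles"}


def candidates(mention: str) -> list[Place]:
    """Normalize a mention, apply aliases, and look up its candidate places."""
    key = mention.strip().lower()
    key = ALIASES.get(key, key)
    return GAZETTEER.get(key, [])


def resolve(mentions: list[str]) -> dict[str, Optional[Place]]:
    """Map each mention to one Place, or None if it is not in the gazetteer."""
    # Countries of unambiguous mentions serve as document context.
    context = set()
    for m in mentions:
        cands = candidates(m)
        if len(cands) == 1:
            context.add(cands[0].country)

    resolved = {}
    for m in mentions:
        # Prefer a candidate in a context country; break ties by population.
        resolved[m] = max(
            candidates(m),
            key=lambda p: (p.country in context, p.population),
            default=None,
        )
    return resolved


r = resolve(["LA", "London", "Toronto"])
print(r["London"].country)  # CA
```

In a document that also mentions Toronto, "London" resolves to London, Ontario; on its own it falls back to London, UK, by population. Real systems use far larger gazetteers and richer evidence than country co-occurrence.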
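Item (5) of the abstract mentions adapting the skyline operator to combine spatial and textual similarity. As a minimal sketch, assuming each document carries a single geotag and a precomputed textual relevance score (the abstract specifies neither), the code below returns the documents that no other document beats on both distance to the query point and text score. `Doc`, `haversine_km`, `dominates`, and `spatio_textual_skyline` are hypothetical names, the sample coordinates are illustrative, and this naive quadratic loop is not the project's method.

```python
import math
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: str
    lat: float  # assumed: one geotag per document
    lon: float
    text_score: float  # assumed precomputed; higher means more relevant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth (R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing the argument slightly above 1.
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def dominates(a: tuple, b: tuple) -> bool:
    """a and b are (distance_km, text_score); smaller distance and larger score are better.
    a dominates b if it is no worse on both criteria and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def spatio_textual_skyline(docs: list, q_lat: float, q_lon: float) -> list:
    """Naive O(n^2) skyline: keep documents not dominated by any other document."""
    points = [(haversine_km(q_lat, q_lon, d.lat, d.lon), d.text_score) for d in docs]
    return [
        d
        for i, d in enumerate(docs)
        if not any(dominates(points[j], points[i]) for j in range(len(docs)) if j != i)
    ]


docs = [
    Doc("a", 38.99, -76.94, 0.2),   # at the query point, weak text match
    Doc("b", 39.29, -76.61, 0.9),   # about 44 km away, strong text match
    Doc("c", 34.05, -118.24, 0.5),  # far away, medium match: dominated by "b"
    Doc("d", 38.90, -77.04, 0.1),   # nearby but worse than "a" on both: dominated
]
print([d.doc_id for d in spatio_textual_skyline(docs, 38.99, -76.94)])  # ['a', 'b']
```

A skyline avoids fixing weights for a combined spatial-textual score: no returned document can be improved on one criterion without getting worse on the other. The nested loop only illustrates the dominance test; a search engine would likely need an index-aware skyline algorithm.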

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Hanan Samet

Vertex representations and their applications in computer graphics
  • DOI:
    10.1007/s003710050138
  • Publication date:
    1998-10-01
  • Journal:
  • Impact factor:
    2.900
  • Authors:
    Claudio Esperança;Hanan Samet
  • Corresponding author:
    Hanan Samet
We start by comparing and contrasting our work with the related work of Clarkson
  • DOI:
  • Publication date:
    2007
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jagan Sankaranarayanan;Hanan Samet;Amitabh Varshney
  • Corresponding author:
    Amitabh Varshney
Approximating CSG trees of moving objects
  • DOI:
    10.1007/bf02341044
  • Publication date:
    1990-07-01
  • Journal:
  • Impact factor:
    2.900
  • Authors:
    Hanan Samet;Markku Tamminen
  • Corresponding author:
    Markku Tamminen
Heuristic for the line division problem in computer justified text
  • DOI:
    10.1145/358589.358621
  • Publication date:
    1982
  • Journal:
  • Impact factor:
    0
  • Authors:
    Hanan Samet
  • Corresponding author:
    Hanan Samet
Decomposing a window into maximal quadtree blocks
  • DOI:
    10.1007/bf01210594
  • Publication date:
    1993-05-01
  • Journal:
  • Impact factor:
    0.500
  • Authors:
    Walid G. Aref;Hanan Samet
  • Corresponding author:
    Hanan Samet

Other Grants by Hanan Samet

III: Small: Trajectory Computing
  • Award number:
    2114451
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project type:
    Continuing Grant
EAGER: NewsStand CoronaViz: A Map Query Interface for Tracking the Spread of COVID-19
  • Award number:
    2041415
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
    Standard Grant
III: Small: Using Location for Retrieving Text and Images in News And Social Media Posts
  • Award number:
    1816889
  • Fiscal year:
    2018
  • Funding amount:
    --
  • Project type:
    Standard Grant
I-Corps: RoadsInDB: Customer Discovery in the Logistics, Delivery, Ride Sharing, Location-based Services and Analytics Verticals
  • Award number:
    1634753
  • Fiscal year:
    2016
  • Funding amount:
    --
  • Project type:
    Standard Grant
III: Small: Managing Spatial Data in a Distributed Environment
  • Award number:
    1320791
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project type:
    Continuing Grant
III: Small: Issues in the Management of GeoMultimedia Data
  • Award number:
    1219023
  • Fiscal year:
    2012
  • Funding amount:
    --
  • Project type:
    Standard Grant
III: Small: Issues in Understanding, Indexing, Querying, and Visualizing Spatio-Textual Spreadsheets on the Web
  • Award number:
    1018475
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project type:
    Standard Grant
III/EAGER: TwitterStand: Separating the Wheat from the Chaff in Breaking News
  • Award number:
    0948548
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
    Standard Grant
Scalable Geometric and High Dimensional Data Structures and Algorithms: A Parallel and Distributed Approach
  • Award number:
    0830618
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
    Standard Grant
III-COR-Small: Similarity Criteria Issues in Similarity Retrieval
  • Award number:
    0812377
  • Fiscal year:
    2008
  • Funding amount:
    --
  • Project type:
    Continuing Grant