EAGER: Automatic Document and Record Disposition and Retention

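Basic Information

  • Award number:
    1143921
  • Principal investigator:
  • Amount:
    $200,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Period:
    2011-08-01 to 2015-07-31
  • Project status:
    Completed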

Project Abstract

Record and document retention (document disposition) has become a serious problem for both organizations and individuals, since most documents created today are digital. Digital documents bring both advantages and problems: they are easily versioned, copied, and disseminated, so several similar copies or versions of an important or relevant document can exist in many locations. Individuals, organizations, and domains (such as law, science, and policy) need document or record disposition for effective information management over long periods of time. The problem is large and growing worldwide, wherever effective record disposition is required by law, by the organization, or by practical limitations of systems.

This exploratory project investigates possible automatic document disposition methods based on algorithms for text inspection, mining, and search. The challenges lie in finding scalable, adaptable algorithms that can be used in several, if not all, application domains. Variability among users adds further difficulty: a disposition method or procedure may vary depending on the user, organization, and domain (e.g., law or health records). The approach explored in this project applies and extends machine learning methods to these problems, since these methods adapt to variability in data, areas, and domains. Using such approaches, automated disposition methods can be readily applied to different areas such as science, email, and legal records. This research lays the groundwork for adaptive methods for a variety of domains in terms of applicability, performance, and scalability. This proof-of-concept project initially focuses on the publicly available Enron email data set, which is used to demonstrate the feasibility of the approach, since email can be considered a special case of document disposition.

If successful, other disposition domains such as science and government data will be explored. This work will show the viability of developing and applying machine learning methods to an important and diverse problem domain. The results from this exploratory project, together with insights gathered from methods used in large-scale document search, are expected to yield understanding of how we can better manage our digital past and the rapidly expanding digital future. The results are expected to introduce this important problem to other researchers and document disposition professionals and to lead to collaborations with industry. Data and research results will be made available through a publicly available website (http://clgiles.ist.psu.edu/disposeseer/), and research papers will be published and presented in appropriate venues. The project provides research experience for graduate and undergraduate students.
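The abstract frames disposition as something machine learning can decide from a document's text. As a rough illustration only, the sketch below treats it as a two-class text classification task over email-like messages. Everything in it is an assumption made for the example and not the project's actual method: the choice of a multinomial Naive Bayes model, the labels "retain" and "dispose", the class name `DispositionClassifier`, and the four invented training messages (which are not taken from the Enron data set).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class DispositionClassifier:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical example)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all tokens seen in training

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed per-token log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy messages and labels, for illustration only.
train_docs = [
    "signed contract attached for legal review",
    "quarterly audit report and financial statements",
    "lunch on friday anyone",
    "free tickets to the game tonight",
]
train_labels = ["retain", "retain", "dispose", "dispose"]

clf = DispositionClassifier().fit(train_docs, train_labels)
print(clf.predict("please review the attached contract"))
print(clf.predict("anyone want lunch tonight"))
```

A real disposition policy would depend on the user, organization, and domain, as the abstract notes, so the labels and training data would have to come from that setting. The sketch only shows the general shape of learning a retain-or-dispose decision from text.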

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by C. Lee Giles

BBookX: An Automatic Book Creation Framework
SearchGen: a synthetic workload generator for scientific literature digital libraries and search engines
Phrase Pair Classification for Identifying Subtopics
Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks
  • DOI:
  • Year published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Qinglong Wang; Wenbo Guo; Alexander Ororbia; Xinyu Xing; Lin Lin; C. Lee Giles; Xue Liu; Peng Liu; Gang Xiong
  • Corresponding author:
    Gang Xiong
SNDocRank: document ranking based on social networks
  • DOI:
    10.1145/1772690.1772825
  • Year published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Liang Gou; Hung; Jung; X. Zhang; C. Lee Giles
  • Corresponding author:
    C. Lee Giles

Other Grants Held by C. Lee Giles

CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
  • Award number:
    1823288
  • Fiscal year:
    2018
  • Amount:
    $200,000
  • Project type:
    Standard Grant
III: Small: Collaborative Research: Keyphrase Extraction in Document Networks
  • Award number:
    1422951
  • Fiscal year:
    2014
  • Amount:
    $200,000
  • Project type:
    Continuing Grant
Collaborative Research: STEM Workforce Training: A Quasi-Experimental Approach Using the Effects of Research Funding
  • Award number:
    1348712
  • Fiscal year:
    2013
  • Amount:
    $200,000
  • Project type:
    Standard Grant
Collaborative Research: CI-ADDO-EN: Semantic CiteSeer X
  • Award number:
    0958143
  • Fiscal year:
    2010
  • Amount:
    $200,000
  • Project type:
    Continuing Grant
EAGER: Creating a Book Citation Index
  • Award number:
    1042276
  • Fiscal year:
    2010
  • Amount:
    $200,000
  • Project type:
    Standard Grant
CRI: Collaborative: Next Generation CiteSeer
  • Award number:
    0454052
  • Fiscal year:
    2005
  • Amount:
    $200,000
  • Project type:
    Continuing Grant
SGER: A Digital Library Archive for Computer Scientists
  • Award number:
    0330783
  • Fiscal year:
    2003
  • Amount:
    $200,000
  • Project type:
    Standard Grant

Similar International Grants

Automatic collocation generation for English learners as a foreign language using document similarity analysis
  • Award number:
    16K00489
  • Fiscal year:
    2016
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
A Study of Automatic Recognition Mechanisms of Meanings of Location Names in Document Databases
  • Award number:
    19700089
  • Fiscal year:
    2007
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Young Scientists (B)
Automatic Document Extraction for Component Reuse
  • Award number:
    19800021
  • Fiscal year:
    2007
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Young Scientists (Start-up)
Automatic recognition of topic transition for newspaper articles and application to document summary
  • Award number:
    15500086
  • Fiscal year:
    2003
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Automatic Transformation of GDA Document Tag and Development of Its Applications
  • Award number:
    13558037
  • Fiscal year:
    2001
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
An Intelligent Automatic Translation System of Printed Document into Braille at high accuracy
  • Award number:
    04508001
  • Fiscal year:
    1992
  • Amount:
    $200,000
  • Project type:
    Grant-in-Aid for Developmental Scientific Research (A)
Research Initiation - Automatic Document Retrieval By Conceptual Content Analysis
  • Award number:
    7510492
  • Fiscal year:
    1975
  • Amount:
    $200,000
  • Project type:
    Standard Grant
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Award number:
    8344960
  • Fiscal year:
  • Amount:
    $200,000
  • Project type:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Award number:
    8558117
  • Fiscal year:
  • Amount:
    $200,000
  • Project type:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Award number:
    8149607
  • Fiscal year:
  • Amount:
    $200,000
  • Project type: