CAREER: Robust and Secure Multi-Modal Learning for Library-Scale Text Collections


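Basic Information

  • Award number:
    1652536
  • Principal investigator:
  • Amount:
    $550,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-05-15 to 2024-04-30
  • Project status:
    Completed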

Project Abstract

The growth of social media and digitized libraries has made computational text analysis a vital tool for modern scholarship. But too often methods that work on standardized collections for expert users don't translate to real-world data analysis. In order to be useful, text mining methodologies need to balance theoretical power with practical application. Real data sets are noisy and complicated. More importantly, vast amounts of data cannot be shared directly due to copyright, including all published books after 1923. This project will develop tools that can be applied to limited, privatized views of documents. Algorithms will focus on reliability and efficiency, so that powerful techniques can be used by non-expert users on easily accessible hardware, such as the 10 million K-12 students using low-powered browser-based Chromebooks, thereby increasing the societal impact of the work.

Unsupervised text mining methods such as topic models and word embeddings have become popular outside of machine learning because they operate on simple, widely-available representations and identify latent variables that represent recognizable themes, events, or concepts. But standard algorithms do not scale well, require full access to potentially sensitive text collections, and cannot take advantage of non-textual data such as images. Although recent work in spectral inference has produced improvements in speed, current methods are plagued by sensitivity to noisy observations. This work will develop a unified approach to unsupervised text mining based on matrix and tensor factorization. The project will focus on data rectification methods for input matrices, enabling simple algorithms to work dramatically better, even in the presence of sparse and noisy observations, while also reducing model uncertainty. The project will develop new methods for learning from private and sensitive documents by creating public views of non-public data. These will include both noisy representations of individual documents as well as corpus-level summary matrices, and support both strong non-identifiability and weaker non-expressivity criteria. Finally, the project will develop new tools for modeling images and text optimized for the way images actually accompany text in real corpora, rather than short, artificial captions. By jointly modeling large volumes of text and semantically related images, the project will enable users to search for contextually related images, not just visually similar images, and identify topics that are grounded in the visual world, not just in text. For further information see the project web page: http://mimno.infosci.cornell.edu

Project Outcomes

Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The strange geometry of skip-gram with negative sampling
  • DOI:
    10.18653/v1/d17-1308
  • Published:
    2017-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    David Mimno; Laure Thompson
  • Corresponding authors:
    David Mimno; Laure Thompson
Comparing Text Representations: A Theory-Driven Approach
  • DOI:
    10.18653/v1/2021.emnlp-main.449
  • Published:
    2021-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Gregory Yauney; David M. Mimno
  • Corresponding authors:
    Gregory Yauney; David M. Mimno
Computational Cut-Ups: The Influence of Dada
Combatting The Challenges of Local Privacy for Distributional Semantics with Compression
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alexandra Schofield
  • Corresponding author:
    Alexandra Schofield
Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus
  • DOI:
    10.22148/001c.13680
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Storey, Grant; Mimno, David
  • Corresponding author:
    Mimno, David

Other Publications by David Mimno

Missing Photos, Suffering Withdrawal, or Finding Freedom? How Experiences of Social Media Non-Use Influence the Likelihood of Reversion
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Eric Baumer; Shion Guha; Emily Quan; David Mimno; Geri K. Gay
  • Corresponding author:
    Geri K. Gay
Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries
Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models
  • DOI:
  • Published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Moontae Lee; D. Bindel; David Mimno
  • Corresponding author:
    David Mimno
The Tell-Tale Hat: Surfacing the Uncertainty in Folklore Classification
  • DOI:
    10.22148/16.012
  • Published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Peter M. Broadwell; David Mimno; Timothy R. Tangherlini
  • Corresponding author:
    Timothy R. Tangherlini
Reconstructing Pompeian Households

Other Grants by David Mimno

Conference: Text As Data Conference 2022
  • Award number:
    2232664
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Standard Grant

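Similar NSFC (National Natural Science Foundation of China) Grants

Research on Robust Strategy Analysis and Robust Optimization Methods in Supply Chain Management
  • Award number:
    70601028
  • Approval year:
    2006
  • Award amount:
    CNY 70,000
  • Project type:
    Young Scientists Fund
Research on Robust Speech Recognition Methods under the Influence of Psychological Tension and Stress
  • Award number:
    60085001
  • Approval year:
    2000
  • Award amount:
    CNY 140,000
  • Project type:
    Special Fund Project
Research on Robust Speech Recognition Methods
  • Award number:
    69075008
  • Approval year:
    1990
  • Award amount:
    CNY 35,000
  • Project type:
    General Program
Improved Robust Sequential Detection Techniques
  • Award number:
    68671030
  • Approval year:
    1986
  • Award amount:
    CNY 20,000
  • Project type:
    General Program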

Similar Overseas Grants

ERI: Towards Robust and Secure Intelligent 3D Sensing Systems
  • Award number:
    2347426
  • Fiscal year:
    2024
  • Award amount:
    $550,000
  • Project type:
    Standard Grant
Collaborative Research: SaTC: CORE: Small: Secure and Robust Machine Learning in Multi-Tenant Cloud FPGA
  • Award number:
    2411207
  • Fiscal year:
    2023
  • Award amount:
    $550,000
  • Project type:
    Standard Grant
Collaborative Research: SaTC: CORE: Small: Secure and Robust Machine Learning in Multi-Tenant Cloud FPGA
  • Award number:
    2153525
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Standard Grant
ELECTROMYOGRAPHY FOR A SECURE AND ROBUST DUAL-MODE BIOMETRIC SYSTEM
  • Award number:
    569016-2022
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Towards Smart Cities: Scalable and Robust Design and Dimensioning of Secure Fog-Computing Infrastructure to Support Latency Sensitive and Dynamic IoT Applications
  • Award number:
    558695-2021
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Postgraduate Scholarships - Doctoral
CAREER: Learning to Secure Cooperative Multi-Agent Learning Systems: Advanced Attacks and Robust Defenses
  • Award number:
    2146548
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Continuing Grant
Electronic Medical Records Ecosystem Cloud Transition Investigation, Planning, and Development for Robust, Scalable and Secure Long-Term Support
  • Award number:
    555774-2020
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Applied Research and Development Grants - Level 3
Collaborative Research: SaTC: CORE: Small: Secure and Robust Machine Learning in Multi-Tenant Cloud FPGA
  • Award number:
    2153690
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Project type:
    Standard Grant
Electronic Medical Records Ecosystem Cloud Transition Investigation, Planning, and Development for Robust, Scalable and Secure Long-Term Support
  • Award number:
    555774-2020
  • Fiscal year:
    2021
  • Award amount:
    $550,000
  • Project type:
    Applied Research and Development Grants - Level 3
Towards Smart Cities: Scalable and Robust Design and Dimensioning of Secure Fog-Computing Infrastructure to Support Latency Sensitive and Dynamic IoT Applications
  • Award number:
    558695-2021
  • Fiscal year:
    2021
  • Award amount:
    $550,000
  • Project type:
    Postgraduate Scholarships - Doctoral