Semantic Representations for Interactive Text Mining

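Basic information

  • Grant number:
    RGPIN-2020-04834
  • Principal investigator:
  • Amount:
    $25,500
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2021
  • Funding country:
    Canada
  • Start and end dates:
    2021-01-01 to 2022-12-31
  • Project status:
    Completed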

Project abstract

Key limitations faced by today's knowledge workers, whose jobs involve handling or using information, include (a) the amount of text they have to read and digest, and (b) the amount of time they spend searching for, gathering and organizing information in text form. Examples of text-intensive tasks on specialized corpora include: literature search on a given topic for compilation of a systematic review; high-recall retrieval of patents, court decisions or incident reports in customer service or online communities; search and browsing of electronic medical records or health-related listserver content for tacit knowledge embedded in free text; and annotation of papers with research topics. Examples of tasks on informal text, such as social media, include rumour detection and propagation, dynamic topic detection and tracking, and analysis of interviews in sociology research. Core research problems underlying these use cases include: (1) semantic retrieval of documents, addressing vocabulary mismatch across related documents; (2) the exploitation of semi-structured knowledge bases, such as Wikipedia, as well as weakly organized domain-specific corpora; (3) handling the dynamic nature of text data, including concept drift, and flexibly handling shorter or longer time frames; (4) the need for human-in-the-loop text mining, to guide the algorithms towards producing relevant results for the individual user. The last of these requires interactive visualizations and algorithms open to user interaction. Semantic relatedness methods have been proposed based on word and document embeddings derived from unsupervised training of various deep network architectures on tasks such as word or sentence prediction in large text corpora. Such embeddings have advanced the state of the art on a number of supervised downstream natural language processing tasks.
However, a gap exists between semantic text representations based on embeddings, which are dense numeric vectors, and human intuition, whose elicitation requires interactive visual interfaces that involve a non-technical user effectively. The proposed research aims to fill this gap by focusing on explainable, as opposed to black-box, machine learning algorithms and representations. Taking this one step further, we will build on interactivity to achieve explainability, allowing the human to efficiently steer the machine learning towards meaningful results. Overall, we aim for next-generation visual text analytics systems that build on the capabilities of modern word, term and document embeddings based on deep networks to capture semantics better than bag-of-words representations, without losing the intuitive nature of word- and term-based visualizations. The proposed research will contribute to the emerging research area of explainable deep networks, specialized to interactive machine learning for supporting knowledge workers.
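
The vocabulary mismatch problem named in the abstract can be illustrated with a minimal sketch. This is not the project's method: the three-dimensional word vectors, the document texts, and all function names below are hypothetical, hand-made for illustration, and averaging word vectors is only one simple way to obtain a document embedding. The sketch compares bag-of-words cosine similarity with cosine similarity over averaged word vectors for a query that shares no terms with either document.

```python
import math
from collections import Counter

# Hypothetical toy word vectors (not from any trained model).
TOY_EMBEDDINGS = {
    "physician": [0.9, 0.1, 0.0],
    "doctor": [0.8, 0.2, 0.0],
    "treats": [0.1, 0.9, 0.0],
    "cures": [0.2, 0.8, 0.0],
    "patient": [0.7, 0.3, 0.1],
    "illness": [0.6, 0.3, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "prices": [0.0, 0.2, 0.8],
    "fell": [0.1, 0.1, 0.8],
}
DIM = 3


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bow_vector(tokens, vocab):
    """Bag-of-words term-count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def mean_embedding(tokens):
    """Average the toy word vectors of the tokens that have one."""
    vecs = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


query = "physician treats patient".split()
docs = {
    "medical": "doctor cures illness".split(),
    "finance": "stock prices fell".split(),
}
vocab = sorted(TOY_EMBEDDINGS)

# Bag-of-words: the query shares no terms with either document.
bow_scores = {
    name: cosine(bow_vector(query, vocab), bow_vector(tokens, vocab))
    for name, tokens in docs.items()
}
# Averaged embeddings: related words lie close together in vector space.
emb_scores = {
    name: cosine(mean_embedding(query), mean_embedding(tokens))
    for name, tokens in docs.items()
}

print("bag-of-words:", bow_scores)
print("embeddings:  ", emb_scores)
```

With these toy vectors, the bag-of-words scores cannot distinguish the two documents because no term overlaps, while the embedding scores rank the medical document above the finance one. The dense vectors themselves are not readable by a non-technical user, which is the gap the abstract describes.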

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Milios, Evangelos

Information retrieval by semantic similarity
Causal graph extraction from news: a comparative study of time-series causality learning techniques.
  • DOI:
    10.7717/peerj-cs.1066
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Maisonnave, Mariano; Delbianco, Fernando; Tohme, Fernando; Milios, Evangelos; Maguitman, Ana G.
  • Corresponding author:
    Maguitman, Ana G.
Improving the performance of focused web crawlers
  • DOI:
    10.1016/j.datak.2009.04.002
  • Publication date:
    2009-10-01
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Batsakis, Sotiris; Petrakis, Euripides G. M.; Milios, Evangelos
  • Corresponding author:
    Milios, Evangelos
Statistical learning for OCR error correction
  • DOI:
    10.1016/j.ipm.2018.06.001
  • Publication date:
    2018-11-01
  • Journal:
  • Impact factor:
    8.6
  • Authors:
    Mei, Jie; Islam, Aminul; Milios, Evangelos
  • Corresponding author:
    Milios, Evangelos
Topic-based web site summarization

Other grants held by Milios, Evangelos

Semantic Representations for Interactive Text Mining
  • Grant number:
    RGPIN-2020-04834
  • Fiscal year:
    2022
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
How is Canadians' mental health affected by COVID-19: visual analytics of social media text
  • Grant number:
    554657-2020
  • Fiscal year:
    2020
  • Amount:
    $25,500
  • Program:
    Alliance Grants
Semantic Representations for Interactive Text Mining
  • Grant number:
    RGPIN-2020-04834
  • Fiscal year:
    2020
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Exploiting Semantic Analysis of Documents
  • Grant number:
    RGPIN-2015-06183
  • Fiscal year:
    2019
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Semantic search using deep networks
  • Grant number:
    531051-2018
  • Fiscal year:
    2018
  • Amount:
    $25,500
  • Program:
    Engage Grants Program
Exploiting Semantic Analysis of Documents
  • Grant number:
    RGPIN-2015-06183
  • Fiscal year:
    2018
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Visual text analytics for total recall information retrieval in large noisy text datasets
  • Grant number:
    499941-2016
  • Fiscal year:
    2017
  • Amount:
    $25,500
  • Program:
    Collaborative Research and Development Grants
Exploiting Semantic Analysis of Documents
  • Grant number:
    RGPIN-2015-06183
  • Fiscal year:
    2017
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Trajectory-based localization using WiFi signal strength
  • Grant number:
    507295-2016
  • Fiscal year:
    2016
  • Amount:
    $25,500
  • Program:
    Engage Grants Program
Automation and Evaluation of Business Intelligence
  • Grant number:
    492547-2015
  • Fiscal year:
    2016
  • Amount:
    $25,500
  • Program:
    Engage Grants Program

Similar international grants

Promoting Students' Data Literacy through the Creation of Interactive Multimodal Representations of Biometric Data
  • Grant number:
    2241751
  • Fiscal year:
    2023
  • Amount:
    $25,500
  • Program:
    Standard Grant
Semantic Representations for Interactive Text Mining
  • Grant number:
    RGPIN-2020-04834
  • Fiscal year:
    2022
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Semantic Representations for Interactive Text Mining
  • Grant number:
    RGPIN-2020-04834
  • Fiscal year:
    2020
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Mathematical Foundations of Multiscale Graph Representations and Interactive Learning
  • Grant number:
    0808847
  • Fiscal year:
    2008
  • Amount:
    $25,500
  • Program:
    Standard Grant
The use of graphical representations to scaffold collaborative argumentation around an interactive whiteboard in Key Stage 2 science
  • Grant number:
    ES/F017731/1
  • Fiscal year:
    2008
  • Amount:
    $25,500
  • Program:
    Research Grant
HCC: Interactive and Enriched Haptic Graphical Representations for People who are Blind and Visually Impaired
  • Grant number:
    0712936
  • Fiscal year:
    2007
  • Amount:
    $25,500
  • Program:
    Continuing Grant
Central representations of interactive tasks
  • Grant number:
    184016-1996
  • Fiscal year:
    1999
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Algorithms and Geometry Representations for Interactive Visualization and Modeling
  • Grant number:
    9978147
  • Fiscal year:
    1999
  • Amount:
    $25,500
  • Program:
    Standard Grant
Central representations of interactive tasks
  • Grant number:
    184016-1996
  • Fiscal year:
    1998
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual
Central representations of interactive tasks
  • Grant number:
    184016-1996
  • Fiscal year:
    1997
  • Amount:
    $25,500
  • Program:
    Discovery Grants Program - Individual