BIGDATA: Collaborative Research: F: Streaming Architecture for Continuous Entity Linking in Social Media


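Basic Information

  • Award number:
    1546480
  • Principal investigator:
  • Amount:
    $783,300
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-01-01 to 2020-12-31
  • Project status:
    Completed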

Project Abstract

A large fraction of the ever-growing internet content is found in social media such as (micro)blogs. Users turn to it to form and share opinions about events and people, election preferences, and product and brand recommendations. This creates opportunities to build added layers of data mining and analysis on users' views of developing events, products, services, or government actions; at the same time, it raises challenges for Entity Linking (EL) in social media. EL is the task of linking an extracted mention to a specific definition of the entity, which is usually a pointer to a Web page that defines the entity. Information extraction from social media faces many challenges: message volume, message speed (Twitter alone generates over 500 million messages per day), variety, free-form language, lack of context, large reference variation, and language diversity. Hashtags are an essential part of the ethos of social networks. They are used to denote brands, events, people, social rallies, etc. The hashtag disambiguation problem is to detect synonymous hashtags and recognize the polysemous ones. For example, the hashtag '#BHaram' refers to the entity 'Boko Haram', defined at the Wikipedia page en.wikipedia.org/wiki/Boko_Haram or at the National Counterterrorism Center Web page www.nctc.gov/site/groups/boko_haram.html. The purpose of this project is to perform EL in social media. This work will benefit multiple segments of society that rely on applications using data from microblog systems, such as targeted monitoring of Twitter and Facebook to collect and understand users' opinions about a recent product or a world event; data aggregation (e.g., reviews about products and services); and data mining for early crisis detection and response as well as national security. This project is one more step towards addressing the government's latest initiative of fighting crime using big data.
The goals of this project are to research algorithms that detect, in near real time, the pieces of text in messages that reference entities and the Web pages that describe entities, and that link entity references to Web pages and across microblog systems, so that together a broader, more complete characterization of each entity can be generated automatically. The proposed approaches are based on innovative techniques that include: incremental, iterative message analysis; smart indexing techniques with live updates to support fast incremental entity reference detection; computationally light soft-clustering of messages to improve entity reference detection; and fast incremental K-partite graph clustering. The resulting artifacts (e.g., software tools) will be made available to benefit researchers in academia and industry. Distribution of free, open-source software implementing the techniques developed will enhance existing research infrastructure. The project will support and train at least three PhD students, and will involve undergraduate students in research at Temple University and Binghamton University. The project web site (http://cis.temple.edu/~edragut/projects/nimel.htm) includes more information on the project, software, datasets, educational materials, and publications.
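As a rough illustration of the hashtag-to-entity linking task described in the abstract, the following minimal sketch links '#BHaram' to the Wikipedia page named above. It is a toy under stated assumptions, not the project's method: the abbreviation heuristic (the hashtag's letters must appear in order within the entity name), the scoring rule, the threshold, and the names KNOWLEDGE_BASE, link_hashtag, and min_score are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Toy knowledge base: entity name -> defining Web page.
# The single entry and its URL come from the abstract's example.
KNOWLEDGE_BASE: Dict[str, str] = {
    "Boko Haram": "en.wikipedia.org/wiki/Boko_Haram",
}


def _letters(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def _is_subsequence(short: str, long: str) -> bool:
    """True if the characters of `short` occur in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def link_hashtag(
    hashtag: str,
    kb: Dict[str, str] = KNOWLEDGE_BASE,
    min_score: float = 0.5,  # hypothetical cutoff, not from the source
) -> Optional[str]:
    """Return the page of the best-matching entity, or None if no match.

    Assumed heuristic: a hashtag may abbreviate an entity name, so it is a
    candidate when its letters form a subsequence of the name's letters.
    The score is the fraction of the name's letters the hashtag covers.
    """
    mention = _letters(hashtag)
    best_url, best_score = None, 0.0
    for name, url in kb.items():
        target = _letters(name)
        if mention and _is_subsequence(mention, target):
            score = len(mention) / len(target)
            if score > best_score:
                best_url, best_score = url, score
    return best_url if best_score >= min_score else None
```

Calling link_hashtag("#BHaram") returns the Wikipedia page from the abstract, while a hashtag that does not abbreviate any known entity yields None. A realistic system would also need message context to separate polysemous hashtags, which this sketch ignores.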

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other grants by Eduard Dragut

Proto-OKN Theme 1: Knowledge Graph to Support Evaluation and Development of Climate Models
  • Award number:
    2333789
  • Fiscal year:
    2023
  • Award amount:
    $783,300
  • Project type:
    Cooperative Agreement
NSF Convergence Accelerator Track F: America's Fourth Estate at Risk: A System for Mapping the (Local) Journalism Life Cycle to Rebuild the Nation's News Trust
  • Award number:
    2137846
  • Fiscal year:
    2021
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
III: Medium: Collaborative Research: Extracting and Linking AI Artifacts
  • Award number:
    2107213
  • Fiscal year:
    2021
  • Award amount:
    $783,300
  • Project type:
    Continuing Grant
BIGDATA: F: Collaborative Research: Collective Mining of Vertical Social Communities
  • Award number:
    1838145
  • Fiscal year:
    2018
  • Award amount:
    $783,300
  • Project type:
    Standard Grant

Similar overseas grants

BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Award number:
    2348159
  • Fiscal year:
    2023
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
  • Award number:
    2308649
  • Fiscal year:
    2022
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
  • Award number:
    2027516
  • Fiscal year:
    2020
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
  • Award number:
    1934319
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Protecting Yourself from Wildfire Smoke: Big Data-Driven Adaptive Air Quality Prediction Methodologies
  • Award number:
    1838022
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Foundations of Responsible Data Management
  • Award number:
    1926250
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
  • Award number:
    1947584
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Award number:
    1837964
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
  • Award number:
    1838222
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
  • Award number:
    1838248
  • Fiscal year:
    2019
  • Award amount:
    $783,300
  • Project type:
    Standard Grant