CROSS: Real-time Story Detection Across Multiple Massive Streams

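Basic Information

  • Grant number:
    EP/J020664/1
  • Principal investigator:
  • Amount:
    $267,200
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2012
  • Funding country:
    United Kingdom
  • Start and end dates:
    2012 to (no data)
  • Project status:
    Completed

Project Abstract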

The world is rapidly becoming more and more connected, with people communicating over multiple streams (social media, newswire, Wikipedia, etc.) on a bewildering range of topics and at a furious rate. Twitter alone receives more than 250 million new posts every day (Tsotsis 2011). This massive interconnection means that content can appear and quickly spread through and across different streams. For example, during the recent London riots, many tweets reported the rioting events in real time as they happened. However, not all posted content is of good quality or factually correct, which complicates the job of monitoring such streams for any purpose. An example of this occurred when a comedian spread false rumours on Twitter about Osama Bin Laden watching his television show (Lineham 2011). Communication streams are also known to spread rumours, outright misinformation and content with malicious intent. For instance, during the same riots, radicalising posts were spread calling for participation in the so-called "cyber-jihad" (BBC 2011). Systems that can identify such posts are of paramount importance for security monitoring purposes.

On the other hand, not all information spread on media such as Twitter is accurate or interesting. This is compounded by the peculiarities of messages on modern social media (short, jargon, social context, etc.), where biased, incomplete, inaccurate and misleading messages are common. This makes it extremely challenging to automatically identify, in real time, events worth monitoring for security purposes.

We propose a distributed infrastructure to automatically identify important new events (aka stories) in real time by combining and comparing multiple message streams. For many applications, the faster such story detection happens, the more valuable it is. A security agency using our system would be better prepared when dealing with fast-moving events as they unfold. Indeed, in this project, the notion of importance will be defined within a security context. Because streams are typically biased and not everything in them can be trusted, a key requirement of the system is minimising false positives (uninteresting stories that are discovered). Moreover, the effective management and efficient processing of multiple streams of real-time data poses new technological and scientific challenges:

Challenge 1: Identify interesting new stories without drowning in a sea of false positives, while reducing the effects of bias and rumour.
Challenge 2: Minimise system latency, so that new stories are detected in real time.

We tackle the first challenge from the novel perspective of processing multiple streams, exploiting the fact that stories reported multiple times across several streams can cancel out stream-specific bias and errors. For example, if a story is true, then it is more likely to manifest both in Twitter and as an update to a Wikipedia article. Alternatively, a story might appear in Twitter and also in a governmental cable. The more often a story occurs within and across streams, the more likely it is to be interesting. This is the cornerstone of our proposal, which we tackle by building upon modern first story detection techniques, adapted to account for bias and rumours.

For the second challenge, we ensure low-latency story detection by using a distributed real-time data processing architecture (e.g. S4 or Storm), similar to MapReduce but better suited to real-time operations. Real-time architectures for dealing with massive-scale data are in their infancy, hence CROSS will present a first concrete application, with a corresponding development of best practices for such architectures.
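
The cross-stream idea in the abstract can be illustrated with a small sketch. This is not the project's actual method: it is a toy first story detector that assumes whitespace tokenisation, Jaccard similarity against the first message of each known story, and a linear scan instead of a scalable index. The names `CrossStreamDetector`, `novelty_threshold` and `min_streams` are hypothetical, and the threshold values are arbitrary. A message that is not similar enough to any known story starts a new story, and a story counts as corroborated once it has been seen in at least `min_streams` distinct streams.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def tokens(text: str) -> frozenset[str]:
    """Lowercase whitespace tokenisation; a stand-in for real text processing."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Story:
    first_tokens: frozenset[str]                    # tokens of the first message seen
    streams: set[str] = field(default_factory=set)  # streams that reported the story
    mentions: int = 0                               # total messages attached to it


class CrossStreamDetector:
    """Toy detector; all parameters are illustrative assumptions."""

    def __init__(self, novelty_threshold: float = 0.3, min_streams: int = 2):
        self.novelty_threshold = novelty_threshold
        self.min_streams = min_streams
        self.stories: list[Story] = []

    def observe(self, stream: str, text: str) -> tuple[int, bool]:
        """Return (story index, corroborated?) for one incoming message."""
        toks = tokens(text)
        best_idx, best_sim = -1, 0.0
        # Linear scan for the most similar known story.
        for idx, story in enumerate(self.stories):
            sim = jaccard(toks, story.first_tokens)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # No sufficiently similar story: treat the message as a first story.
        if best_idx < 0 or best_sim < self.novelty_threshold:
            self.stories.append(Story(first_tokens=toks))
            best_idx = len(self.stories) - 1
        story = self.stories[best_idx]
        story.streams.add(stream)
        story.mentions += 1
        return best_idx, len(story.streams) >= self.min_streams


detector = CrossStreamDetector()
first = detector.observe("twitter", "riots reported in north london tonight")
second = detector.observe("wikipedia", "riots in north london reported tonight by police")
third = detector.observe("twitter", "comedian jokes about his television show")
```

In this run the Wikipedia message attaches to the story first seen on Twitter, so that story becomes corroborated, while the unrelated third message starts a new, uncorroborated story. A real system would replace the linear scan with an approximate nearest-neighbour index and run it on a streaming platform to meet the latency goal.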

Project Outputs

Journal papers (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Bieber no more : First Story Detection using Twitter and Wikipedia
  • DOI:
  • Publication date:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Osborne;S. Petrovic;R. McCreadie;Craig Macdonald;I. Ounis
  • Corresponding authors:
    M. Osborne;S. Petrovic;R. McCreadie;Craig Macdonald;I. Ounis
Can Twitter Replace Newswire for Breaking News?
Incorporating Social Role Theory into Topic Models for Social Media Content Analysis
  • DOI:
    10.1109/tkde.2014.2359672
  • Publication date:
    2015-04
  • Journal:
  • Impact factor:
    8.9
  • Authors:
    Wayne Xin Zhao;Jinpeng Wang;Yulan He;Jian-Yun Nie;Ji-Rong Wen;Xiaoming Li
  • Corresponding authors:
    Wayne Xin Zhao;Jinpeng Wang;Yulan He;Jian-Yun Nie;Ji-Rong Wen;Xiaoming Li

Other grants held by M Osborne

ReDites: Real Time, Detection, Tracking, Monitoring and Interpretation of Events in Social Media
  • Grant number:
    EP/L010690/1
  • Fiscal year:
    2013
  • Funding amount:
    $267,200
  • Project category:
    Research Grant
Discriminative Phrase-Based Statistical Machine Translation
  • Grant number:
    EP/D074959/1
  • Fiscal year:
    2007
  • Funding amount:
    $267,200
  • Project category:
    Research Grant

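Similar NSFC grants

Precise quantification of serum MG7 antigen by immuno-real-time PCR and its value in early warning of gastric cancer
  • Grant number:
    30600737
  • Approval year:
    2006
  • Funding amount:
    220,000 CNY
  • Project category:
    Young Scientists Fund
Study of the ultraviolet frequency-doubling performance and devices of colorless ReAl3(BO3)4 (Re = Y, Lu) series crystals
  • Grant number:
    60608018
  • Approval year:
    2006
  • Funding amount:
    280,000 CNY
  • Project category:
    Young Scientists Fund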

Similar overseas grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    $267,200
  • Project category:
    Studentship
CAREER: Real-Time First-Principles Approach to Understanding Many-Body Effects on High Harmonic Generation in Solids
  • Grant number:
    2337987
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant
CAREER: Secure Miniaturized Bio-Electronic Sensors for Real-Time In-Body Monitoring
  • Grant number:
    2338792
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant
PZT-hydrogel integrated active non-Hermitian complementary acoustic metamaterials with real time modulations through feedback control circuits
  • Grant number:
    2423820
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Standard Grant
CAREER: Towards Safety-Critical Real-Time Systems with Learning Components
  • Grant number:
    2340171
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant
CSR: Small: Multi-FPGA System for Real-time Fraud Detection with Large-scale Dynamic Graphs
  • Grant number:
    2317251
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Standard Grant
HoloSurge: Multimodal 3D Holographic tool and real-time Guidance System with point-of-care diagnostics for surgical planning and interventions on liver and pancreatic cancers
  • Grant number:
    10103131
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    EU-Funded
LTREB: Integrating real-time open data pipelines and forecasting to quantify ecosystem predictability at day to decadal scales
  • Grant number:
    2327030
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant
CAREER: Personalized, wearable robot mobility assistance considering human-robot co-adaptation that incorporates biofeedback, user coaching, and real-time optimization
  • Grant number:
    2340519
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant
CAREER: SHF: Bio-Inspired Microsystems for Energy-Efficient Real-Time Sensing, Decision, and Adaptation
  • Grant number:
    2340799
  • Fiscal year:
    2024
  • Funding amount:
    $267,200
  • Project category:
    Continuing Grant