III: Small: Automated Event Classification and Decision Making in Massive Data Streams

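Basic Information

  • Award number:
    1118041
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Period:
    2011-08-01 to 2014-07-31
  • Project status:
    Completed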

Project Abstract

As the exponential growth of data volumes and complexity continues in all sciences (and indeed in all other fields of modern society, the economy, commerce, security, etc.), there is a growing need for powerful new tools and methodologies that can help us extract knowledge and understanding from these massive data sets and data streams. The newly gained knowledge is often used to guide our actions, and in science that typically means follow-up studies and measurements as the research cycle continues. As data rates and volumes increase, it becomes necessary to take humans out of the loop and to develop automated methods for time-critical knowledge extraction and an optimized response to anomalous or interesting events found by the data processing pipelines. This proposal is to develop a system that will be an example of a new generation of scientific experiments and methods that involve real-time mining of massive data streams and dynamical follow-up strategies. The system would be developed and validated in the context of real scientific situations from the emerging field of time-domain astronomy. A new generation of synoptic sky surveys covers the sky repeatedly, detecting variable or transient phenomena over a broad range of astrophysics: from the Solar System and stellar evolution to cosmology and extreme relativistic objects, and from extrasolar planets to gamma-ray bursts and supernovae as probes of dark energy. As we explore the observable parameter space, there is a real possibility of discovering new types of objects and phenomena. The system will enable exciting new astrophysics and facilitate discovery. The key to this is a fully automated classification and prioritization of the transient events and of their follow-up observations.
This poses some interesting challenges for applied computer science, especially in the area of machine learning, including automated classification where only sparse, incomplete, and heterogeneous data are available, and where contextual information and domain expertise must be folded into the process. The process must be dynamic, incorporating new data as they become available and revising the classifications accordingly. The system would then automatically generate decisions for an optimal follow-up of the most interesting events, given the limited assets and resources available.
This project will aid the entire astronomical community in developing new scientific strategies and procedures in the era of large synoptic sky surveys, facilitate data sharing and re-use, and stimulate further development of Virtual Observatory capabilities. The methods and experience gained here will be described in the open literature, so that they may find broader use outside astronomy, wherever similar time-critical situations occur, thus fostering constructive new synergies between applied computer science and other domains. The proposers will train undergraduate and graduate students and postdocs in the methods of scientific computing and computational thinking, and will develop effective education and public outreach (EPO) materials touching on both the new science and the computation.
The challenges posed by knowledge extraction in the era of data abundance become even sharper in time-critical situations where we mine information from massive data streams, especially when the phenomena under study are short-lived and/or a rapid follow-up reaction is needed. Potentially interesting phenomena and events must be identified, classified, and prioritized in real time, typically using some combination of the new measurements with existing archival data and models. Then an optimal decision has to be made as to which follow-up will provide the essential new information in each individual case; this can be critical if the follow-up assets are scarce or costly. If the time scales are short and the data rates large, the implication is that humans should be taken out of the loop, and that the classification, prioritization, and follow-up decision process must be fully automated. Machine learning (ML) and machine intelligence tools become a necessity.
This proposal is to develop a novel, ML-based system for real-time classification and prioritization of transient events, using the newly emerging field of time-domain astronomy and synoptic sky surveys as a scientific testbed. The classification problem here differs from the usual situations: the data are sparse and/or incomplete, heterogeneous, and evolving as new measurements come in; the decision process has to take into account the uncertainties of the classification process and the available assets; and so on. While the sky surveys detect transient cosmic events, the scientific returns come from their directed follow-up. It is essential to be able to classify and prioritize interesting events, especially as we move from the present Terascale data streams and tens of candidate events per night to the future Petascale data regime, with literally millions of candidates, only a handful of which can be followed up. Given the problem of data incompleteness and sparsity, the proposers will explore the use of Bayesian techniques that can operate on a set of expert-developed and ML-based priors, using the best data currently available. The methodological challenges include the incorporation of contextual information and human expertise and the optimal combination of separate classifier outputs, as well as new methods developed in this project.
All of the algorithmic developments will be done with robustness and scalability in mind, and will be tested on real scientific use cases.
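
The abstract describes Bayesian classification that is revised as new data arrive, followed by an automated choice of which events to follow up with limited resources. The sketch below is only an illustration of that general idea, not the project's actual method: the class names, prior values, likelihood table, and the use of posterior entropy as a stand-in for "most interesting" are all assumptions made for this example.

```python
import math

# Hypothetical class priors (illustrative values, not taken from the project).
PRIORS = {"supernova": 0.2, "cataclysmic_variable": 0.3, "blazar": 0.5}

# Hypothetical likelihoods P(observation | class) for a few discrete observations.
LIKELIHOODS = {
    "near_galaxy": {"supernova": 0.8, "cataclysmic_variable": 0.1, "blazar": 0.2},
    "radio_source": {"supernova": 0.05, "cataclysmic_variable": 0.05, "blazar": 0.7},
    "fast_rise": {"supernova": 0.3, "cataclysmic_variable": 0.8, "blazar": 0.4},
}


def update(posterior, observation):
    """Apply one Bayes step: multiply by the likelihood, then renormalize."""
    unnormalized = {
        cls: prob * LIKELIHOODS[observation][cls] for cls, prob in posterior.items()
    }
    total = sum(unnormalized.values())
    return {cls: value / total for cls, value in unnormalized.items()}


def entropy(posterior):
    """Shannon entropy in bits; higher means a less certain classification."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)


def prioritize(events, budget):
    """Return the names of the `budget` events with the most uncertain class.

    Ranking by entropy is an assumed policy; a real system would also weigh
    scientific value and the cost of each follow-up asset.
    """
    ranked = sorted(events, key=lambda name: entropy(events[name]), reverse=True)
    return ranked[:budget]


# Two hypothetical events start from the same priors; only one has new data.
events = {"event_a": dict(PRIORS), "event_b": dict(PRIORS)}
events["event_a"] = update(events["event_a"], "near_galaxy")
print(events["event_a"])
print(prioritize(events, budget=1))
```

Each call to `update` revises the class probabilities with one new piece of evidence, which mirrors the requirement that classifications change as measurements come in. The `budget` argument models the scarce follow-up assets mentioned in the abstract.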

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Stanislav Djorgovski

EarthCube IA: Collaborative Proposal: EarthCube Integration & Test Environment
  • Award number:
    1541049
  • Fiscal year:
    2015
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Collaborative Research: New Insights from a Systematic Approach to Quasar Variability
  • Award number:
    1518308
  • Fiscal year:
    2015
  • Amount:
    $500,000
  • Project type:
    Standard Grant
EarthCube Conceptual Design: A Scalable Community Driven Architecture
  • Award number:
    1343661
  • Fiscal year:
    2014
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
  • Award number:
    1413600
  • Fiscal year:
    2014
  • Amount:
    $500,000
  • Project type:
    Continuing Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
  • Award number:
    1313422
  • Fiscal year:
    2013
  • Amount:
    $500,000
  • Project type:
    Standard Grant
The Catalina Real-Time Transient Survey (CRTS)
  • Award number:
    0909182
  • Fiscal year:
    2009
  • Amount:
    $500,000
  • Project type:
    Standard Grant
HCC:Small:Collaborative Research: Exploring the Use of Immersive Virtual Reality Technologies for Scientific Research, Communication, and Outreach
  • Award number:
    0917814
  • Fiscal year:
    2009
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Collaborative Research: The Palomar-QUEST Survey
  • Award number:
    0407448
  • Fiscal year:
    2004
  • Amount:
    $500,000
  • Project type:
    Continuing Grant
Support for the conference 'Virtual Observatories of the Future', June 13-16, 2000, Caltech, Pasadena, CA
  • Award number:
    0084709
  • Fiscal year:
    2000
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Presidential Young Investigator (PYI) Award
  • Award number:
    9157412
  • Fiscal year:
    1991
  • Amount:
    $500,000
  • Project type:
    Continuing Grant

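Similar NSFC Grants

Forensic application of circadian-rhythm small RNAs in estimating the time of bloodstain formation
  • Award number:
  • Year approved:
    2024
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal-level project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
  • Year approved:
    2022
  • Amount:
    CNY 100,000
  • Project type:
    Provincial/municipal-level project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Year approved:
    2020
  • Amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Year approved:
    2019
  • Amount:
    CNY 580,000
  • Project type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Year approved:
    2019
  • Amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Year approved:
    2018
  • Amount:
    CNY 560,000
  • Project type:
    General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
  • Award number:
    31802058
  • Year approved:
    2018
  • Amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Year approved:
    2017
  • Amount:
    CNY 600,000
  • Project type:
    General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Year approved:
    2017
  • Amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Year approved:
    2016
  • Amount:
    CNY 850,000
  • Project type:
    Major Research Plan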

Similar Overseas Grants

Automated per-plot leaf-level imaging and analysis for small plot arable field trials
  • Award number:
    10060164
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Collaborative R&D
SHF: Small: Modular Automated Verification of Concurrent Data Structures
  • Award number:
    2304758
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Standard Grant
SaTC: CORE: Small: An Automated Framework for Mitigating Single-Trace Side-Channel Leakage
  • Award number:
    2241879
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Small Scale Robotics for Automated Dental Biofilm Theranostics
  • Award number:
    10658028
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
SHF: Small: Automated Verification and Synthesis of Input Generators in Property-Based Testing Frameworks
  • Award number:
    2321680
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Standard Grant
SBIR Phase II: Automated Perception for Robotic Chopsticks Manipulating Small and Large Objects in Constrained Spaces
  • Award number:
    2321919
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Cooperative Agreement
SHF: Small: Automated Unit Test Generation using Large Language Models
  • Award number:
    2307742
  • Fiscal year:
    2023
  • Amount:
    $500,000
  • Project type:
    Standard Grant
CNS Core: Small: Automated testing for data- and compute-intensive distributed systems through feedback-based fuzzing
  • Award number:
    2140305
  • Fiscal year:
    2022
  • Amount:
    $500,000
  • Project type:
    Standard Grant
SHF: Small: Toward Fully Automated Formal Software Verification
  • Award number:
    2210243
  • Fiscal year:
    2022
  • Amount:
    $500,000
  • Project type:
    Standard Grant
Fully Automated End to End Analysis of Non-small-cell Lung Carcinoma using Deep Learning Techniques
  • Award number:
    570281-2022
  • Fiscal year:
    2022
  • Amount:
    $500,000
  • Project type:
    Postgraduate Scholarships - Doctoral