III: Small: Automated Event Classification and Decision Making in Massive Data Streams

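Basic Information

  • Grant number:
    1118041
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Project period:
    2011-08-01 to 2014-07-31
  • Project status:
    Completed

Project Abstract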

As the exponential growth of data volumes and complexity continues in all sciences (and indeed all other fields of the modern society, economy, commerce, security, etc.), there is a growing need for powerful new tools and methodologies which can help us extract knowledge and understanding from these massive data sets and data streams. The newly gained knowledge is often used to guide our actions, and in science that typically means follow-up studies and measurements, as the research cycle continues. As the data rates and volume increase, it becomes necessary to take humans out of the loop, and develop automated methods for time-critical knowledge extraction and optimized response to anomalous or interesting events found by the data processing pipelines. This proposal is to develop a system that will be an example of a new generation of scientific experiments and methods that involve real-time mining of massive data streams, and dynamical follow-up strategies. The system would be developed and validated in the context of real scientific situations from the emerging field of time-domain astronomy. A new generation of synoptic sky surveys covers the sky repeatedly, detecting variable or transient phenomena, over a broad range of astrophysics, from the Solar system and stellar evolution, to cosmology and extreme relativistic objects; from extrasolar planets to gamma-ray bursts and supernovae as probes of the dark energy. As we explore the observable parameter space, there is a real possibility of discovery of new types of objects and phenomena. The system will enable exciting new astrophysics, and facilitate discovery. The key to this is a fully automated classification and prioritization of the transient events, and their follow-up observations. 
This poses some interesting challenges for applied computer science, especially in the area of Machine Learning, including automated classification where only sparse, incomplete, and heterogeneous data are available, and contextual information and domain expertise must be folded into the process. The process must be dynamic, incorporating new data as they become available, and revising the classifications accordingly. The system would then automatically generate decisions for an optimal follow-up of the most interesting events, given the limited assets and resources available. This project will aid the entire astronomical community in developing new scientific strategies and procedures in the era of large synoptic sky surveys, facilitate data sharing and re-use, and stimulate further development of Virtual Observatory capabilities. The methods and experiences gained here will be described in the open literature, so that they may find a broader use outside astronomy, wherever similar time-critical situations occur, thus fostering constructive new synergies between applied computer science and other domains. The proposers will train undergraduate and graduate students and postdocs in the methods of scientific computing and computational thinking, and develop effective education and public outreach (EPO) materials, touching on both the new science and computation.
The challenges posed by knowledge extraction in the era of data abundance become even sharper in time-critical situations where we mine information from massive data streams, especially when the phenomena under study are short-lived, and/or a rapid follow-up reaction is needed. Potentially interesting phenomena and events must be identified, classified, and prioritized in real time, typically using some combination of the new measurements, and existing archival data and models. 
Then an optimal decision has to be made as to what is the best follow-up that will provide the essential new information in any given individual case; this can be critical if the follow-up assets are scarce or costly. If the time scales are short, and data rates large, the implication is that humans should be taken out of the loop, and that the classification, prioritization, and follow-up decision process must be fully automated. Machine learning (ML) and machine intelligence tools become a necessity. This proposal is to develop a novel, ML-based system for a real-time classification and prioritization of transient events, using the newly emerging field of time-domain astronomy and synoptic sky surveys as a scientific testbed. The classification problem here is different from the usual situations: the data are sparse and/or incomplete, heterogeneous, and evolving as the new measurements come in; the decision process has to take into account the uncertainties of the classification process, and the available assets; and so on. While the sky surveys detect transient cosmic events, the scientific returns come from their directed follow-up. It is essential to be able to classify and prioritize interesting events, especially as we move from the present Terascale data streams and tens of candidate events per night, to the future Petascale data regime, with literally millions of candidates, only a handful of which can be followed. Given the problem of data incompleteness and sparsity, the proposers will explore the use of Bayesian techniques that can operate on a set of expert-developed and ML-based priors, using the currently best available data. Some of the methodological challenges include incorporation of the contextual information and human expertise and optimal combination of separate classifier outputs, as well as new methods developed in this project. 
All of the algorithmic developments will be done keeping the robustness and scalability in mind, and tested on real scientific use cases.
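
The abstract describes Bayesian classification that works from sparse, incomplete data, revises itself as new measurements arrive, and feeds a follow-up decision under limited resources. The Python sketch below illustrates that general idea only; it is not the project's system. The class names, priors, Gaussian feature models, feature names (`delta_mag`, `color`), and the entropy-based ranking rule are all hypothetical assumptions chosen for illustration, and the features are assumed conditionally independent given the class (a naive Bayes simplification).

```python
import math

# Hypothetical class priors and per-class Gaussian feature models (mean, std).
# All numbers are made up for illustration; none come from the project.
PRIORS = {"supernova": 0.3, "cataclysmic_variable": 0.5, "blazar": 0.2}
MODELS = {
    "supernova": {"delta_mag": (2.0, 1.0), "color": (0.3, 0.4)},
    "cataclysmic_variable": {"delta_mag": (4.0, 1.5), "color": (0.0, 0.3)},
    "blazar": {"delta_mag": (1.0, 0.8), "color": (0.6, 0.5)},
}


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def update_posterior(posterior, observation):
    """Apply one Bayesian update to a dict of class probabilities.

    `observation` maps feature names to measured values. Features that were
    not measured are simply absent; under the conditional-independence
    assumption this is equivalent to marginalizing over them.
    """
    log_post = {}
    for cls, prob in posterior.items():
        log_p = math.log(prob) if prob > 0 else float("-inf")
        for feature, value in observation.items():
            mean, std = MODELS[cls][feature]
            log_p += gaussian_logpdf(value, mean, std)
        log_post[cls] = log_p
    peak = max(log_post.values())  # subtract the peak for numerical stability
    unnormalized = {cls: math.exp(lp - peak) for cls, lp in log_post.items()}
    total = sum(unnormalized.values())
    return {cls: v / total for cls, v in unnormalized.items()}


def entropy(posterior):
    """Shannon entropy (in nats) of a class posterior."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)


def select_followup(events, budget):
    """Return the names of the `budget` events with the most uncertain class.

    Ranking by entropy is an assumed stand-in for whatever utility a real
    follow-up scheduler would optimize.
    """
    ranked = sorted(events, key=lambda name: entropy(events[name]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Two hypothetical events; the second has only one of the two features.
    stream = {
        "event_A": [{"delta_mag": 4.2}, {"color": 0.05}],
        "event_B": [{"delta_mag": 1.6}],
    }
    posteriors = {}
    for name, observations in stream.items():
        post = dict(PRIORS)
        for obs in observations:  # revise the classification as data arrive
            post = update_posterior(post, obs)
        posteriors[name] = post
        print(name, {cls: round(p, 3) for cls, p in post.items()})
    print("follow up:", select_followup(posteriors, budget=1))
```

Working in log space and skipping unmeasured features keeps the update well defined for sparse, heterogeneous inputs. A production system would replace the entropy ranking with a cost-aware utility that accounts for the available follow-up assets, as the abstract indicates.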

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Stanislav Djorgovski

EarthCube IA: Collaborative Proposal: EarthCube Integration & Test Environment
  • Grant number:
    1541049
  • Fiscal year:
    2015
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Collaborative Research: New Insights from a Systematic Approach to Quasar Variability
  • Grant number:
    1518308
  • Fiscal year:
    2015
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
EarthCube Conceptual Design: A Scalable Community Driven Architecture
  • Grant number:
    1343661
  • Fiscal year:
    2014
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
  • Grant number:
    1413600
  • Fiscal year:
    2014
  • Funding amount:
    $500,000
  • Project type:
    Continuing Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
  • Grant number:
    1313422
  • Fiscal year:
    2013
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
The Catalina Real-Time Transient Survey (CRTS)
  • Grant number:
    0909182
  • Fiscal year:
    2009
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
HCC:Small:Collaborative Research: Exploring the Use of Immersive Virtual Reality Technologies for Scientific Research, Communication, and Outreach
  • Grant number:
    0917814
  • Fiscal year:
    2009
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Collaborative Research: The Palomar-QUEST Survey
  • Grant number:
    0407448
  • Fiscal year:
    2004
  • Funding amount:
    $500,000
  • Project type:
    Continuing Grant
Support for the conference 'Virtual Observatories of the Future', June 13-16, 2000, Caltech, Pasadena, CA
  • Grant number:
    0084709
  • Fiscal year:
    2000
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Presidential Young Investigator (PYI) Award
  • Grant number:
    9157412
  • Fiscal year:
    1991
  • Funding amount:
    $500,000
  • Project type:
    Continuing Grant

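Similar NSFC Grants

Forensic application of circadian-rhythm small RNAs in estimating the time of bloodstain formation
  • Grant number:
  • Year approved:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Grant number:
  • Year approved:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Grant number:
    32000033
  • Year approved:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanism of small RNA regulation of the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Grant number:
    31972324
  • Year approved:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Grant number:
    81900988
  • Year approved:
    2019
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Grant number:
    31870821
  • Year approved:
    2018
  • Funding amount:
    CNY 560,000
  • Project type:
    General Program
Molecular mechanism of pigeon milk secretion, analyzed by small RNA sequencing
  • Grant number:
    31802058
  • Year approved:
    2018
  • Funding amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Grant number:
    31772128
  • Year approved:
    2017
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis, based on small RNA-seq
  • Grant number:
    81704176
  • Year approved:
    2017
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Grant number:
    91640114
  • Year approved:
    2016
  • Funding amount:
    CNY 850,000
  • Project type:
    Major Research Plan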

Similar Overseas Grants

Automated per-plot leaf-level imaging and analysis for small plot arable field trials
  • Grant number:
    10060164
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Collaborative R&D
SHF: Small: Modular Automated Verification of Concurrent Data Structures
  • Grant number:
    2304758
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SaTC: CORE: Small: An Automated Framework for Mitigating Single-Trace Side-Channel Leakage
  • Grant number:
    2241879
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Small Scale Robotics for Automated Dental Biofilm Theranostics
  • Grant number:
    10658028
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
SHF: Small: Automated Verification and Synthesis of Input Generators in Property-Based Testing Frameworks
  • Grant number:
    2321680
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SBIR Phase II: Automated Perception for Robotic Chopsticks Manipulating Small and Large Objects in Constrained Spaces
  • Grant number:
    2321919
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Cooperative Agreement
SHF: Small: Automated Unit Test Generation using Large Language Models
  • Grant number:
    2307742
  • Fiscal year:
    2023
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
CNS Core: Small: Automated testing for data- and compute-intensive distributed systems through feedback-based fuzzing
  • Grant number:
    2140305
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SHF: Small: Toward Fully Automated Formal Software Verification
  • Grant number:
    2210243
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Fully Automated End to End Analysis of Non-small-cell Lung Carcinoma using Deep Learning Techniques
  • Grant number:
    570281-2022
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
    Postgraduate Scholarships - Doctoral