III: Small: Automated Event Classification and Decision Making in Massive Data Streams
Basic Information
- Award number: 1118041
- Principal investigator: Stanislav Djorgovski
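- Amount: $500,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2011
- Funding country: United States
- Period: 2011-08-01 to 2014-07-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract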
As the exponential growth of data volumes and complexity continues in all sciences (and indeed in all other fields of modern society, the economy, commerce, security, etc.), there is a growing need for powerful new tools and methodologies that can help us extract knowledge and understanding from these massive data sets and data streams. The newly gained knowledge is often used to guide our actions, and in science that typically means follow-up studies and measurements as the research cycle continues. As data rates and volumes increase, it becomes necessary to take humans out of the loop and to develop automated methods for time-critical knowledge extraction and for optimized response to anomalous or interesting events found by the data processing pipelines. This proposal is to develop a system that will be an example of a new generation of scientific experiments and methods that involve real-time mining of massive data streams and dynamic follow-up strategies. The system would be developed and validated in the context of real scientific situations from the emerging field of time-domain astronomy. A new generation of synoptic sky surveys covers the sky repeatedly, detecting variable or transient phenomena across a broad range of astrophysics: from the Solar System and stellar evolution to cosmology and extreme relativistic objects, and from extrasolar planets to gamma-ray bursts and supernovae as probes of dark energy. As we explore the observable parameter space, there is a real possibility of discovering new types of objects and phenomena. The system will enable exciting new astrophysics and facilitate discovery. The key to this is fully automated classification and prioritization of transient events and their follow-up observations.
This poses some interesting challenges for applied computer science, especially in the area of machine learning, including automated classification where only sparse, incomplete, and heterogeneous data are available, and where contextual information and domain expertise must be folded into the process. The process must be dynamic, incorporating new data as they become available and revising the classifications accordingly. The system would then automatically generate decisions for an optimal follow-up of the most interesting events, given the limited assets and resources available. This project will aid the entire astronomical community in developing new scientific strategies and procedures in the era of large synoptic sky surveys, facilitate data sharing and re-use, and stimulate further development of Virtual Observatory capabilities. The methods and experience gained here will be described in the open literature, so that they may find broader use outside astronomy, wherever similar time-critical situations occur, thus fostering constructive new synergies between applied computer science and other domains. The proposers will train undergraduate and graduate students and postdocs in the methods of scientific computing and computational thinking, and will develop effective education and public outreach (EPO) materials touching on both the new science and the computation.
The challenges posed by knowledge extraction in the era of data abundance become even sharper in time-critical situations where we mine information from massive data streams, especially when the phenomena under study are short-lived and/or a rapid follow-up reaction is needed. Potentially interesting phenomena and events must be identified, classified, and prioritized in real time, typically using some combination of the new measurements and existing archival data and models.
Then an optimal decision has to be made as to which follow-up will provide the essential new information in any given individual case; this can be critical if the follow-up assets are scarce or costly. If the time scales are short and the data rates large, the implication is that humans should be taken out of the loop, and that the classification, prioritization, and follow-up decision process must be fully automated. Machine learning (ML) and machine intelligence tools become a necessity. This proposal is to develop a novel, ML-based system for real-time classification and prioritization of transient events, using the newly emerging field of time-domain astronomy and synoptic sky surveys as a scientific testbed. The classification problem here differs from the usual situations: the data are sparse and/or incomplete, heterogeneous, and evolving as new measurements come in; the decision process has to take into account the uncertainties of the classification process and the available assets; and so on. While the sky surveys detect transient cosmic events, the scientific returns come from their directed follow-up. It is essential to be able to classify and prioritize interesting events, especially as we move from the present Terascale data streams and tens of candidate events per night to the future Petascale data regime, with literally millions of candidates, only a handful of which can be followed up. Given the problem of data incompleteness and sparsity, the proposers will explore the use of Bayesian techniques that can operate on a set of expert-developed and ML-based priors, using the best data currently available. Some of the methodological challenges include the incorporation of contextual information and human expertise and the optimal combination of separate classifier outputs, as well as of new methods developed in this project.
All of the algorithmic developments will be carried out with robustness and scalability in mind, and tested on real scientific use cases.
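To make the idea of Bayesian classification from sparse data followed by automated follow-up prioritization more concrete, here is a minimal Python sketch. It is not the system described in the abstract: the class names, prior probabilities, Gaussian likelihood parameters, feature names, and the entropy-based ranking rule are all assumptions invented for illustration. It shows how a posterior over event classes can be revised one measurement at a time, how missing measurements can simply be skipped, and how a limited follow-up budget might be spent on the most uncertain events.

```python
import math

# Hypothetical expert priors over event classes (invented for this sketch).
PRIORS = {"supernova": 0.2, "cataclysmic_variable": 0.5, "blazar": 0.3}

# Hypothetical per-class Gaussian likelihoods, (mean, std) per feature.
LIKELIHOODS = {
    "supernova": {"amplitude_mag": (2.0, 0.8), "rise_days": (15.0, 5.0)},
    "cataclysmic_variable": {"amplitude_mag": (4.0, 1.0), "rise_days": (1.0, 0.5)},
    "blazar": {"amplitude_mag": (1.0, 0.5), "rise_days": (5.0, 4.0)},
}


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_posterior(posterior, feature, value):
    """One Bayes step: weight each class by the likelihood, then renormalize."""
    unnorm = {
        cls: p * gaussian_pdf(value, *LIKELIHOODS[cls][feature])
        for cls, p in posterior.items()
    }
    total = sum(unnorm.values())
    return {cls: u / total for cls, u in unnorm.items()}


def classify(measurements, priors=PRIORS):
    """Fold in whatever measurements exist; missing ones (None) are skipped."""
    posterior = dict(priors)
    for feature, value in measurements.items():
        if value is not None:
            posterior = update_posterior(posterior, feature, value)
    return posterior


def entropy(posterior):
    """Shannon entropy in bits; higher means a less certain classification."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)


def prioritize(events, budget):
    """Assumed policy: follow up the `budget` most uncertain events."""
    ranked = sorted(events, key=lambda name: entropy(events[name]), reverse=True)
    return ranked[:budget]


# Two made-up events: one well measured, one with a missing rise time.
events = {
    "evt1": classify({"amplitude_mag": 4.2, "rise_days": 0.8}),
    "evt2": classify({"amplitude_mag": 1.6, "rise_days": None}),
}
selected = prioritize(events, budget=1)
print(selected)
```

With these invented numbers, the well-measured event is classified almost unambiguously, so the single follow-up slot goes to the event with the missing measurement. Ranking by entropy is only one possible policy; a real scheduler would also weigh scientific value and the cost of each follow-up asset, as the abstract notes.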
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Stanislav Djorgovski
EarthCube IA: Collaborative Proposal: EarthCube Integration & Test Environment
- Award number: 1541049
- Fiscal year: 2015
- Award amount: $500,000
- Project type: Standard Grant
Collaborative Research: New Insights from a Systematic Approach to Quasar Variability
- Award number: 1518308
- Fiscal year: 2015
- Award amount: $500,000
- Project type: Standard Grant
EarthCube Conceptual Design: A Scalable Community Driven Architecture
- Award number: 1343661
- Fiscal year: 2014
- Award amount: $500,000
- Project type: Standard Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
- Award number: 1413600
- Fiscal year: 2014
- Award amount: $500,000
- Project type: Continuing Grant
Open Exploration of the Time Domain with the Catalina Real-Time Transient Survey
- Award number: 1313422
- Fiscal year: 2013
- Award amount: $500,000
- Project type: Standard Grant
The Catalina Real-Time Transient Survey (CRTS)
- Award number: 0909182
- Fiscal year: 2009
- Award amount: $500,000
- Project type: Standard Grant
HCC: Small: Collaborative Research: Exploring the Use of Immersive Virtual Reality Technologies for Scientific Research, Communication, and Outreach
- Award number: 0917814
- Fiscal year: 2009
- Award amount: $500,000
- Project type: Standard Grant
Collaborative Research: The Palomar-QUEST Survey
- Award number: 0407448
- Fiscal year: 2004
- Award amount: $500,000
- Project type: Continuing Grant
Support for the conference 'Virtual Observatories of the Future', June 13-16, 2000, Caltech, Pasadena, CA
- Award number: 0084709
- Fiscal year: 2000
- Award amount: $500,000
- Project type: Standard Grant
Presidential Young Investigator (PYI) Award
- Award number: 9157412
- Fiscal year: 1991
- Award amount: $500,000
- Project type: Continuing Grant













