Managing and Modeling Time in Genomics Data
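Basic information
- Grant number: RGPIN-2014-05362
- Principal investigator:
- Amount: $45,200
- Host institution:
- Host institution country: Canada
- Project type: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Period: 2018-01-01 to 2019-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract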
Understanding the mechanisms associated with observed biological phenotypes is a common goal of many genomic studies. However, measuring molecular entities at a single time point is often insufficient to capture the complexity of biological systems; truly systematic measurement needs to consider dynamic changes across time and space. With decreasing cost and sample requirements, longitudinal genomic data are accumulating quickly, yet tools that help researchers model and understand temporal changes in genomics studies are largely missing. The long-term objective of the proposed research program is to develop tools and algorithms to handle temporal changes covering a wide range of time scales or frequencies. Specifically, we categorize the tools and algorithms to be developed into three "layers": (a) a Data layer; (b) a Modeling layer; and (c) an Interpretation layer. The Data layer focuses on data quality and outlier management; the Modeling layer focuses on modeling time and analyzing genomics data in scalable ways; and the Interpretation layer helps researchers interpret their data and the outcomes of their analyses by linking users to knowledge embedded in online publications and discussions.
The Data layer starts from the principle of "garbage in, garbage out": if the data at hand are not of high quality, subsequent modeling and interpretation are unlikely to be fruitful. We propose to develop methods for assessing the quality of, and removing random or systematic noise from, data generated by newer genomics technologies and platforms, including next-generation sequencing. A key idea is that the space-time relationships between data points can be exploited to assess and enhance data quality. It is also important to identify samples that are outliers. The time series nature of the data can be exploited to derive quality models, from which deviant samples can be flagged for further examination. We also plan to identify possible reasons why certain samples are flagged as outlying.
In the Modeling layer, time series often need to be augmented with latent variables for modeling and interpretation reasons. Moreover, the observations are typically drawn from a mixture of distributions, which are often unknown a priori. We propose to develop non-parametric Bayesian models to learn the latent structure of complex time series. One key challenge is the large number of features involved in modeling multi-omic data; thus, we will focus on scalable or parallelizable schemes. Genomics data may also be based on heterogeneous sub-populations of cells. For more effective modeling, it is therefore important to develop methods that deconvolve the underlying composition to better capture the time-varying dynamics.
Effective knowledge discovery from genomics data requires significant involvement of domain experts in interpreting the data and models. The Interpretation layer focuses on tools for linking users to knowledge embedded in online publications. For example, relationships involving genes or peptides identified using whole-genome technologies are valuable to researchers interpreting their results. Basic Google-style searches cannot replace more sophisticated natural language processing for extracting relations from text, including temporal expressions and temporal or sequential relationships among entities (e.g., genes). As social networking has gained widespread use in the past decade, research blogging has also been growing in the genomics and medical research communities. We propose to develop methods for summarizing online discussions among researchers. One novel idea is to explore how to produce abstractive summaries in response to user-given queries.
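The abstract does not say how deviant samples would be flagged, so the following is only an illustrative sketch of the Data-layer idea of using the time series structure to spot outliers. It assumes a single feature measured at evenly spaced time points and compares each point with the median of its neighbours, scaled by their median absolute deviation (MAD). The function name `flag_outliers` and the parameters `window`, `threshold`, and `min_scale` are hypothetical and not taken from the project.

```python
from statistics import median


def flag_outliers(series, window=5, threshold=3.5, min_scale=1e-6):
    """Return indices of points that deviate strongly from their local neighbours.

    Assumption (illustrative only): a point is an outlier when its distance
    from the median of the surrounding window, excluding the point itself,
    exceeds `threshold` robust standard deviations (1.4826 * MAD).
    """
    half = window // 2
    flagged = []
    for i, value in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        neighbours = [series[j] for j in range(lo, hi) if j != i]
        if len(neighbours) < 2:
            continue  # not enough context to judge this point
        centre = median(neighbours)
        mad = median([abs(x - centre) for x in neighbours])
        # Floor the scale so a perfectly flat neighbourhood does not divide by zero.
        scale = max(1.4826 * mad, min_scale)
        if abs(value - centre) / scale > threshold:
            flagged.append(i)
    return flagged


# Hypothetical expression values for one gene across nine time points.
expression = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.3, 0.8, 1.1]
print(flag_outliers(expression))  # → [4]
```

A local, median-based rule is used here because it tolerates slow trends and is not distorted by the outlier itself; a real quality model for multi-omic data would likely need to account for uneven sampling and correlations across features.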
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Ng, Raymond
The English and Chinese versions of the five-level EuroQoL Group's five-dimension questionnaire (EQ-5D) were valid and reliable and provided comparable scores in Asian breast cancer patients
- DOI: 10.1007/s00520-012-1512-x
- Published: 2013-01-01
- Journal:
- Impact factor: 3.1
- Authors: Lee, Chun Fan; Ng, Raymond; Cheung, Yin Bun
- Corresponding author: Cheung, Yin Bun
Caspase-6-cleaved tau is relevant in Alzheimer's disease and marginal in four-repeat tauopathies: Diagnostic and therapeutic implications.
- DOI: 10.1111/nan.12819
- Published: 2022-08
- Journal:
- Impact factor: 5
- Authors: Theofilas, Panos; Piergies, Antonia M. H.; Oh, Ian; Lee, Yoo Bin; Li, Song Hua; Pereira, Felipe L.; Petersen, Cathrine; Ehrenberg, Alexander J.; Eser, Rana A.; Ambrose, Andrew J.; Chin, Brian; Yang, Teddy; Khan, Shireen; Ng, Raymond; Spina, Salvatore; Seeley, William W.; Miller, Bruce L.; Arkin, Michelle R.; Grinberg, Lea T.
- Corresponding author: Grinberg, Lea T.
MicroRNAs control hepatocyte proliferation during liver regeneration.
- DOI: 10.1002/hep.23547
- Published: 2010-05
- Journal:
- Impact factor: 13.5
- Authors: Song, Guisheng; Sharma, Amar Deep; Roll, Garrett R.; Ng, Raymond; Lee, Andrew Y.; Blelloch, Robert H.; Frandsen, Niels M.; Willenbring, Holger
- Corresponding author: Willenbring, Holger
Affordability of cancer treatment for aging cancer patients in Singapore: an analysis of health, lifestyle, and financial burden
- DOI: 10.1007/s00520-013-1930-4
- Published: 2013-12-01
- Journal:
- Impact factor: 3.1
- Authors: Chan, Alexandre; Chiang, Yu Yan; Ng, Raymond
- Corresponding author: Ng, Raymond
Synthetic data as an enabler for machine learning applications in medicine.
- DOI: 10.1016/j.isci.2022.105331
- Published: 2022-11-18
- Journal:
- Impact factor: 5.8
- Authors: Rajotte, Jean-Francois; Bergen, Robert; Buckeridge, David L.; El Emam, Khaled; Ng, Raymond; Strome, Elissa
- Corresponding author: Strome, Elissa
Other grants held by Ng, Raymond
Stream Analytics for Diverse Applications
- Grant number: RGPIN-2019-04044
- Fiscal year: 2022
- Amount: $45,200
- Project type: Discovery Grants Program - Individual
Data Science and Analytics
- Grant number: CRC-2016-00231
- Fiscal year: 2022
- Amount: $45,200
- Project type: Canada Research Chairs
Data Science And Analytics
- Grant number: CRC-2016-00231
- Fiscal year: 2021
- Amount: $45,200
- Project type: Canada Research Chairs
Data Science and Composite Materials Manufacturing
- Grant number: 549167-2019
- Fiscal year: 2021
- Amount: $45,200
- Project type: Alliance Grants
Stream Analytics for Diverse Applications
- Grant number: RGPIN-2019-04044
- Fiscal year: 2021
- Amount: $45,200
- Project type: Discovery Grants Program - Individual
Data Science and Analytics
- Grant number: CRC-2016-00231
- Fiscal year: 2020
- Amount: $45,200
- Project type: Canada Research Chairs
Stream Analytics for Diverse Applications
- Grant number: RGPIN-2019-04044
- Fiscal year: 2020
- Amount: $45,200
- Project type: Discovery Grants Program - Individual
Data Science and Composite Materials Manufacturing
- Grant number: 549167-2019
- Fiscal year: 2020
- Amount: $45,200
- Project type: Alliance Grants
Data Science and Analytics
- Grant number: CRC-2016-00231
- Fiscal year: 2019
- Amount: $45,200
- Project type: Canada Research Chairs
Stream Analytics for Diverse Applications
- Grant number: RGPIN-2019-04044
- Fiscal year: 2019
- Amount: $45,200
- Project type: Discovery Grants Program - Individual
Similar NSFC grants
Galaxy Analytical Modeling Evolution (GAME) and cosmological hydrodynamic simulations.
- Grant number:
- Approval year: 2025
- Amount: CNY 100,000
- Project type: Provincial/municipal project
Similar overseas grants
CRII: SCH: Towards Smart Patient Flow Management: Real-time Inpatient Length of Stay Modeling and Prediction
- Grant number: 2246158
- Fiscal year: 2023
- Amount: $45,200
- Project type: Standard Grant
Predictive modeling of mammalian cell fate transitions over time and space with single-cell genomics
- Grant number: 10572855
- Fiscal year: 2023
- Amount: $45,200
- Project type:
Real-time predictive modeling for public health departments to control infectious diseases
- Grant number: 10878316
- Fiscal year: 2023
- Amount: $45,200
- Project type:
ATD: Dynamic Modeling for Extreme Event Prediction with Uncertainty Quantification with Multi-panel Time Series
- Grant number: 2319260
- Fiscal year: 2023
- Amount: $45,200
- Project type: Standard Grant
Modeling Multivariate and Space-Time Processes: Foundations and Innovations
- Grant number: 2310419
- Fiscal year: 2023
- Amount: $45,200
- Project type: Standard Grant
Modeling Multivariate and Space-Time Processes: Foundations and Innovations
- Grant number: 2348154
- Fiscal year: 2023
- Amount: $45,200
- Project type: Standard Grant
3D modeling of Jomon cord marker by the structure from motion and identification of the pottery produced at the same time
- Grant number: 23K00946
- Fiscal year: 2023
- Amount: $45,200
- Project type: Grant-in-Aid for Scientific Research (C)
Gravitational Wave Modeling Using Time-Domain Black Hole Perturbation Theory
- Grant number: 2307236
- Fiscal year: 2023
- Amount: $45,200
- Project type: Continuing Grant
Advances in High-dimensional Time Series Modeling and Its Interface with Deep Learning
- Grant number: 2311178
- Fiscal year: 2023
- Amount: $45,200
- Project type: Continuing Grant
Developing a dynamic modeling framework for surveillance, prediction, and real-time resource allocation to reduce health disparities during Covid-19 and future pandemics
- Grant number: 10584876
- Fiscal year: 2023
- Amount: $45,200
- Project type: