Functional data analysis methods for genomics and financial data
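Basic information
- Grant number: RGPIN-2020-05657
- Principal investigator:
- Amount: $13,100
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Period: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract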
Many fields have recently seen a rapid increase in data volume and complexity. High-dimensional data have become pervasive in the sciences, in engineering, and in modern industry. Effective data analysis and interpretation remain the bottleneck in advancing knowledge in many areas of research, in both academia and industry. Hence, there is an urgent need for novel, statistically sound techniques tailored specifically to the analysis of high-dimensional data. My research will address this need by developing statistical tools for functional data, i.e. data that vary over a continuum and can be represented as curves. Functional data analysis is often employed as a fully nonparametric approach for modeling data that vary over time, such as time series and longitudinal data. A less popular but very promising application area is the so-called "Omics" sciences (genomics, epigenomics), in which modern high-throughput sequencing technologies produce high-dimensional data that can be represented as curves over the genome. The long-term objective of my research program is to develop novel statistical methods to analyze functional data and to provide computationally efficient implementations of these methods. In particular, my research over the next few years will focus on the problem of discovering functional motifs, i.e. typical "shapes" that may recur several times along and across a set of curves, capturing important local characteristics. I recently developed probabilistic K-mean with local alignment (probKMA), a clustering method able to identify K candidate functional motifs in a set of curves. My research program will build upon this recent development and pursue several methodological and applied directions. My first focus will be to develop a new motif discovery technique based on biclustering.
I will also extend these methods to detect motifs in a single curve, as well as motifs whose instances have similar shapes but different lengths, and I will develop a rigorous assessment of the statistical significance of the motifs found, which is critical for distinguishing real motifs from motifs that occur by chance in the background of the curves. Afterward, I will apply the developed methods to the real-world problems that motivate them, in particular the analysis of "Omics" data and of time series of asset prices. The proposed research will result in fast, user-friendly software that will be freely available to the general public and will enable the extraction of relevant information from curves. The multidisciplinary nature of this research will translate into a broad training opportunity for the students involved. The training provided by my program, at the boundaries of several STEM disciplines, will contribute to the education of data scientists, who are becoming vital for both companies and universities.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Cremona, Marzia
Functional data analysis methods for genomics and financial data
- Grant number: RGPIN-2020-05657
- Fiscal year: 2022
- Amount: $13,100
- Project category: Discovery Grants Program - Individual
Functional data analysis methods for genomics and financial data
- Grant number: RGPIN-2020-05657
- Fiscal year: 2020
- Amount: $13,100
- Project category: Discovery Grants Program - Individual
Functional data analysis methods for genomics and financial data
- Grant number: DGECR-2020-00353
- Fiscal year: 2020
- Amount: $13,100
- Project category: Discovery Launch Supplement
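Similar grants from the National Natural Science Foundation of China
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Grant number:
- Approval year: 2020
- Funding amount: 400,000 CNY
- Project category:
Estimation of high-dimensional volatility matrices based on high-frequency information, with applications
- Grant number: 71901118
- Approval year: 2019
- Funding amount: 180,000 CNY
- Project category: Young Scientists Fund
Efficient estimation of semiparametric spatial autoregressive panel models, with applications
- Grant number: 71961011
- Approval year: 2019
- Funding amount: 160,000 CNY
- Project category: Regional Science Fund
Statistical inference, forecasting, and applications of volatility for high-frequency data
- Grant number: 71971118
- Approval year: 2019
- Funding amount: 500,000 CNY
- Project category: General Program
Projective nonlinear nonnegative tensor factorization based on individual analysis for pattern analysis of high-dimensional unstructured data
- Grant number: 61502059
- Approval year: 2015
- Funding amount: 190,000 CNY
- Project category: Young Scientists Fund
Key technologies for semantic interoperability of Web services based on Linked Open Data
- Grant number: 61373035
- Approval year: 2013
- Funding amount: 770,000 CNY
- Project category: General Program
New methods for volume data representation and rendering
- Grant number: 61170206
- Approval year: 2011
- Funding amount: 550,000 CNY
- Project category: General Program
A new class of regime-switching models and their application in financial modeling
- Grant number: 11061041
- Approval year: 2010
- Funding amount: 240,000 CNY
- Project category: Regional Science Fund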
Similar overseas grants
Functional consequences of intergenic autoimmune disease risk variants
- Grant number: 10655161
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Functional exploration of a deep Mycobacterium tuberculosis phosphoproteome
- Grant number: 10656957
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Investigating the functional impact of genetic variants in the human proteome
- Grant number: 10715585
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Using proteogenomics to assess the functional impact of alternative splicing events in glioblastoma
- Grant number: 10577186
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Functional Analysis of Distinct and Co-existing Transcriptional Programs Regulating Tumor Dormancy
- Grant number: 10584353
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Fecal Microbiota Transfer Attenuates Aged Gut Dysbiosis and Functional Deficits after Traumatic Brain Injury
- Grant number: 10573109
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Functional Landscape of Glycosylation in Skin Cancer
- Grant number: 10581094
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Dissecting the functional organization of local hippocampal circuits underlying spatial representations
- Grant number: 10590363
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Genomic and functional investigations of the transcriptional regulatory network of tooth enamel development
- Grant number: 10720303
- Fiscal year: 2023
- Amount: $13,100
- Project category:
Unlocking whole brain, layer-specific functional connectivity with 3D VAPER fMRI
- Grant number: 10643636
- Fiscal year: 2023
- Amount: $13,100
- Project category: