Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
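Basic Information
- Grant number: RGPIN-2017-05258
- Principal investigator:
- Amount: $10,200
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Project period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract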
Big data is an important issue for modern data and statistical analysis. Computers can store huge amounts of data; however, methods to accurately and quickly analyze the data have not kept pace with improvements to modern storage technology. In some cases, data are discarded without being analyzed. Improved statistical analysis of big data will benefit any field dealing with massive amounts of data, such as biological sciences (e.g., genomics), finance and informatics, astronomy, cosmology, and climate science.
My proposed research will utilize a form of computer programming called Evolutionary Computation (EC). EC uses techniques copied from the biological theory of evolution by natural selection. In biology, the goal usually is to produce as many fit offspring as possible, who go on to produce their own fit offspring. Random mutations to the genome will make some children more fit, or less fit, than their parents. The fitter children are more likely to produce healthy offspring, so their genes get passed on. For my research, the measure of "fitness" used is how well the algorithm searches for optimum solutions with regard to clustering big data. (Clustering involves accounting for the underlying structure that links data points, so that they can be put into correct groups, or labelled correctly, e.g., linking gene expression to types of cancer.) Techniques such as cross-over and mutation are copied from biology, and are used to "evolve" the algorithm and make it fitter each time it runs.
Under the proposed research, evolutionary algorithms (EAs) will be developed, as alternatives to the almost ubiquitous expectation-maximization (EM) algorithm and its variants, for Gaussian and non-Gaussian mixture model-based approaches to clustering. EAs will be developed for the mixture of factor analyzers model, the mixture of variance-gamma distributions, and the mixture of variance-gamma factor analyzers model. Other short-term objectives include the development of a mixture of multiple scaled variance-gamma distributions. This will bring a phenomenal level of modelling flexibility, while also guaranteeing cluster convexity: the resulting components are hypercuboids, so that the rate of decay can differ in each dimension. The mixture of multiple scaled variance-gamma distributions model will be extended to the mixture of multiple scaled variance-gamma factor analyzers model, for application to high-dimensional data. EAs will then be developed for the mixture of multiple scaled variance-gamma distributions and mixture of multiple scaled variance-gamma factor analyzers models and investigated as alternatives to alternating expectation-conditional maximization algorithms.
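To make the idea of evolving a clustering concrete, here is a minimal sketch of an evolutionary search over hard cluster labels. It is not the method proposed in this grant. It assumes one-dimensional data and Gaussian components, and it scores each candidate by a classification log-likelihood. The function names (`fitness`, `crossover`, `mutate`, `evolve`) and all parameter values are illustrative assumptions.

```python
import math
import random


def fitness(data, labels, k):
    """Classification log-likelihood of the 1-D Gaussian mixture implied by hard labels."""
    n = len(data)
    total = 0.0
    for g in range(k):
        members = [x for x, z in zip(data, labels) if z == g]
        if len(members) < 2:
            return float("-inf")  # reject degenerate clusters
        mean = sum(members) / len(members)
        # Small constant keeps the variance strictly positive (an assumption of this sketch).
        var = sum((x - mean) ** 2 for x in members) / len(members) + 1e-6
        weight = len(members) / n
        for x in members:
            total += (math.log(weight)
                      - 0.5 * math.log(2 * math.pi * var)
                      - (x - mean) ** 2 / (2 * var))
    return total


def crossover(a, b, rng):
    """Uniform crossover: each label is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(labels, k, rate, rng):
    """Reassign each observation to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else z for z in labels]


def evolve(data, k=2, pop_size=30, generations=200, rate=0.02, seed=0):
    """Return the fittest label vector and the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randrange(k) for _ in data] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(data, ind, k), reverse=True)
        history.append(fitness(data, pop[0], k))
        parents = pop[: pop_size // 2]  # truncation selection; survivors are kept unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), k, rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=lambda ind: fitness(data, ind, k))
    return best, history
```

Because the surviving parents are carried over unchanged, the best fitness never decreases from one generation to the next. The sketch ignores label switching between parents during crossover, which a practical algorithm would need to handle, and it does not cover the variance-gamma or factor analyzer models named in the abstract.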
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by McNicholas, Sharon
Other grants held by McNicholas, Sharon
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
- Grant number: RGPIN-2017-05258
- Fiscal year: 2022
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
- Grant number: RGPIN-2017-05258
- Fiscal year: 2021
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
- Grant number: RGPIN-2017-05258
- Fiscal year: 2018
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
- Grant number: RGPIN-2017-05258
- Fiscal year: 2017
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
A Design Theoretic Approach to Multi-Objective Evolutionary Optimization
- Grant number: 425548-2012
- Fiscal year: 2014
- Funding amount: $10,200
- Program: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
A Design Theoretic Approach to Multi-Objective Evolutionary Optimization
- Grant number: 425548-2012
- Fiscal year: 2013
- Funding amount: $10,200
- Program: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
A Design Theoretic Approach to Multi-Objective Evolutionary Optimization
- Grant number: 425548-2012
- Fiscal year: 2012
- Funding amount: $10,200
- Program: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Similar International Grants
A shape-constrained approach for non-parametric variance estimation for Markov Chains
- Grant number: 2311141
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Continuing Grant
Construction of a receptivity estimation model for risky utterance strategies in non-task-oriented conversational systems
- Grant number: 23K16923
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Grant-in-Aid for Early-Career Scientists
Progenitor Star Estimation of Non-thermal Dominated Core-Collapse Supernova Remnant Probed by High-resolution X-ray Spectroscopy
- Grant number: 23KJ0296
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Grant-in-Aid for JSPS Fellows
Non-Contact Sleep Stage Estimation: Machine Learning in Multi-Imbalance Data for Improvements in Accuracy and Interpretability
- Grant number: 22KJ1367
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Grant-in-Aid for JSPS Fellows
PFI-TT: Novel Non-Contacting Position Estimation System for Long-Stroke Actuators
- Grant number: 2329798
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Standard Grant
Non-parametric estimation under covariate shift: From fundamental bounds to efficient algorithms
- Grant number: 2311072
- Fiscal year: 2023
- Funding amount: $10,200
- Program: Standard Grant
Non-parametric identification, estimation and inference: generalized functions approach
- Grant number: RGPIN-2020-05444
- Fiscal year: 2022
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
- Grant number: RGPIN-2017-05258
- Fiscal year: 2022
- Funding amount: $10,200
- Program: Discovery Grants Program - Individual
Continuous estimation of pulmonary artery pressure fluctuations based on non-invasive measurement using microwave radar
- Grant number: 22K12917
- Fiscal year: 2022
- Funding amount: $10,200
- Program: Grant-in-Aid for Scientific Research (C)
Non-destructive estimation of plant biomass using LiDAR data
- Grant number: 574895-2022
- Fiscal year: 2022
- Funding amount: $10,200
- Program: University Undergraduate Student Research Awards