Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data

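Basic information

  • Grant number:
    RGPIN-2017-05258
  • Principal investigator:
  • Amount:
    $10,200
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2019
  • Funding country:
    Canada
  • Duration:
    2019-01-01 to 2020-12-31
  • Project status:
    Completed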

Project abstract

Big data is an important issue for modern data and statistical analysis. Computers can store huge amounts of data; however, methods to analyze the data accurately and quickly have not kept pace with improvements in storage technology. In some cases, data are discarded without being analyzed. Improved statistical analysis of big data will benefit any field dealing with massive amounts of data, such as biological sciences (e.g., genomics), finance and informatics, astronomy, cosmology, and climate science.

My proposed research will utilize a form of computer programming called Evolutionary Computation (EC). EC uses techniques copied from the biological theory of evolution by natural selection. In biology, the goal usually is to produce as many fit offspring as possible, who go on to produce their own fit offspring. Random mutations to the genome will make some children more fit, or less fit, than their parents. The fitter children are more likely to produce healthy offspring, so their genes get passed on. For my research, the measure of "fitness" is how well the algorithm searches for optimum solutions when clustering big data. (Clustering involves accounting for the underlying structure that links data points, so that they can be put into correct groups, or labelled correctly, e.g., linking gene expression to types of cancer.) Techniques such as cross-over and mutation are copied from biology and are used to "evolve" the algorithm and make it fitter each time it runs.

Under the proposed research, evolutionary algorithms (EAs) will be developed as alternatives to the almost ubiquitous expectation-maximization (EM) algorithm and its variants, for Gaussian and non-Gaussian mixture model-based approaches to clustering. EAs will be developed for the mixture of factor analyzers model, the mixture of variance-gamma distributions, and the mixture of variance-gamma factor analyzers model.

Other short-term objectives include the development of a mixture of multiple scaled variance-gamma distributions. This will bring substantial modelling flexibility while also guaranteeing cluster convexity: the resulting components are hypercuboids, so that the rate of decay can differ in each dimension. The mixture of multiple scaled variance-gamma distributions model will be extended to the mixture of multiple scaled variance-gamma factor analyzers model, for application to high-dimensional data. EAs will then be developed for both of these models and investigated as alternatives to alternating expectation-conditional maximization algorithms.
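
As a rough illustration of the idea in the abstract, the sketch below evolves hard cluster labels for one-dimensional data, using the classification log-likelihood of a Gaussian mixture as the fitness. It is not the algorithm proposed in this grant: the function names (`fitness`, `mutate`, `crossover`, `evolve`), the uniform cross-over, the keep-the-fitter-half selection rule, and every parameter value are assumptions chosen only to keep the example small.

```python
import math
import random


def fitness(data, labels, k):
    """Classification log-likelihood of a 1-D Gaussian mixture implied by hard labels."""
    n = len(data)
    total = 0.0
    for g in range(k):
        members = [x for x, z in zip(data, labels) if z == g]
        if len(members) < 2:
            return float("-inf")  # reject empty or single-point clusters
        mean = sum(members) / len(members)
        var = sum((x - mean) ** 2 for x in members) / len(members)
        var = max(var, 1e-6)  # guard against a zero variance
        log_pi = math.log(len(members) / n)
        for x in members:
            total += (log_pi - 0.5 * math.log(2 * math.pi * var)
                      - (x - mean) ** 2 / (2 * var))
    return total


def mutate(labels, k, rate, rng):
    """Reassign each point to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else z for z in labels]


def crossover(a, b, rng):
    """Uniform cross-over: each label is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(data, k=2, pop_size=30, generations=200, rate=0.05, seed=0):
    """Return the fittest label vector found and its fitness."""
    rng = random.Random(seed)
    population = [[rng.randrange(k) for _ in data] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(data, ind, k), reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), k, rate, rng))
        population = parents + children
    best = max(population, key=lambda ind: fitness(data, ind, k))
    return best, fitness(data, best, k)
```

Calling `evolve(data, k=2)` on a list of floats returns a label vector and its fitness. Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next, although nothing here guarantees that the global optimum is reached. Cluster labels are interchangeable, so plain uniform cross-over can mix parents that describe the same partition under different label names; a more careful design would need to deal with that.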

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Mcnicholas, Sharon

Similar overseas grants

A shape-constrained approach for non-parametric variance estimation for Markov Chains
  • Grant number:
    2311141
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Continuing Grant
Construction of a receptivity estimation model for risky utterance strategies in non-task-oriented conversational systems
  • Grant number:
    23K16923
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Grant-in-Aid for Early-Career Scientists
Progenitor Star Estimation of Non-thermal Dominated Core-Collapse Supernova Remnant Probed by High-resolution X-ray Spectroscopy
  • Grant number:
    23KJ0296
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Grant-in-Aid for JSPS Fellows
Non-Contact Sleep Stage Estimation: Machine Learning in Multi-Imbalance Data for Improvements in Accuracy and Interpretability
  • Grant number:
    22KJ1367
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Grant-in-Aid for JSPS Fellows
PFI-TT: Novel Non-Contacting Position Estimation System for Long-Stroke Actuators
  • Grant number:
    2329798
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Standard Grant
Non-parametric estimation under covariate shift: From fundamental bounds to efficient algorithms
  • Grant number:
    2311072
  • Fiscal year:
    2023
  • Funding amount:
    $10,200
  • Project category:
    Standard Grant
Non-parametric identification, estimation and inference: generalized functions approach
  • Grant number:
    RGPIN-2020-05444
  • Fiscal year:
    2022
  • Funding amount:
    $10,200
  • Project category:
    Discovery Grants Program - Individual
Parameter Estimation for Non-Gaussian Model-Based Clustering with High-Dimensional Data
  • Grant number:
    RGPIN-2017-05258
  • Fiscal year:
    2022
  • Funding amount:
    $10,200
  • Project category:
    Discovery Grants Program - Individual
Continuous estimation of pulmonary artery pressure fluctuations based on non-invasive measurement using microwave radar
  • Grant number:
    22K12917
  • Fiscal year:
    2022
  • Funding amount:
    $10,200
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Non-destructive estimation of plant biomass using LiDAR data
  • Grant number:
    574895-2022
  • Fiscal year:
    2022
  • Funding amount:
    $10,200
  • Project category:
    University Undergraduate Student Research Awards