RUI: A Family of Versatile Mixture Models for Analyzing Mixed-Type Data with Asymmetry, Outliers, and Missing Values


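Basic information

  • Award number:
    2209974
  • Principal investigator:
  • Amount:
    $150,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Duration:
    2022-07-15 to 2025-06-30
  • Project status:
    Not yet concluded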

Project abstract

Cluster analysis aims to discover patterns and homogeneity in the data. It reveals subgroups in a population of study, and it has applications in many fields. For example, in psychology, the clusters can be groups of patients who can benefit from a specific set of treatments. To apply cluster analysis to a data set, the data need to have certain characteristics; for example, some techniques require the data to be continuous, and they often need pre-treatment. This project will develop a series of new clustering techniques that are suitable for challenging data sets without pre-treatment, such as those with high dimension, missing values, non-continuous variables, or outliers. Novel statistical approaches and software packages will be produced and made available to general users. Undergraduate students will be directly involved in the research project and, together with graduate students, will be trained to conduct research in data analysis. Many more students will be involved through class projects, and the research outcomes will enrich the content of some of the offered courses.
A widely used approach for cluster analysis is model-based clustering. It assumes that a population is a mixture of subpopulations, each of which can be represented by a density function. A variety of clustering methods and algorithms exist; however, they still have a series of limitations. Outliers and missing data can affect the clustering results, and the high number of parameters makes the techniques unusable on high-dimensional data sets. Moreover, many algorithms assume continuous data and are not readily adaptable to discrete, binary, categorical, or mixed continuous and categorical data. This is a major limitation because in many fields, such as medicine, biology, and marketing, the data have all of those characteristics.
In this project, new clustering techniques based on non-Gaussian model-based clustering will be developed to circumvent the limitations of current methods on cluster shape, outliers, missing data, dimension, and data type. The novel methods will improve flexibility in detecting skewed clusters and robustness when dealing with outliers and missing data. Implicit and explicit techniques will be used for dimension reduction, and latent class models will be adopted to deal with mixed-type data. The project will include a study of the indices used to select the number of clusters, and a thorough comparison with existing methods on real and simulated data will be undertaken, giving users a guideline on which model to use based on the goal and challenges in their data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Cristina Tortora

Handling skewness and directional tails in model-based clustering
  • DOI:
    10.1007/s00362-025-01723-9
  • Published:
    2025-07-04
  • Journal:
  • Impact factor:
    1.100
  • Authors:
    Cristina Tortora; Antonio Punzo; Brian C. Franczak
  • Corresponding author:
    Brian C. Franczak
Clustering mixed-type data using a probabilistic distance algorithm
  • DOI:
    10.1016/j.asoc.2022.109704
  • Published:
    2022-11-01
  • Journal:
  • Impact factor:
    6.600
  • Authors:
    Cristina Tortora; Francesco Palumbo
  • Corresponding author:
    Francesco Palumbo
A strange case of cyclic vomiting: When an haematological disease appear with gastrointestinal symptoms
  • DOI:
    10.1016/j.dld.2014.07.165
  • Published:
    2014-09-30
  • Journal:
  • Impact factor:
  • Authors:
    Emma Acampora; Rossella Turco; Laura Salvadori; Valentina Bruno; Cristina Tortora; Antonella Gambale; Achille Iolascon; Renata Auricchio; Luigi Greco
  • Corresponding author:
    Luigi Greco
FPDclustering: a comprehensive R package for probabilistic distance clustering based methods
  • DOI:
    10.1007/s00180-024-01490-5
  • Published:
    2024-05-15
  • Journal:
  • Impact factor:
    1.400
  • Authors:
    Cristina Tortora; Francesco Palumbo
  • Corresponding author:
    Francesco Palumbo

Similar NSFC (National Natural Science Foundation of China) grants

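Functional study of the rice OVATE Family Protein 8 (OsOFP8) gene
  • Award number:
    31671271
  • Year approved:
    2016
  • Funding amount:
    CNY 620,000
  • Project type:
    General Program
E_n vector bundles on families of del Pezzo surfaces
  • Award number:
    11501201
  • Year approved:
    2015
  • Funding amount:
    CNY 180,000
  • Project type:
    Young Scientists Fund
Role of Pim family regulation of cross talk between leukemia cells and the hematopoietic microenvironment in acute myeloid leukemia
  • Award number:
    81100330
  • Year approved:
    2011
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund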

Similar overseas grants

Legitimacy and effective policing responses to domestic and family violence
  • Award number:
    DP240102371
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Discovery Projects
Collaborative Research: Unlocking the evolutionary history of Schiedea (carnation family, Caryophyllaceae): rapid radiation of an endemic plant genus in the Hawaiian Islands
  • Award number:
    2426560
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Standard Grant
Are family firms in Japan resilient to economic shock? Digging further by family types, management strategies, and earnings quality.
  • Award number:
    24K00297
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Environmental, Social, Governance (ESG), Family Firm Structure and Main Bank Relationship: Evidences from Japan
  • Award number:
    24K04937
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
A novel mechanism of menopause-induced endometrial cancer that develops through functional loss of the helicase family
  • Award number:
    24K19688
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
SeparateSpace: Leveraging generative AI to help separating families who are underserved or excluded by the way family law support is currently delivered.
  • Award number:
    10100497
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Collaborative R&D
Enhancing Wahkohtowin (Kinship beyond the immediate family) Community-based models of care to reach and support Indigenous and racialized women of reproductive age and pregnant women in Canada for the prevention of congenital syphilis
  • Award number:
    502786
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Directed Grant
Building capacity, fostering community, and supporting champions to implement meaningful family engagement in child health research and practice: Development and evaluation of the Family Engagement in Research Champions Community of Practice (FER Champion
  • Award number:
    484903
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Salary Programs
Family Corporations in the Courts: A Comparative Study
  • Award number:
    24K04641
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Combining structural biology and genetics to understand the function of a multi-gene family expanded in neglected human malaria parasites
  • Award number:
    MR/Y012895/1
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Project type:
    Research Grant