Scalable Methods for Classification of Heterogeneous High-Dimensional Data
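Basic information
- Award number: 1712943
- Principal investigator:
- Amount: $162,500
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-07-01 to 2020-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract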
Recent technological advances have enabled routine collection of large-scale high-dimensional data in the biomedical fields. For example, in cancer research it is common to use multiple high-throughput technology platforms to measure genotype, gene expression levels, and methylation levels. One of the main challenges in the analysis of such data is the identification of key biological measurements that can be used to classify the subject into a known cancer subtype. While significant progress has been made in the development of computationally efficient classification methods to address this challenge, existing methods do not adequately take into account the heterogeneity across the cancer subtypes and the mixed types of measurements (binary/count/continuous) across technology platforms. As such, existing methods may fail to identify relevant biological patterns. The goal of this project is to develop new classification methods that explicitly take into account the type and heterogeneity of measurements. While the primary focus is on methodology, high priority will be given to computational considerations and software development to encourage dissemination and ensure ease of use for domain scientists.
Regularized linear discriminant methods are commonly used for simultaneous classification and variable selection due to their interpretability and computational efficiency. These methods, however, rely on unrealistic assumptions of equality of group-covariance matrices and normality of measurements. This project aims to address the limitations present in current discriminant approaches, and has three objectives: (1) to develop computationally efficient quadratic classification rules that perform variable selection; (2) to generalize the discriminant analysis framework to non-normal measurements; (3) to develop a classification framework for mixed type data coming from multiple technology platforms collected on the same set of subjects. The key methodological innovation is the combination of sparse low-rank singular value decomposition, which enables computational efficiency, with geometric interpretation of linear discriminant analysis, which allows for the construction of nonlinear classification rules by redefining the space for discrimination.
Project outcomes
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Double-Matched Matrix Decomposition for Multi-View Data
- DOI: 10.1080/10618600.2022.2067860
- Publication date: 2021-05
- Journal:
- Impact factor: 2.4
- Authors: Dongbang Yuan; Irina Gaynanova
- Corresponding authors: Dongbang Yuan; Irina Gaynanova
Sparse semiparametric canonical correlation analysis for data of mixed types.
- DOI: 10.1093/biomet/asaa007
- Publication date: 2020-09
- Journal:
- Impact factor: 2.7
- Authors: Yoon G; Carroll RJ; Gaynanova I
- Corresponding author: Gaynanova I
Microbial Networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data
- DOI: 10.3389/fgene.2019.00516
- Publication date: 2019-06-06
- Journal:
- Impact factor: 3.7
- Authors: Yoon, Grace; Gaynanova, Irina; Mueller, Christian L.
- Corresponding author: Mueller, Christian L.
Prediction and estimation consistency of sparse multi-class penalized optimal scoring
- DOI: 10.3150/19-bej1126
- Publication date: 2018-09
- Journal:
- Impact factor: 1.5
- Authors: Irina Gaynanova
- Corresponding author: Irina Gaynanova
Sparse feature selection in kernel discriminant analysis via optimal scoring
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Lapanowski, Alexander F.; Gaynanova, Irina
- Corresponding author: Gaynanova, Irina
Other publications by Irina Gaynanova
Sparse semiparametric discriminant analysis for high-dimensional zero-inflated data
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Hee Cheol Chung; Yang Ni; Irina Gaynanova
- Corresponding author: Irina Gaynanova
Optimal variable selection in multi-group sparse discriminant analysis
- DOI: 10.1214/15-ejs1064
- Publication date: 2014
- Journal:
- Impact factor: 1.1
- Authors: Irina Gaynanova; M. Kolar
- Corresponding author: M. Kolar
Corrections of Equations on Glycemic Variability and Quality of Glycemic Control.
- DOI:
- Publication date: 2018
- Journal:
- Impact factor: 5.4
- Authors: Irina Gaynanova; Jacek K. Urbanek; N. Punjabi
- Corresponding author: N. Punjabi
Prediction error bounds for linear regression with the TREX
- DOI: 10.1007/s11749-018-0584-4
- Publication date: 2018
- Journal:
- Impact factor: 1.3
- Authors: J. Bien; Irina Gaynanova; Johannes Lederer; Christian L. Müller
- Corresponding author: Christian L. Müller
Other grants held by Irina Gaynanova
CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
- Award number: 2422478
- Fiscal year: 2024
- Funding amount: $162,500
- Project type: Continuing Grant
CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
- Award number: 2044823
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Continuing Grant
Similar NSFC (National Natural Science Foundation of China) grants
Computational Methods for Analyzing Toponome Data
- Award number: 60601030
- Approval year: 2006
- Funding amount: 170,000 CNY
- Project type: Young Scientists Fund
Similar overseas grants
IMR: MM-1C: Enabling Continual Passive Estimation of Performance of Internet Transfers: Online Measurement and Classification Methods
- Award number: 2319511
- Fiscal year: 2023
- Funding amount: $162,500
- Project type: Standard Grant
Classification of learners based on explicit and implicit shyness and an examination of appropriate learning environments and support methods
- Award number: 23K02874
- Fiscal year: 2023
- Funding amount: $162,500
- Project type: Grant-in-Aid for Scientific Research (C)
Classification and generation methods for socially acceptable robot voice
- Award number: 573710-2022
- Fiscal year: 2022
- Funding amount: $162,500
- Project type: University Undergraduate Student Research Awards
Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance
- Award number: 2113500
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Standard Grant
Classification of late-onset psychosis and verification of effective treatment methods
- Award number: 21K15730
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Grant-in-Aid for Early-Career Scientists
Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance
- Award number: 2113754
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Standard Grant
Which mental health classification serves Canada best? A mixed methods, multicenter comparison of the World Health Organization's ICD-11 Clinical Descriptions and Diagnostic Guidelines with the American Psychiatric Association's DSM-5.
- Award number: 456724
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Operating Grants
Building evidence for adopting a new disease classification system in Canadian primary care settings: a mixed methods feasibility study
- Award number: 451333
- Fiscal year: 2021
- Funding amount: $162,500
- Project type: Operating Grants
Machine Learning Methods to Re-annotate Histone Modifications with Locus-specific Functional Classification
- Award number: MR/T022620/1
- Fiscal year: 2020
- Funding amount: $162,500
- Project type: Fellowship
Classification methods taking account clarity of characteristics in biological sounds.
- Award number: 20K12045
- Fiscal year: 2020
- Funding amount: $162,500
- Project type: Grant-in-Aid for Scientific Research (C)