Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance


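Basic Information

  • Award number:
    2113754
  • Principal investigator:
  • Amount:
    $120,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-07-01 to 2024-06-30
  • Project status:
    Completed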

Project Abstract

Classification is a popular data analysis technique in disciplines ranging from biomedical sciences to information technologies. This project will develop theory-backed statistical methods and algorithms to address pressing challenges in the application of classification. These challenges are related to imperfect aspects of training data, which are widespread in high-stakes applications such as disease diagnosis and cybersecurity. In particular, this project will focus on the so-called asymmetric classification problems, where a particular class is of greater importance than other classes, and the methods and algorithms will aim to control the classification error of missing the most important class in the population, not just in a particular dataset. This property will make the methods and algorithms powerful for medical diagnosis, for which the primary goal is diagnostic accuracy in the population. Moreover, this project will provide a suite of projects, ranging from theory to applications, that are suitable for training graduate and undergraduate students. The interdisciplinary nature of this project is expected to attract students from diverse backgrounds to join the PIs’ efforts.
The PIs will develop a suite of application-driven, theory-backed methods and algorithms to address pressing data challenges including sample size limitations, sampling biases, and ambiguous class labels. The development will be primarily under the Neyman-Pearson (NP) classification paradigm, which was designed to control the population-level false-negative rate (p-FNR) under a desired level while minimizing the population-level false-positive rate (p-FPR). This project will integrate NP classification into cutting-edge statistical learning tasks and enable it to address the aforementioned real-world data challenges. Specifically, this project will pursue the following four overarching goals. First, the PIs will use random matrix theory to address a long-standing problem in the NP classification methodology: whether NP classifiers can be constructed without a sample-splitting step to improve data efficiency. Second, because the NP paradigm is invariant to sampling bias, the PIs will develop NP classifiers to address the sampling bias issue in biomedical applications. These classifiers can be trained on biased samples but still achieve the p-FNR control. Third, the PIs will develop a model-free feature ranking framework to incorporate multiple classification paradigms, including the NP paradigm, and to reflect prediction objectives. Fourth, the PIs will develop the first NP umbrella algorithm under the label noise setting and the first information-theoretic criteria that combine ambiguous classes in multi-class classification. To disseminate the project outcomes, the PIs will give research talks, organize conference sessions, share open-source software packages with tutorials, and reach out to practitioners of classification methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
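As a rough illustration of what controlling the p-FNR "in the population, not just in a particular dataset" can look like, the following is a minimal sketch of order-statistic threshold selection in the spirit of NP umbrella-type algorithms. It is not the PIs' implementation. It assumes that a scoring function has already been trained on separate data (the sample-splitting step mentioned above), that higher scores indicate the important class, that the held-out important-class scores are independent draws from a continuous distribution, and that a case is classified as important when its score is at least the threshold. The names `violation_probability`, `np_threshold`, `alpha`, and `delta` are hypothetical.

```python
from math import comb


def violation_probability(n: int, k: int, alpha: float) -> float:
    """Return P(Binomial(n, alpha) <= k - 1).

    Under the assumptions stated above, this is the probability that
    thresholding at the k-th smallest of n held-out important-class scores
    gives a population false-negative rate larger than alpha.
    """
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k))


def np_threshold(important_scores, alpha=0.05, delta=0.05):
    """Pick a score threshold whose p-FNR exceeds alpha with probability <= delta.

    important_scores: held-out scores of the important class only
                      (not used to train the scoring function).
    Returns the largest admissible order statistic, since a higher threshold
    flags fewer cases and therefore tends to lower the false-positive rate.
    """
    ordered = sorted(important_scores)
    n = len(ordered)
    best_k = 0
    # The violation probability grows with k, so stop at the first failure.
    for k in range(1, n + 1):
        if violation_probability(n, k, alpha) > delta:
            break
        best_k = k
    if best_k == 0:
        raise ValueError(
            "too few held-out scores: need (1 - alpha) ** n <= delta"
        )
    return ordered[best_k - 1]
```

Under these assumptions, even the smallest held-out score is an admissible threshold only when (1 - alpha)^n <= delta, which for alpha = delta = 0.05 requires at least 59 held-out scores from the important class. This illustrates why sample size limitations matter for this kind of guarantee.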

Project Outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Asymmetric Error Control Under Imperfect Supervision: A Label-Noise-Adjusted Neyman–Pearson Umbrella Algorithm
  • DOI:
    10.1080/01621459.2021.2016423
  • Publication date:
    2021-12
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Shu Yao;Bradley Rava;Xin Tong;Gareth M. James
  • Corresponding authors:
    Shu Yao;Bradley Rava;Xin Tong;Gareth M. James
Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification
  • DOI:
  • Publication date:
    2021-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chihao Zhang;Y. Chen;Shihua Zhang;Jingyi Jessica Li
  • Corresponding authors:
    Chihao Zhang;Y. Chen;Shihua Zhang;Jingyi Jessica Li
A flexible model-free prediction-based framework for feature ranking.

Other publications by Jingyi Jessica Li

Systematic evaluation of methylation-based cell type deconvolution methods for plasma cell-free DNA
  • DOI:
    10.1186/s13059-024-03456-8
  • Publication date:
    2024-12-19
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Tongyue Sun;Jinqi Yuan;Yacheng Zhu;Jingqi Li;Shen Yang;Junpeng Zhou;Xinzhou Ge;Susu Qu;Wei Li;Jingyi Jessica Li;Yumei Li
  • Corresponding author:
    Yumei Li
Integrated molecular and functional characterization of the intrinsic apoptotic machinery identifies therapeutic vulnerabilities in glioma
  • DOI:
    10.1038/s41467-024-54138-9
  • Publication date:
    2024-11-21
  • Journal:
  • Impact factor:
    15.700
  • Authors:
    Elizabeth G. Fernandez;Wilson X. Mai;Kai Song;Nicholas A. Bayley;Jiyoon Kim;Henan Zhu;Marissa Pioso;Pauline Young;Cassidy L. Andrasz;Dimitri Cadet;Linda M. Liau;Gang Li;William H. Yong;Fausto J. Rodriguez;Scott J. Dixon;Andrew J. Souers;Jingyi Jessica Li;Thomas G. Graeber;Timothy F. Cloughesy;David A. Nathanson
  • Corresponding author:
    David A. Nathanson
Information-theoretic Classification Accuracy: A Data-driven Criterion to Combining Ambiguous Outcome Labels in Multi-class Classification
  • DOI:
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chihao Zhang;Y. Chen;Shihua Zhang;Jingyi Jessica Li
  • Corresponding author:
    Jingyi Jessica Li
Publisher Correction: scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured
  • DOI:
    10.1186/s13059-021-02394-z
  • Publication date:
    2021-06-09
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Tianyi Sun;Dongyuan Song;Wei Vivian Li;Jingyi Jessica Li
  • Corresponding author:
    Jingyi Jessica Li
A BOOTSTRAP LASSO + PARTIAL RIDGE METHOD TO CONSTRUCT CONFIDENCE INTERVALS FOR PARAMETERS IN HIGH-DIMENSIONAL SPARSE LINEAR MODELS
  • DOI:
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Hanzhong Liu;Xin Xu;Jingyi Jessica Li
  • Corresponding author:
    Jingyi Jessica Li

Other grants of Jingyi Jessica Li

CAREER: Advancing the Bioinformatic Infrastructure and Methodology for Single-cell RNA Sequencing
  • Award number:
    1846216
  • Fiscal year:
    2019
  • Funding amount:
    $120,000
  • Project category:
    Continuing Grant
QuBBD: Collaborative Research: Advancing mHealth using Big Data Analytics: Statistical and Dynamical Systems Modeling of Real-Time Adaptive m-Intervention for Pain
  • Award number:
    1557727
  • Fiscal year:
    2015
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant

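Similar grants from the National Natural Science Foundation of China (NSFC)

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project category:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project category:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project category:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project category:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project category:
    General Program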

Similar overseas grants

Collaborative Research: RESEARCH-PGR: Development of epigenetic editing for crop improvement
  • Award number:
    2331437
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant
Collaborative Research: Broadening Instructional Innovation in the Chemistry Laboratory through Excellence in Curriculum Development
  • Award number:
    2337028
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Continuing Grant
Collaborative Research: CAS: Exploration and Development of High Performance Thiazolothiazole Photocatalysts for Innovating Light-Driven Organic Transformations
  • Award number:
    2400166
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Continuing Grant
Collaborative Research: Broadening Instructional Innovation in the Chemistry Laboratory through Excellence in Curriculum Development
  • Award number:
    2337027
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Continuing Grant
Collaborative Research: RESEARCH-PGR: Development of epigenetic editing for crop improvement
  • Award number:
    2331438
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant
Collaborative Research: A Multi-Lab Investigation of the Conceptual Foundations of Early Number Development
  • Award number:
    2405548
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant
Collaborative Research: CAS: Exploration and Development of High Performance Thiazolothiazole Photocatalysts for Innovating Light-Driven Organic Transformations
  • Award number:
    2400165
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Continuing Grant
Collaborative Research: HNDS-I. Mobility Data for Communities (MD4C): Uncovering Segregation, Climate Resilience, and Economic Development from Cell-Phone Records
  • Award number:
    2420945
  • Fiscal year:
    2024
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant
SBP: Collaborative Research: Improving Engagement with Professional Development Programs by Attending to Teachers' Psychosocial Experiences
  • Award number:
    2314254
  • Fiscal year:
    2023
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant
Collaborative Research: Frameworks: FZ: A fine-tunable cyberinfrastructure framework to streamline specialized lossy compression development
  • Award number:
    2311878
  • Fiscal year:
    2023
  • Funding amount:
    $120,000
  • Project category:
    Standard Grant