Learning subgroups from data: selective inference and applications

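Basic information

  • Grant number:
    RGPIN-2021-02548
  • Principal investigator:
  • Amount:
    $13,100
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Period:
    2022-01-01 to 2023-12-31
  • Project status:
    Completed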

Project abstract

"Subgroup learning" methods like clustering and regression trees are used to split large, complex data sets into smaller chunks ("subgroups") that are as homogeneous as possible. These methods have been used for important tasks like identifying subgroups of patients that respond differently to an experimental drug, identifying subgroups of tumours that have different biological profiles (leading to targeted treatment strategies), and identifying customers who might be more likely to purchase particular products (leading to targeted advertising strategies). In this proposal, I outline my plans to develop new statistical methodology to determine if identified subgroups in a data set are truly different, and to apply subgroup learning to solve a difficult problem in an area called space-filling designs.

Once we have identified subgroups in a data set, it is natural to want to know whether they are truly different. After all, if the subgroup learning method identified subgroups of tumours that all have the same biological profiles, or subgroups of customers who all have the same purchasing preferences, it would be a massive waste of time and resources to target treatment or advertising strategies to these subgroups. Unfortunately, existing classical statistical methods for testing whether two groups are different are overly optimistic and will almost always claim that the subgroups are different, because they do not account for the double use of the data: once to identify candidate subgroups, and once again to determine whether they are different. To solve this problem, the research program proposes statistical methods that properly account for the double use of the data, when testing for differences in means between subgroups identified using clustering and regression trees.

I will further highlight the untapped potential of subgroup learning methods in a statistical area called space-filling designs, which aims to evenly distribute points throughout a space. These designs are widely used in computer experiments, which use computer models to simulate the effect of input variables on physical systems like weather and sea ice. Space-filling designs are also used to design environmental monitoring networks, like air quality and underwater acoustic monitoring networks. Although spaces are often complex in practice (e.g. networks on coastlines), it is difficult to construct space-filling designs unless the space is simple. I propose a strategy based on an application of a subgroup learning method (hierarchical clustering) that can construct space-filling designs on arbitrarily complex spaces.

In this research program, graduate and undergraduate students will contribute to producing statistical methods that impact the rate of scientific advancement in areas like biology and public health, and become experts in subgroup learning, which will be a valuable asset to their future careers as statisticians or data scientists in academia and industry.
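
The "double use of the data" problem described in the abstract can be illustrated with a small simulation. This is only a sketch of the problem, not of the proposed methodology: it draws one homogeneous sample, splits it into two clusters, and then applies a classical two-sample test to those clusters. The one-dimensional setting, the sample size, the number of repetitions, and the normal approximation to the Welch test are all assumptions chosen for brevity. A random split of the same kind of data is included as a baseline.

```python
import math
import random
import statistics


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means (Welch statistic, normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def two_means_1d(x):
    """Exact 2-means clustering in one dimension: the best split of the sorted data."""
    xs = sorted(x)
    n, total = len(xs), sum(xs)
    best_score, best_k = -math.inf, 2
    left_sum = xs[0]
    for k in range(2, n - 1):  # keep at least two points in each cluster
        left_sum += xs[k - 1]  # left cluster is xs[:k]
        # Maximising this score is equivalent to minimising the within-cluster sum of squares.
        score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
        if score > best_score:
            best_score, best_k = score, k
    return xs[:best_k], xs[best_k:]


def cluster_split(x, rng):
    """Groups chosen by looking at the data (first use of the data)."""
    return two_means_1d(x)


def random_split(x, rng):
    """Groups chosen without looking at the data."""
    shuffled = x[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def rejection_rate(split, n=50, reps=300, alpha=0.05, seed=0):
    """Fraction of null data sets in which the classical test declares a difference."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one population, no true subgroups
        a, b = split(x, rng)
        rejections += welch_p_value(a, b) < alpha  # second use of the data
    return rejections / reps


naive_rate = rejection_rate(cluster_split)
baseline_rate = rejection_rate(random_split)
print(f"rejection rate after clustering: {naive_rate:.2f}")
print(f"rejection rate after a random split: {baseline_rate:.2f}")
```

Because the clusters are formed to be as far apart as possible, the test applied to them rejects far more often than the nominal 5% level, while the random split stays close to it. Methods that account for the selection step aim to restore the nominal level in the first case.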
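
The abstract does not say how hierarchical clustering is turned into a space-filling design, so the following is only one plausible reading and should not be taken as the author's algorithm. It assumes a hypothetical L-shaped region, samples candidate points inside it, groups them with agglomerative clustering using Ward's merge cost, and keeps one representative candidate per cluster. The region, the number of candidates, and the design size are illustrative choices.

```python
import random


def in_region(x, y):
    """Hypothetical non-convex region: the unit square minus its upper-right quadrant."""
    return not (x > 0.5 and y > 0.5)


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's merge cost, stopped at k clusters."""
    # Each cluster is stored as (member indices, centroid x, centroid y).
    clusters = [([i], x, y) for i, (x, y) in enumerate(points)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            mi, xi, yi = clusters[i]
            for j in range(i + 1, len(clusters)):
                mj, xj, yj = clusters[j]
                ni, nj = len(mi), len(mj)
                # Increase in within-cluster sum of squares if i and j were merged.
                cost = ni * nj / (ni + nj) * ((xi - xj) ** 2 + (yi - yj) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        mi, xi, yi = clusters[i]
        mj, xj, yj = clusters[j]
        ni, nj = len(mi), len(mj)
        merged = (mi + mj,
                  (ni * xi + nj * xj) / (ni + nj),
                  (ni * yi + nj * yj) / (ni + nj))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def design_from_clusters(points, k):
    """Pick one candidate per cluster: the member closest to the cluster centroid."""
    design = []
    for members, cx, cy in ward_clusters(points, k):
        nearest = min(
            members,
            key=lambda m: (points[m][0] - cx) ** 2 + (points[m][1] - cy) ** 2,
        )
        design.append(points[nearest])
    return design


# Rejection-sample candidate points inside the region.
rng = random.Random(0)
candidates = []
while len(candidates) < 120:
    x, y = rng.random(), rng.random()
    if in_region(x, y):
        candidates.append((x, y))

design = design_from_clusters(candidates, k=6)
for px, py in design:
    print(f"({px:.3f}, {py:.3f})")
```

Choosing an actual candidate rather than the centroid itself keeps every design point inside the region, which matters when the region is non-convex and a centroid could fall outside it.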

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Gao, Lucy

Diminished viability of human ovarian cancer cells by antigen-specific delivery of carbon monoxide with a family of photoactivatable antibody-photoCORM conjugates
  • DOI:
    10.1039/c9sc03166a
  • Publication date:
    2020-01-14
  • Journal:
  • Impact factor:
    8.4
  • Authors:
    Kawahara, Brian; Gao, Lucy; Mascharak, Pradip K.
  • Corresponding author:
    Mascharak, Pradip K.
Extracellular Vesicles From Auditory Cells as Nanocarriers for Anti-inflammatory Drugs and Pro-resolving Mediators
  • DOI:
    10.3389/fncel.2019.00530
  • Publication date:
    2019-11-29
  • Journal:
  • Impact factor:
    5.3
  • Authors:
    Kalinec, Gilda M.; Gao, Lucy; Kalinec, Federico
  • Corresponding author:
    Kalinec, Federico
Musical auditory processing, cognition, and psychopathology in 22q11.2 deletion syndrome

Other grants held by Gao, Lucy

Learning subgroups from data: selective inference and applications
  • Grant number:
    DGECR-2021-00017
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Program:
    Discovery Launch Supplement
Learning subgroups from data: selective inference and applications
  • Grant number:
    RGPIN-2021-02548
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Program:
    Discovery Grants Program - Individual
Robust sparse partial least squares regression and classification
  • Grant number:
    487299-2016
  • Fiscal year:
    2019
  • Funding amount:
    $13,100
  • Program:
    Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
  • Grant number:
    487299-2016
  • Fiscal year:
    2018
  • Funding amount:
    $13,100
  • Program:
    Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
  • Grant number:
    487299-2016
  • Fiscal year:
    2017
  • Funding amount:
    $13,100
  • Program:
    Postgraduate Scholarships - Doctoral
Robust sparse partial least squares regression and classification
  • Grant number:
    487299-2016
  • Fiscal year:
    2016
  • Funding amount:
    $13,100
  • Program:
    Postgraduate Scholarships - Doctoral
Convex optimization for optimal design problems
  • Grant number:
    466564-2014
  • Fiscal year:
    2014
  • Funding amount:
    $13,100
  • Program:
    University Undergraduate Student Research Awards
Optimal design for linear models with non-normal error distributions
  • Grant number:
    448965-2013
  • Fiscal year:
    2013
  • Funding amount:
    $13,100
  • Program:
    University Undergraduate Student Research Awards

Similar overseas grants

Identifying patient subgroups and processes of care that cause outcome differences following ICU vs. ward triage among patients with acute respiratory failure and sepsis
  • Grant number:
    10734357
  • Fiscal year:
    2023
  • Funding amount:
    $13,100
  • Program:
Dissecting functional subgroups and closed-loop circuits between the pedunculopontine nucleus and the basal ganglia
  • Grant number:
    10677467
  • Fiscal year:
    2023
  • Funding amount:
    $13,100
  • Program:
Mapping the Cerebellar Origins of Medulloblastoma Subgroups
  • Grant number:
    10587809
  • Fiscal year:
    2023
  • Funding amount:
    $13,100
  • Program:
The impact of clinical interventions for sepsis in routine care and among detailed patient subgroups: A novel approach for causal effect estimation in electronic health record data
  • Grant number:
    10686093
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
Examining the Integrative Effects of Adolescent, Parent, Provider, and Practice Level Factors on Adolescents' HPV Vaccine Uptake across Six Asian American Subgroups
  • Grant number:
    10371334
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
Identifying and predicting subgroups related to function in individuals after stroke.
  • Grant number:
    10459813
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
Recovery from Opioid Use Disorder: subgroups, transition states, and their association with recovery outcomes
  • Grant number:
    10585674
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
Examining the Integrative Effects of Adolescent, Parent, Provider, and Practice Level Factors on Adolescents' HPV Vaccine Uptake across Six Asian American Subgroups
  • Grant number:
    10551328
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
The impact of clinical interventions for sepsis in routine care and among detailed patient subgroups: A novel approach for causal effect estimation in electronic health record data
  • Grant number:
    10505906
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Program:
Cognitively Defined Alzheimer's Subgroups: Natural history, neuropathology, and life course ramifications
  • Grant number:
    10672371
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Program: