A More Perfect Union: Leveraging Clinically Deployed Models and Cancer Epidemiology Cohort Data to Improve AI/ML Readiness of NIH-Supported Population Sciences Resources

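Basic information

  • Grant number:
    10594304
  • Principal investigator:
  • Amount:
    $348,700
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-09-01 to 2025-08-31
  • Project status:
    Ongoing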

Project abstract

NIH has invested hundreds of millions of dollars in large-scale prospective observational cohorts. The diverse and valuable data from these studies have generated important discoveries about how lifestyle and environment affect health and disease. These high-dimensional, multi-modal real-world data can enable broad research, including new AI/ML applications. Unfortunately, the standard methods that cohorts use to store, manage, analyze, and share their data are not ideal for contemporary AI/ML use. This creates a "readiness gap" that hinders new AI/ML research. This project proposes an innovative yet feasible approach to close that gap by improving AI/ML readiness at multiple levels.

Our multidisciplinary team includes AI/ML experts at City of Hope (COH); experienced population scientists from the California Teachers Study (CTS) cohort team; and cloud computing specialists from the San Diego Supercomputer Center's (SDSC) Sherlock Cloud. The CTS includes 133,477 female participants who have been followed continuously since 1995. Through surveys and linkages, the CTS has collected comprehensive exposure and lifestyle data and has identified over 28,000 cancers; over 34,000 deaths; and over 800,000 individual hospitalizations. Based on an AI/ML readiness framework, we will update the CTS's data and computing architecture; reconfigure data exploration and aggregation tools and documentation; and use CTS data to test, evaluate, and expand existing, clinically deployed AI/ML models.

First, we will expand the current private CTS data analytics cloud to include a new scalable computing environment specifically for AI/ML. We will deploy Amazon Web Services (AWS) resources for AI/ML within our secure CTS enclave and provision GPU-enabled instances running a full suite of scientific computing and AI/ML packages in Python and Jupyter Notebooks.

Second, we will generate embeddings in the CTS data to reduce the data complexity that is a barrier to AI/ML applications. Embeddings are low-dimensional latent representations that compress data from multiple modalities into a vector: a compact, abstracted summary of a participant's data. Using unsupervised learning and an autoencoder deep neural network, we will cluster CTS data into phenotype-based subgroups that can be used for essential AI/ML functions, such as cohort discovery, close-neighbor identification, and imputation.

Third, we will augment clinically deployed risk models at COH (e.g., for readmissions) with CTS data to directly evaluate the potential for real-world cohort data to improve model performance and the portability of clinical models into cohort populations. Each of these three initiatives will be documented in interactive tutorial notebooks that will be FAIR (findable, accessible, interoperable, and reusable) for the research community.

This project includes a balanced combination of people, process, and technology: a new multidisciplinary team of experts from relevant fields; new general-purpose embedding representations of observational cohort data; and a secure cloud-based infrastructure configured specifically for new AI/ML projects. Successful completion of this work will close the AI/ML readiness gap for cohort data.
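
To illustrate how participant embeddings could support close-neighbor identification and imputation, here is a minimal sketch in plain Python. It assumes the embeddings have already been computed. The participant IDs, vectors, the `BMI` variable, and the function names are all hypothetical and are not taken from the CTS or from the project's actual methods. Euclidean distance and mean-of-neighbors imputation are stand-in choices for illustration only.

```python
import math

# Hypothetical embedding vectors keyed by made-up participant IDs.
# In practice these would come from a trained model; here they are hand-written.
EMBEDDINGS = {
    "p1": [0.10, 0.90, 0.20],
    "p2": [0.12, 0.85, 0.25],
    "p3": [0.90, 0.10, 0.70],
    "p4": [0.88, 0.15, 0.65],
}

# Hypothetical survey variable; None marks a missing response.
BMI = {"p1": 24.0, "p2": None, "p3": 31.0, "p4": 29.0}


def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(target, k, candidates=None):
    """Return up to k participant IDs closest to `target` in embedding space."""
    source = EMBEDDINGS if candidates is None else candidates
    pool = [pid for pid in source if pid != target]
    pool.sort(key=lambda pid: distance(EMBEDDINGS[target], EMBEDDINGS[pid]))
    return pool[:k]


def impute(target, values, k):
    """Estimate a missing value as the mean over the k nearest neighbors that reported one."""
    donors = [pid for pid, v in values.items() if v is not None]
    neighbors = nearest_neighbors(target, k, candidates=donors)
    if not neighbors:
        raise ValueError("no neighbors with an observed value")
    return sum(values[pid] for pid in neighbors) / len(neighbors)


if __name__ == "__main__":
    print(nearest_neighbors("p2", 1))  # closest participant to p2
    print(impute("p2", BMI, 2))        # mean BMI of p2's two nearest donors
```

Because the neighbors are found in the compact embedding space rather than in the raw multi-modal data, the same lookup could in principle serve cohort discovery (find participants similar to a reference case) as well as imputation.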

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by James V Lacey

Differences in polygenic score distributions in European ancestry populations: implications for breast cancer risk prediction
  • DOI:
    10.1101/2024.02.12.24302043
  • Year published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kristia Yiangou;N. Mavaddat;J. Dennis;M. Zanti;Qin Wang;M. Bolla;Mustapha Abubakar;T. Ahearn;I. Andrulis;H. Anton;N. Antonenkova;V. Arndt;K. Aronson;A. Augustinsson;Adinda Baten;S. Behrens;M. Bermisheva;Amy Berrington de González;K. Białkowska;N. Boddicker;C. Bodelón;N. Bogdanova;S. Bojesen;K. Brantley;H. Brauch;H. Brenner;Nicola J. Camp;F. Canzian;J. Castelao;Melissa H. Cessna;J. Chang;G. Chenevix;Wendy K. Chung;Sarah V Colonna;F. Couch;A. Cox;Simon S. Cross;K. Czene;M. Daly;P. Devilee;T. Dörk;A. Dunning;Diana M. Eccles;A. Eliassen;Christoph Engel;M. Eriksson;D. G. Evans;Peter A. Fasching;O. Fletcher;H. Flyger;L. Fritschi;M. Gago;A. Gentry;A. González;P. Guénel;E. Hahnen;C. Haiman;U. Hamann;Jaana M. Hartikainen;Vikki Ho;James M. Hodge;A. Hollestelle;E. Honisch;M. Hooning;Reiner Hoppe;J. Hopper;Sacha J. Howell;A. Howell;Simona Jakovchevska;A. Jakubowska;H. Jernström;N. Johnson;Rudolf Kaaks;Elza K. Khusnutdinova;C. Kitahara;Stella Koutros;V. Kristensen;James V Lacey;D. Lambrechts;F. Lejbkowicz;A. Lindblom;M. Lush;A. Mannermaa;Dimitrios Mavroudis;Usha Menon;Rachel Murphy;H. Nevanlinna;Nadia Obi;K. Offit;Tjoung;A. Patel;Cheng Peng;P. Peterlongo;G. Pita;D. Plaseska‐Karanfilska;K. Pylkäs;P. Radice;M. U. Rashid;Gadi Rennert;Eleanor Roberts;Juan Rodriguez;A. Romero;E. Rosenberg;E. Saloustros;Dale P. Sandler;E. Sawyer;Rita Schmutzler;C. Scott;X. Shu;M. Southey;Jennifer Stone;Jack A. Taylor;Lauren R. Teras;I. van de Beek;Walter Willett;R. Winqvist;W. Zheng;C. Vachon;M. Schmidt;Per Hall;R. MacInnis;R. Milne;Paul D. P. Pharoah;J. Simard;A. Antoniou;Douglas F. Easton;K. Michailidou
  • Corresponding author:
    K. Michailidou
Association of tea and coffee consumption and biliary tract cancer risk: The Biliary Tract Cancers Pooling Project
  • DOI:
  • Year published:
    2023
  • Journal:
  • Impact factor:
    13.5
  • Authors:
    Yu;E. Loftfield;Ilona Argirion;Hans;D. Albanes;A. T. Chan;V. Fedirko;G.E. Fraser;Neal D. Freedman;Graham G. Giles;Patricia Hartge;Verena A. Katzke;S. Knutsen;James V Lacey;Linda M. Liao;Juhua Luo;R. Milne;K. O’Brien;Ulrike Peters;J. Poynter;M. Purdue;K. Robien;Sven Sandin;Dale P. Sandler;V. Setiawan;Jae H. Kang;Tracey G. Simon;Rashmi Sinha;T. VoPham;S. Weinstein;Emily White;Xuehong Zhang;Bin Zhu;K. McGlynn;P. Campbell;Mei;J. Koshiol
  • Corresponding author:
    J. Koshiol
A likelihood ratio approach for utilizing case-control data in the clinical classification of rare sequence variants: application to BRCA1 and BRCA2.
  • DOI:
    10.1155/2023/9961341
  • Year published:
    2023
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    M. Zanti;Denise G. O’Mahony;Michael T. Parsons;Hongyan Li;J. Dennis;Kristiina Aittomäkkiki;I. Andrulis;H. Anton;K. Aronson;A. Augustinsson;Heiko Becher;S. Bojesen;M. Bolla;H. Brenner;Melissa A. Brown;S. Buys;F. Canzian;S. Caputo;J. Castelao;J. Chang;GC;K. Czene;M. Daly;Arcangela De Nicolo;P. Devilee;T. Dörk;A. Dunning;M. Dwek;D. Eccles;C. Engel;David Evans;P. Fasching;M. Gago;M. García;J. García;A. Gentry;Willemina R. R. Geurts ;G. Giles;G. Glendon;M. Goldberg;E. G. Gómez Garcia;Melanie Güendert;P. Guénel;E. Hahnen;C. Haiman;P. Hall;U. Hamann;E. Harkness;Frans B. L. Hogervorst;A. Hollestelle;Reiner Hoppe;J. Hopper;C. Houdayer;R. Houlston;A. Howell;ABCTB Investigators;M. Jakimovska;A. Jakubowska;H. Jernström;E. John;R. Kaaks;C. Kitahara;Stella Koutros;P. Kraft;V. Kristensen;James V Lacey;D. Lambrechts;M. Léoné;A. Lindblom;J. Lubiński;M. Lush;A. Mannermaa;M. Manoochehri;S. Manoukian;S. Margolin;M. E. Martinez;Usha Menon;R. Milne;A. Monteiro;R. A. Murphy;S. Neuhausen;H. Nevanlinna;W. G. Newman;K. Offit;Sue;P. James;P. Peterlongo;J. Peto;D. Plaseska‐Karanfilska;K. Punie;P. Radice;Muhammad Abo ul Hassan Rashid;G. Rennert;A. Romero;E. Rosenberg;E. Saloustros;D. Sandler;M. Schmidt;R. Schmutzler;X. Shu;J. Simard;M. Southey;J. Stone;D. Stoppa;R. Tamimi;W. Tapper;Jack A. Taylor;Soo;Lauren R. Teras;M. Terry;M. Thomassen;M. Troester;C. Vachon;Ana Vega;M. Vreeswijk;Qin Wang;B. Wappenschmidt;C. R. Weinberg;A. Wolk;W. Zheng;Bing;F. Couch;A. Spurdle;D. Easton;D. Goldgar;K. Michailidou
  • Corresponding author:
    K. Michailidou

Other grants by James V Lacey

Genome-wide genotyping of existing samples from Asian American and Pacific Islander participants in the California Teachers Study cohort to facilitate broad and open future research
  • Grant number:
    10408476
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Innovative Infrastructure to Enhance and Sustain the California Teachers Study Cohort
  • Grant number:
    10686141
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Innovative Infrastructure to Enhance and Sustain the California Teachers Study Cohort
  • Grant number:
    10259758
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Genome-wide genotyping of existing samples in the California Teachers Study cohort from under-represented racial & ethnic minority groups to facilitate broad and open future research
  • Grant number:
    10629670
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Innovative Infrastructure to Enhance and Sustain the California Teachers Study Cohort
  • Grant number:
    10053181
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Innovative Infrastructure to Enhance and Sustain the California Teachers Study Cohort
  • Grant number:
    10478112
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
Oil and Gas as Drivers of Climate Change and Health: Developing unique resources to investigate multi-level and diverse effects of exposure to oil and gas wells
  • Grant number:
    10839157
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
New Biospecimens to Enhance Research in the California Teachers Study Cohort
  • Grant number:
    8550025
  • Fiscal year:
    2012
  • Award amount:
    $348,700
  • Project category:
New Biospecimens to Enhance Research in the California Teachers Study Cohort
  • Grant number:
    8374426
  • Fiscal year:
    2012
  • Award amount:
    $348,700
  • Project category:
New Biospecimens to Enhance Research in the California Teachers Study Cohort
  • Grant number:
    8917892
  • Fiscal year:
    2012
  • Award amount:
    $348,700
  • Project category:

Similar overseas grants

Practical Study on Disaster Countermeasure Architecture Model by Sustainable Design in Asian Flood Area
  • Grant number:
    17K00727
  • Fiscal year:
    2017
  • Award amount:
    $348,700
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Functional architecture of a face processing area in the common marmoset
  • Grant number:
    9764503
  • Fiscal year:
    2016
  • Award amount:
    $348,700
  • Project category:
Heating and airconditioning by hypocausts in residential and representative architecture in Rome and Latium studies of a phenomenon of luxury in a favoured climatic area of the Roman Empire on the basis of selected examples.
  • Grant number:
    317469425
  • Fiscal year:
    2016
  • Award amount:
    $348,700
  • Project category:
    Research Grants
SBIR Phase II: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
  • Grant number:
    1632562
  • Fiscal year:
    2016
  • Award amount:
    $348,700
  • Project category:
    Standard Grant
SBIR Phase I: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
  • Grant number:
    1520137
  • Fiscal year:
    2015
  • Award amount:
    $348,700
  • Project category:
    Standard Grant
A Study on The Spatial Setting and The Inhavitant's of The Flood Prevention Architecture in The Flood Area
  • Grant number:
    26420620
  • Fiscal year:
    2014
  • Award amount:
    $348,700
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Grant number:
    327691-2007
  • Fiscal year:
    2011
  • Award amount:
    $348,700
  • Project category:
    Discovery Grants Program - Individual
A FUNDAMENTAL STUDY ON UTILIZATION OF THE POST-WAR ARCHITECTURE AS URBAN REGENERATION METHOD, A case of the central area of Osaka city
  • Grant number:
    22760469
  • Fiscal year:
    2010
  • Award amount:
    $348,700
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Grant number:
    327691-2007
  • Fiscal year:
    2010
  • Award amount:
    $348,700
  • Project category:
    Discovery Grants Program - Individual
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Grant number:
    327691-2007
  • Fiscal year:
    2009
  • Award amount:
    $348,700
  • Project category:
    Discovery Grants Program - Individual