
ENRICHing NIH Imaging Datasets to Prepare them for Machine Learning

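Basic Information

Grant number:
10842910
Principal investigator:
Rima Arnaout
Amount:
$350,900
Host institution:
UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
Host institution country:
United States
Project category:
Fiscal year:
2020
Funding country:
United States
Project period:
2020-04-01 to 2025-03-31
Project status:
Not yet concluded

Source:
https://reporter.nih.gov/project-details/10842910
Keywords: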
Achievement Algorithms Artificial Intelligence Benchmarking COVID-19 Cardiology Collaborations Collection Communities Data Data Science Data Set Data Storage and Retrieval Descriptor Detection Disease Ensure Environment Fetal Diseases Gender Goals Human Image Image Analysis Informatics Information Theory Intelligence Investments Label Learning Left Link Machine Learning Mathematics Measures Medical Imaging Modality Modeling Morphologic artifacts Paper Parents Pathology Patients Pattern Performance Physicians Process Property Proxy Publishing Race Research Research Methodology Research Personnel Roentgen Rays Running Scientist Techniques Testing Thoracic Radiography Time Training Ultrasonography United States National Institutes of Health Validation Work biomedical imaging clinical center computer program congenital heart disorder cost cost effective data harmonization data structure deep learning deep learning model disease classification diverse data frontier health care settings improved innovation insight large datasets multidisciplinary repository tool trustworthiness

Project Abstract

PROJECT SUMMARY

Objective: The goal of the parent proposal is to develop and optimize deep learning (DL) to improve detection of congenital heart disease (CHD) from fetal ultrasound imaging. This work includes evaluation of an imaging collection spanning two decades, tens of thousands of patients, and several clinical centers across a range of healthcare settings.

Background: Through this work, we have found that the performance of DL models is critically linked to the quality of the datasets used to train and test them. However, the AI/ML field lacks a complete understanding of how to measure "quality." To date, image datasets are either described subjectively or measured crudely by size, i.e., the number of images they contain. "More is better," however, fails to account for the key importance of diversity in the quality of image datasets. In parent Aim 1, we sought to develop better metrics for dataset quality and content, founded in information theory and leveraging diversity. This work has already proven quite useful for our parent use case, but it is also extremely important for all imaging datasets in order to save on data storage/transfer costs, harmonize data intelligently, save on laborious image labeling, screen for artifacts both anticipated and unanticipated, and ensure diversity at several levels.

Preliminary Studies: Our multidisciplinary team in imaging, DL, and information theory has successfully developed a framework to analyze image datasets, called ENRICH. ENRICH consists of two main steps. First, a similarity metric is calculated for all pairs of images in a given dataset, forming a matrix of pairwise-similarity values. Second, an instance-selection algorithm operates on the matrix to describe its diversity and/or curate the most informative images. ENRICH is customizable in that different choices of pairwise image similarity metric and of curation algorithm can be used for different tasks. An initial implementation of ENRICH aimed at reducing redundancy allowed us to get the same DL model performance in a CHD classification task from only a fraction of the original training data. It also identified data structure and imaging artifacts without a priori labeling, among other achievements (see Research Strategy).

Goals of Supplement: The next logical step is to apply ENRICH to more biomedical datasets, both to further validate its utility and to provide quantitative descriptors of quality on datasets important for the research community.

Aims: (1) We will run ENRICH on several NIH imaging datasets, including (2) validating labels and adding annotations to targeted subsets of these datasets. (3) We will document and publish these methods for the research community to use, including connecting with the original NIH repository for each dataset.

Environment and Impact: The proposed work is supported by an outstanding environment at the crossroads of data science, imaging, and information theory and will provide valuable tools and insight into how best to measure image dataset content and quality in order to rigorously train and test DL for biomedical tasks.
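
The two-step structure described under Preliminary Studies (a pairwise-similarity matrix followed by instance selection) can be illustrated with a minimal Python sketch. This is not the ENRICH implementation: the abstract does not specify the similarity metric or the curation algorithm, so cosine similarity and a greedy farthest-point selection are used here purely as stand-ins. The function names `pairwise_cosine_similarity` and `greedy_diverse_subset` are hypothetical, and the sketch assumes each image has already been reduced to a numeric feature vector.

```python
import numpy as np


def pairwise_cosine_similarity(features: np.ndarray) -> np.ndarray:
    """Step 1 (stand-in metric): similarity for all pairs of images.

    `features` has one row per image; the result is an n x n matrix.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


def greedy_diverse_subset(similarity: np.ndarray, k: int) -> list[int]:
    """Step 2 (stand-in selection): greedily pick k mutually dissimilar items."""
    n = similarity.shape[0]
    k = min(k, n)
    if k == 0:
        return []
    # Start from the item with the lowest mean similarity to the dataset.
    selected = [int(np.argmin(similarity.mean(axis=1)))]
    # max_sim[i] = highest similarity between item i and any selected item.
    max_sim = similarity[selected[0]].copy()
    while len(selected) < k:
        max_sim[selected] = np.inf  # never re-pick an already selected item
        nxt = int(np.argmin(max_sim))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, similarity[nxt])
    return selected


if __name__ == "__main__":
    # Toy data (assumed): two pairs of near-duplicate feature vectors.
    feats = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
    sim = pairwise_cosine_similarity(feats)
    print(greedy_diverse_subset(sim, 2))
```

On the toy data, the selection keeps one item from each near-duplicate pair, which mirrors the redundancy-reduction use described in the abstract. Swapping in a different metric or selection rule only requires replacing one of the two functions, which reflects the customizability the abstract mentions.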

Project Outcomes

Journal articles (12)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Visualizing omicron: COVID-19 deaths vs. cases over time.

DOI:
10.1371/journal.pone.0265233
Publication year:
2022
Journal:
PLOS ONE
Impact factor:
3.7
Authors:
Corresponding author:

The (Heart and) Soul of a Human Creation: Designing Echocardiography for the Big Data Age.

DOI:
10.1016/j.echo.2023.04.016
Publication year:
2023
Journal:
Journal of the American Society of Echocardiography : official publication of the American Society of Echocardiography
Impact factor:
0
Authors:
Arnaout, Rima; Hahn, Rebecca T; Hung, Judy W; Jone, Pei-Ni; Lester, Steven J; Little, Stephen H; Mackensen, G Burkhard; Rigolin, Vera; Sachdev, Vandana; Saric, Muhamed; Sengupta, Partho P; Strom, Jordan B; Taub, Cynthia C; Thamman, Ritu; Abraham, Theodore
Corresponding author:
Abraham, Theodore

Mitral Valve Atlas for Artificial Intelligence Predictions of MitraClip Intervention Outcomes.

DOI:
10.3389/fcvm.2021.759675
Publication year:
2021
Journal:
Frontiers in cardiovascular medicine
Impact factor:
3.6
Authors:
Dabiri Y; Yao J; Mahadevan VS; Gruber D; Arnaout R; Gentzsch W; Guccione JM; Kassab GS
Corresponding author:
Kassab GS

Myocardial Texture Analysis of Echocardiograms in Cardiac Transthyretin Amyloidosis.

DOI:
10.1016/j.echo.2024.02.005
Publication year:
2024
Journal:
Journal of the American Society of Echocardiography : official publication of the American Society of Echocardiography
Impact factor:
0
Authors:
Datar, Yesh; Cuddy, Sarah AM; Ovsak, Gavin; Giblin, Gerard T; Maurer, Mathew S; Ruberg, Frederick L; Arnaout, Rima; Dorbala, Sharmila
Corresponding author:
Dorbala, Sharmila

An ensemble of neural networks provides expert-level prenatal detection of complex congenital heart disease.