ENRICHing NIH Imaging Datasets to Prepare them for Machine Learning
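Basic information
- Grant number: 10842910
- Principal investigator:
- Amount: $350,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-04-01 to 2025-03-31
- Project status: Ongoing
- Source: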
- Keywords: Achievement; Algorithms; Artificial Intelligence; Benchmarking; COVID-19; Cardiology; Collaborations; Collection; Communities; Data; Data Science; Data Set; Data Storage and Retrieval; Descriptor; Detection; Disease; Ensure; Environment; Fetal Diseases; Gender; Goals; Human; Image; Image Analysis; Informatics; Information Theory; Intelligence; Investments; Label; Learning; Left; Link; Machine Learning; Mathematics; Measures; Medical Imaging; Modality; Modeling; Morphologic artifacts; Paper; Parents; Pathology; Patients; Pattern; Performance; Physicians; Process; Property; Proxy; Publishing; Race; Research; Research Methodology; Research Personnel; Roentgen Rays; Running; Scientist; Techniques; Testing; Thoracic Radiography; Time; Training; Ultrasonography; United States National Institutes of Health; Validation; Work; biomedical imaging; clinical center; computer program; congenital heart disorder; cost; cost effective; data harmonization; data structure; deep learning; deep learning model; disease classification; diverse data; frontier; health care settings; improved; innovation; insight; large datasets; multidisciplinary; repository; tool; trustworthiness
Project Summary
Objective: The goal of the parent proposal is to develop and optimize deep learning (DL) to improve detection
of congenital heart disease (CHD) from fetal ultrasound imaging. This work includes evaluation of an imaging
collection spanning two decades, tens of thousands of patients, and several clinical centers across a range of
healthcare settings. Background: Through this work, we have found that performance of DL models is
critically linked to the quality of the datasets used to train and test them. However, the AI/ML field lacks a
complete understanding of how to measure “quality.” To date, image datasets are either described subjectively
or measured crudely by size, i.e. the number of images they contain. However, “more is better” fails to account
for the key importance of diversity in the quality of image datasets. In parent Aim 1, we sought to develop
better metrics for dataset quality and content, founded in information theory and leveraging diversity. This work
has already proven quite useful for our parent use case, but it is also extremely important for all imaging
datasets in order to save on data storage/transfer costs, harmonize data intelligently, save on laborious image
labeling, screen for artifacts both anticipated and unanticipated, and ensure diversity at several levels.
Preliminary Studies: Our multi-disciplinary team in imaging, DL, and information theory has successfully
developed a framework to analyze image datasets, called ENRICH. ENRICH consists of two main steps. First,
a similarity metric is calculated for all pairs of images in a given dataset, forming a matrix of pairwise-similarity
values. Second, an instance-selection algorithm operates on the matrix to describe its diversity and/or curate
the most informative images. ENRICH is customizable in that different choices for pairwise image similarity
metric and for curation algorithm can be used for different tasks. An initial implementation of ENRICH aimed at
reducing redundancy allowed us to get the same DL model performance in a CHD classification task from only
a fraction of the original training data. It also identified data structure and imaging artifacts without a priori
labeling, among other achievements (see Research Strategy). Goals of Supplement: The next logical step is
to apply ENRICH to more biomedical datasets, both to further validate its utility and to provide quantitative
descriptors of quality on datasets important for the research community. Aims: (1) We will run ENRICH on
several NIH imaging datasets. (2) We will validate labels and add annotations to targeted subsets of
these datasets. (3) We will document and publish these methods for the research community to use, including
connecting with the original NIH repository for each dataset. Environment and Impact: The proposed work is
supported by an outstanding environment at the crossroads of data science, imaging, and information theory
and will provide valuable tools and insight into how best to measure image dataset content and quality in order
to rigorously train and test DL for biomedical tasks.
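
The two ENRICH steps described in the summary (a pairwise-similarity matrix, then an instance-selection pass over that matrix) can be illustrated with a minimal sketch. The summary does not say which similarity metric or selection algorithm the team used, so cosine similarity over per-image feature vectors and a greedy least-similar-first selection are assumptions chosen only for illustration. The function names `pairwise_cosine_similarity` and `select_diverse` are hypothetical and are not part of ENRICH.

```python
import numpy as np


def pairwise_cosine_similarity(features):
    """Step 1 (assumed metric): cosine similarity for all pairs of images.

    `features` is an (n_images, n_features) array-like, one row per image.
    Returns an (n_images, n_images) symmetric matrix.
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def select_diverse(similarity, k):
    """Step 2 (assumed algorithm): greedily pick k mutually dissimilar images.

    Each new pick is the image whose highest similarity to the images
    already chosen is lowest, which discourages redundant selections.
    """
    n = similarity.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of images")
    # Seed with the image that is least similar to the dataset on average.
    chosen = [int(np.argmin(similarity.mean(axis=1)))]
    max_sim_to_chosen = similarity[chosen[0]].copy()
    while len(chosen) < k:
        max_sim_to_chosen[chosen] = np.inf  # never pick the same image twice
        nxt = int(np.argmin(max_sim_to_chosen))
        chosen.append(nxt)
        max_sim_to_chosen = np.maximum(max_sim_to_chosen, similarity[nxt])
    return chosen


if __name__ == "__main__":
    # Toy data: two near-duplicate pairs standing in for image embeddings.
    toy = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
    sim = pairwise_cosine_similarity(toy)
    print(select_diverse(sim, 2))
```

On the toy data, asking for two images should return one image from each near-duplicate pair, which is the redundancy-reduction behavior the summary describes. Swapping in a different metric or selection rule only requires replacing one of the two functions, mirroring the customizability the summary attributes to ENRICH.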
Project outputs
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Visualizing omicron: COVID-19 deaths vs. cases over time.
- DOI: 10.1371/journal.pone.0265233
- Published: 2022
- Journal:
- Impact factor: 3.7
- Authors:
- Corresponding author:
The (Heart and) Soul of a Human Creation: Designing Echocardiography for the Big Data Age.
- DOI: 10.1016/j.echo.2023.04.016
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Arnaout, Rima; Hahn, Rebecca T; Hung, Judy W; Jone, Pei-Ni; Lester, Steven J; Little, Stephen H; Mackensen, G Burkhard; Rigolin, Vera; Sachdev, Vandana; Saric, Muhamed; Sengupta, Partho P; Strom, Jordan B; Taub, Cynthia C; Thamman, Ritu; Abraham, Theodore
- Corresponding author: Abraham, Theodore
Mitral Valve Atlas for Artificial Intelligence Predictions of MitraClip Intervention Outcomes.
- DOI: 10.3389/fcvm.2021.759675
- Published: 2021
- Journal:
- Impact factor: 3.6
- Authors: Dabiri Y; Yao J; Mahadevan VS; Gruber D; Arnaout R; Gentzsch W; Guccione JM; Kassab GS
- Corresponding author: Kassab GS
Myocardial Texture Analysis of Echocardiograms in Cardiac Transthyretin Amyloidosis.
- DOI: 10.1016/j.echo.2024.02.005
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Datar, Yesh; Cuddy, Sarah AM; Ovsak, Gavin; Giblin, Gerard T; Maurer, Mathew S; Ruberg, Frederick L; Arnaout, Rima; Dorbala, Sharmila
- Corresponding author: Dorbala, Sharmila
An ensemble of neural networks provides expert-level prenatal detection of complex congenital heart disease.
- DOI: 10.1038/s41591-021-01342-5
- Published: 2021-05
- Journal:
- Impact factor: 82.9
- Authors: Arnaout R; Curran L; Zhao Y; Levine JC; Chinn E; Moon-Grady AJ
- Corresponding author: Moon-Grady AJ
Other grants by Rima Arnaout
Developing FAIR practices for cloud-enabled AI deployment for prospective testing
- Grant number: 10827803
- Fiscal year: 2023
- Funding amount: $350,900
- Project category:
Improving cardiovascular image-based phenotyping using emerging methods in artificial intelligence
- Grant number: 10379426
- Fiscal year: 2020
- Funding amount: $350,900
- Project category:
Improving cardiovascular image-based phenotyping using emerging methods in artificial intelligence
- Grant number: 10608075
- Fiscal year: 2020
- Funding amount: $350,900
- Project category:
Genetics and Structure of Trabecular Myocardium in Development and Disease
- Grant number: 9764455
- Fiscal year: 2015
- Funding amount: $350,900
- Project category:
Genetics and Structure of Trabecular Myocardium in Development and Disease
- Grant number: 8967119
- Fiscal year: 2015
- Funding amount: $350,900
- Project category:
Genetic Analysis of Early Conduction System Development
- Grant number: 8202805
- Fiscal year: 2011
- Funding amount: $350,900
- Project category:
Genetic Analysis of Early Conduction System Development
- Grant number: 8316460
- Fiscal year: 2011
- Funding amount: $350,900
- Project category:
Similar overseas grants
CAREER: CAS-Climate: Forecast-informed Flexible Reservoir System Modeling Enabled by Artificial Intelligence Algorithms Using Subseasonal-to-Seasonal Hydroclimatological Forecasts
- Grant number: 2236926
- Fiscal year: 2023
- Funding amount: $350,900
- Project category: Continuing Grant
Artificial intelligence algorithms to predict risk of injury in racehorses.
- Grant number: LP210200798
- Fiscal year: 2023
- Funding amount: $350,900
- Project category: Linkage Projects
Collaborative Research: SHF: Small: Artificial Intelligence of Things (AIoT): Theory, Architecture, and Algorithms
- Grant number: 2221742
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: Standard Grant
Performance-Based Earthquake Engineering 2.0: Machine-Learning and Artificial Intelligence Algorithms for seismic hazard and vulnerability.
- Grant number: 2765246
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: Studentship
The 'risk of risk': remodelling artificial intelligence algorithms for predicting child abuse.
- Grant number: ES/R00983X/2
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: Research Grant
Collaborative Research: SHF: Small: Artificial Intelligence of Things (AIoT): Theory, Architecture, and Algorithms
- Grant number: 2221741
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: Standard Grant
Developing a platform for deep phenotyping of heart failure with preserved ejection fraction using raw, widely-available, multi-modality data and artificial intelligence algorithms
- Grant number: 10683803
- Fiscal year: 2022
- Funding amount: $350,900
- Project category:
Early-asymptomatic-dementia prediction based on a white-matter biomarker using Artificial Intelligence algorithms
- Grant number: 460558
- Fiscal year: 2022
- Funding amount: $350,900
- Project category:
Concluding 50 Years of Research in Wireless Communications: Algorithms for Artificial Intelligence and Optimization in Networks Beyond 5G and Thereafter
- Grant number: RGPIN-2022-04417
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: Discovery Grants Program - Individual
De novo development of small CRISPR-Cas proteins using artificial intelligence algorithms
- Grant number: 10544772
- Fiscal year: 2022
- Funding amount: $350,900
- Project category: