Guiding humans to create better labeled datasets for machine learning in biomedical research
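Basic information
- Grant number: 10646429
- Principal investigator: Lee Cooper
- Amount: $399,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-01 to 2025-05-31
- Project status: Ongoing
- Source: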
- Keywords: Active Learning; Address; Algorithms; Bayesian neural network; Biological; Biomedical Research; Classification; Clinical Informatics; Clinical Research; Clinical Trials; Code; Collaborations; Communities; Computer Systems; Computer software; Data; Data Scientist; Data Set; Databases; Dedications; Environment; Face; Fetal health; Funding; Growth; High Performance Computing; Histologic; Human; Image; Institution; K-Series Research Career Programs; Knowledge; Label; Learning; Machine Learning; Maternal Health; Measurement; Methodology; Methods; Natural Language Processing; Pathologist; Pathology; Pattern; Performance; Perinatal; Placenta; Process; Recording of previous events; Reproducibility; Research; Research Personnel; Resources; Sampling; Science; Site; Software Framework; Software Tools; Source; Structure; Tissue imaging; Training; United States National Library of Medicine; Work; algorithm training; base; cloud platform; cohort; computing resources; deep learning; deep learning algorithm; digital pathology; experience; feature extraction; hands-on learning; human-in-the-loop; improved; large datasets; learning strategy; machine learning algorithm; machine learning model; malignant breast neoplasm; multidimensional data; novel strategies; open source; pathology imaging; public health relevance; repository; simulation; software development; tool; tool development; unsupervised learning; whole slide imaging
Project abstract
PROJECT SUMMARY / ABSTRACT
Machine learning (ML) has seen tremendous advances in the past decade, fueled by growth in computing and
the availability of large labeled datasets. While the impact of these advances on clinical and biomedical
research is potentially significant, these applications face unique challenges due to the difficulty of acquiring
labels from biomedical experts. Furthermore, ML algorithms often fail to generalize across institutions or
datasets due to measurement biases (e.g., MR scanners) or intrinsic demographic or biological differences
between cohorts or datasets, which limits their impact in biomedical science. This proposal will develop new
methodology and open-source software that biomedical data scientists can use with their applications to (1)
improve data labeling by identifying the samples for labeling that provide the most benefit for training ML
algorithms; (2) improve generalization of ML models across institutions; and (3) perform this work on scalable
cloud platforms. We will first explore how to improve upon methods known as active learning that interactively
construct labeled datasets by having an algorithm select samples that address its weaknesses and present
these samples to an expert for labeling. We will then investigate how these samples can be selected to
improve the performance of ML algorithms across multiple institutions by learning robust patterns that are not
specific to any one site. Finally, we will develop an extendable software framework that developers can
integrate into their own applications to take advantage of these methods, and that can operate on cloud
platforms to support scalable analysis of large datasets. This work will be developed through a combination of
simulation studies, using a unique repository of over 280,000 human markups of digital pathology images at
multiple institutions, and user studies of the developed software frameworks focused on applications in
perinatal pathology and the human placenta. The software tools will impact a broad variety of biomedical
applications beyond pathology where data labeling and multi-institutional studies remain challenging.
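
The sample-selection step of active learning described in the abstract can be sketched with uncertainty sampling, one common selection strategy. The abstract does not state which strategy the project uses, so this is an illustration under that assumption only. The function names (`entropy`, `select_for_labeling`) and the probability values are hypothetical.

```python
import numpy as np


def entropy(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def select_for_labeling(probs, k):
    """Return the indices of the k most uncertain (highest-entropy) samples."""
    order = np.argsort(-entropy(probs), kind="stable")
    return order[:k]


# Hypothetical model predictions for five unlabeled samples (two classes).
pool_probs = np.array([
    [0.98, 0.02],
    [0.55, 0.45],
    [0.90, 0.10],
    [0.50, 0.50],
    [0.70, 0.30],
])

# Samples the model is least sure about are sent to the expert first.
queried = select_for_labeling(pool_probs, k=2)
print(queried)
```

In a full loop, the expert's labels for the queried samples would be added to the training set, the model retrained, and the selection repeated on the remaining pool.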
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Lee Cooper
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $399,700
- Project category:
Improved whole-brain spectroscopic MRI for radiation therapy planning
- Grant number: 10618320
- Fiscal year: 2022
- Funding amount: $399,700
- Project category:
Improved whole-brain spectroscopic MRI for radiation therapy planning
- Grant number: 10443355
- Fiscal year: 2022
- Funding amount: $399,700
- Project category:
Guiding humans to create better labeled datasets for machine learning in biomedical research
- Grant number: 10609284
- Fiscal year: 2021
- Funding amount: $399,700
- Project category:
Guiding humans to create better labeled datasets for machine learning in biomedical research
- Grant number: 10466914
- Fiscal year: 2021
- Funding amount: $399,700
- Project category:
Guiding humans to create better labeled datasets for machine learning in biomedical research
- Grant number: 10298684
- Fiscal year: 2021
- Funding amount: $399,700
- Project category:
Cloud strategies for improving cost, scalability, and accessibility of a machine learning system for pathology images
- Grant number: 10824959
- Fiscal year: 2021
- Funding amount: $399,700
- Project category:
Informatics Tools for Quantitative Digital Pathology Profiling and Integrated Prognostic Modeling
- Grant number: 10070213
- Fiscal year: 2018
- Funding amount: $399,700
- Project category:
Improved Whole-Brain Spectroscopic MRI for Radiation Treatment Planning
- Grant number: 9791190
- Fiscal year: 2018
- Funding amount: $399,700
- Project category:
Improved Whole-Brain Spectroscopic MRI for Radiation Treatment Planning
- Grant number: 9981743
- Fiscal year: 2018
- Funding amount: $399,700
- Project category:
Similar overseas grants
Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
- Grant number: MR/S03398X/2
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
- Grant number: EP/Y001486/1
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
- Grant number: 2338423
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
- Grant number: MR/X03657X/1
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
- Grant number: 2348066
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
- Grant number: 2341402
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
- Grant number: AH/Z505481/1
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10107647
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10106221
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
- Grant number: AH/Z505341/1
- Fiscal year: 2024
- Funding amount: $399,700
- Project category: Research Grant