
BRIDGE Center Standards Core


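Basic Information

Award number:
10473242
Principal investigator:
Monica Cecilia Munoz-Torres
Amount:
$1,399,500
Institution:
UNIVERSITY OF COLORADO DENVER
Institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-07-06 to 2026-04-30
Project status:
Ongoing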

Source:
https://reporter.nih.gov/project-details/10473242
Keywords:
Address Adopted Adoption Anatomy Artificial Intelligence Awareness Back Behavioral Benchmarking Bridge to Artificial Intelligence Businesses Code Communities Consultations Consumption Data Data Collection Data Discovery Data Engineering Data Provenance Data Scientist Data Set Deposition Development Discipline Disease Documentation Ecosystem Elements Ensure Environment Equipment and supply inventories Evaluation FAIR principles Generations Genes Goals Human Knowledge Language Licensing Link Machine Learning Modality Modeling Modernization Molecular Morphologic artifacts Ontology Output Phenotype Protocols documentation Provider Quality Control Readiness Registries Reproducibility Research Research Personnel Resources Sea Semantics Services Source Specific qualifier value Specificity Standardization System Terminology Time Training Translational Research United States National Institutes of Health Update Variant Vocabulary Work dashboard data dissemination data ingestion data modeling data quality data reuse data standards empowered insight interoperability large datasets machine learning model novel open source programs quality assurance response skills tool web portal working group

Project Summary

AI offers great potential for the discovery of novel biomedical insights from linkages between disparate, cross-domain datasets. Unfortunately, traditional hypothesis-driven datasets tend to be narrowly focused on the targeted problem domain, with little consideration given to "AI-readiness". To best enable the use of such datasets in data-driven and cross-domain discovery, they must be made Findable, Accessible, Interoperable, and Reusable (FAIR). Lack of FAIRness is particularly problematic for AI, which is data-hungry. To fully leverage the power of AI approaches, researchers need to find and reuse data to combine into larger datasets, and the data must be interoperable or harmonized to be combined meaningfully. Transforming pre-existing datasets into AI-ready data is challenging, requiring extensive linking and curation by human experts. This challenge is exacerbated when annotating and linking data across domains, where standards may be disparate in purpose and specificity. Finally, many datasets do not adhere to best practices in data transparency, including content attribution and conditions on distribution and reuse. These additional considerations of Traceability, Licensing, and Connectedness create an operationalized model for FAIR: FAIR-TLC. Overcoming the barriers to FAIR-TLC is key to translational science and AI-driven biomedical discovery. Our team has led standards development efforts in numerous large consortia, including GA4GH, HL7, and N3C. Our standards for representing biomedical concepts have been widely adopted, including those for human phenotypes (e.g., HPO, GA4GH Phenopackets), diseases (NCIt, Mondo, ICD-11), genes (Gene Ontology), anatomy (Uberon), and molecular variation (GA4GH VRS).
We have developed standards and tools to address data provenance (SEPIO), contributions (Contributor Attribution Model), licensing barriers (Data Use Ontology, Reusable Data Project), and connectivity (Linked Data Modeling Language, LinkML). We will build on our previous work, collaborative skills, and technical knowledge to develop a framework that enables the harmonization of standards across biomedical domains. We will form working groups with representatives of the Data Generation Projects (DGPs) to document use cases and synthesize data standard requirements. We will provide protocols and training for specifying standards, and provide concierge services in support of all deliverables and activities. We will create a version-controlled Bridge2AI Standards Registry to inventory standards for use by the DGPs; these standards will be specified in the modality-agnostic LinkML framework, discoverable through the interactive Standards Hub, and automatically exportable to technical artifacts through our Data Transformation Toolbox. We will build a Standards Evaluation Dashboard for the assessment and discovery of standards in datasets from the Bridge2AI Data Generation Projects. We will promote best practices in the transparent and responsible sharing of datasets and ML models through DUO, Datasheets, and Model Cards.
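To make the FAIR-TLC idea and the planned Standards Evaluation Dashboard more concrete, here is a minimal, hypothetical Python sketch of a per-dataset FAIR-TLC check. It is not the Bridge2AI schema or dashboard: the record type `DatasetRecord`, the helpers `fair_tlc_gaps` and `dashboard_row`, and the choice of one metadata field per principle (for example, `provenance` standing in for Traceability) are all illustrative assumptions. A real assessment would rely on richer, standards-based criteria such as those expressed in LinkML schemas.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from each FAIR-TLC principle to a single metadata field
# used as a crude presence indicator (an illustrative simplification only).
FAIR_TLC_FIELDS = {
    "Findable": "identifier",
    "Accessible": "access_url",
    "Interoperable": "conforms_to",
    "Reusable": "description",
    "Traceability": "provenance",
    "Licensing": "license",
    "Connectedness": "related_resources",
}


@dataclass
class DatasetRecord:
    """Hypothetical dataset metadata record; the field names are assumptions."""
    name: str
    metadata: dict = field(default_factory=dict)


def fair_tlc_gaps(record: DatasetRecord) -> list[str]:
    """Return the principles whose indicator field is missing or empty, in mapping order."""
    return [
        principle
        for principle, key in FAIR_TLC_FIELDS.items()
        if not record.metadata.get(key)
    ]


def dashboard_row(record: DatasetRecord) -> str:
    """Summarize one record as a single line, like a row on a dashboard."""
    gaps = fair_tlc_gaps(record)
    met = len(FAIR_TLC_FIELDS) - len(gaps)
    missing = ", ".join(gaps) if gaps else "none"
    return f"{record.name}: {met}/{len(FAIR_TLC_FIELDS)} met; missing: {missing}"


# Usage with placeholder metadata (not a real dataset).
example = DatasetRecord(
    name="example-dataset",
    metadata={
        "identifier": "example:0001",
        "access_url": "https://example.org/data",
        "conforms_to": ["HPO", "Mondo"],
        "license": "CC-BY-4.0",
    },
)
print(dashboard_row(example))
# prints "example-dataset: 4/7 met; missing: Reusable, Traceability, Connectedness"
```

Using a plain dictionary as the mapping keeps the sketch easy to extend, since adding or refining a criterion only means editing one entry. A production tool would more likely validate records against a formal schema than test for the presence of individual keys.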
