A Knowledge Provider for Scruffy Sources of Metadata in Translational Medicine
Basic Information
- Award number: 10057243
- Principal investigator: Mark A Musen
- Amount: $56,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-01-23 to 2020-04-07
- Project status: Completed
- Source:
- Keywords: Address, Area, Big Data to Knowledge, Biomedical Computing, Clinical Trials, Communities, Computer software, Computers, Data, Data Collection, Data Set, Data Sources, Evaluation, Goals, Knowledge, Laboratories, Literature, Manuals, Metadata, Methodology, Methods, Ontology, Patients, Peer Review, Performance, Process, Provider, Publications, Publishing, Records, Research, Resources, Scientist, Semantics, Services, Source, Standardization, System, Techniques, Technology, Testing, Text, United States National Institutes of Health, Work, biomedical ontology, data dictionary, data standards, experimental study, interest, interoperability, knowledge graph, knowledge integration, online repository, patient population, programs, prospective, repository, response, secondary analysis, spelling, statistics, translational medicine
Project Abstract
An essential task for the Biomedical Data Translator is to identify scientific experiments that
have been performed or that are ongoing, and to enable integration of knowledge of the
experimental methods, the results, and—when available—the conclusions with other knowledge
sources. Such capabilities will enable queries such as: (1) Has anyone ever performed an
experiment using methods like these? (2) Has anyone performed a study where the data may
support a particular conclusion? (3) Are there any clinical trials for a particular condition whose
patient population is a good match for a patient whom I now need to treat? (4) What best
practices are suggested by the results of current clinical trials for a particular condition?
Sometimes such queries can be addressed through an analysis of the scientific literature. More
often, however, the published literature does not provide the methodological details needed to
address such questions—even if NLP techniques were good enough to find the answers.
Publications also provide only summary statistics of the experimental results. To address the
kinds of queries that are of most interest to the Translator, it is necessary to access the actual
experimental data online, starting with the metadata that are intended to provide descriptions of
the datasets and of the experiments that led to the collection of the data in the first place.
The problem for the Translator project is that the metadata that describe most online
experimental data sources are difficult for computers to find and to process. Our laboratory’s
analysis of the NCBI BioSample metadata repository, for example, shows that most scientists
avoid standard data dictionaries entirely and that, partly as a result, they are extremely
sloppy when they provide metadata values [3]. (A case in point: Some 76% of the metadata
values in BioSample that are intended to be Boolean are neither true nor false.) Despite all the
discussion in the past few years about making online datasets Findable, Accessible,
Interoperable, and Re-usable (FAIR) [14], most online datasets are not close to FAIR.
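The BioSample statistic above is, at bottom, a classification of raw values. As a minimal sketch, assuming a hypothetical attribute declared as Boolean and an invented set of accepted variant spellings (neither is drawn from BioSample's actual rules), the Python below separates values that are literally `true`/`false` from variants whose intent is recoverable and from values that cannot be interpreted:

```python
from collections import Counter
from typing import Optional

# Hypothetical variant spellings treated as recoverable Booleans. These sets
# are illustrative assumptions, not BioSample's or any standard's rules.
TRUE_VARIANTS = {"true", "yes", "y", "t", "1"}
FALSE_VARIANTS = {"false", "no", "n", "f", "0"}


def classify(raw: str) -> str:
    """Label a raw value 'strict' (exactly 'true' or 'false'), 'recoverable'
    (a known variant spelling), or 'unrecognized' (intent unclear)."""
    if raw in ("true", "false"):
        return "strict"
    cleaned = raw.strip().lower()
    if cleaned in TRUE_VARIANTS or cleaned in FALSE_VARIANTS:
        return "recoverable"
    return "unrecognized"


def normalize(raw: str) -> Optional[bool]:
    """Map a raw value to True/False, or None when intent cannot be inferred."""
    cleaned = raw.strip().lower()
    if cleaned in TRUE_VARIANTS:
        return True
    if cleaned in FALSE_VARIANTS:
        return False
    return None


# Invented example values for one Boolean-typed attribute (not real data).
values = ["true", "Yes", "FALSE", "n", "not collected", "missing", "false", "1"]
counts = Counter(classify(v) for v in values)
share_not_strict = 1 - counts["strict"] / len(values)
print(dict(counts), f"{share_not_strict:.0%} are not literally true/false")
print([normalize(v) for v in values])
```

Keeping "recoverable" apart from "unrecognized" is the point of the design: a corrector can rewrite the former with some confidence, while the latter need human attention.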
Our laboratory is developing technology that can rectify errors in online metadata. Like a
spell-checker for metadata, our approach will attempt to identify the intentions of metadata
authors, to correct typos, and to convert free-text strings to ontology terms whenever possible
[6]. Our goal is to provide a service that will transform the scruffy metadata that pervade online
descriptions of biomedical experiments into a form that will allow automated discovery,
integration, and secondary analysis of research results in ways that are simply not possible at
present. We anticipate that the Translator will call on our service to find experimental datasets
and their accompanying metadata, to perform standard analyses of such datasets, and to
integrate descriptions of experiments into the evolving knowledge graph.
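The spell-checker analogy can be illustrated with a small sketch. It assumes a hypothetical four-term vocabulary with placeholder identifiers (`EX:0001` and so on) and a hypothetical synonym table, and it uses the Python standard library's `difflib.get_close_matches` as a simple stand-in for whatever matching the actual service performs; it is not the laboratory's method.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical vocabulary: preferred labels mapped to placeholder IDs. A real
# service would draw labels and IDs from an ontology repository.
VOCABULARY = {
    "liver": "EX:0001",
    "lung": "EX:0002",
    "heart": "EX:0003",
    "kidney": "EX:0004",
}

# Hypothetical synonyms that resolve to a preferred label.
SYNONYMS = {"hepatic tissue": "liver", "pulmonary tissue": "lung"}


def suggest_term(raw: str, cutoff: float = 0.75) -> Optional[Tuple[str, str]]:
    """Return (label, id) for the best vocabulary match, or None.

    Exact and synonym matches win; otherwise difflib's similarity ratio
    catches simple typos such as dropped or transposed letters.
    """
    text = " ".join(raw.strip().lower().split())  # normalize case and spacing
    if text in VOCABULARY:
        return text, VOCABULARY[text]
    if text in SYNONYMS:
        label = SYNONYMS[text]
        return label, VOCABULARY[label]
    matches = difflib.get_close_matches(text, list(VOCABULARY), n=1, cutoff=cutoff)
    if matches:
        return matches[0], VOCABULARY[matches[0]]
    return None  # leave the value for human review rather than guess


for value in ["Liver", "livr", "Hepatic  tissue", "kidnye", "blood"]:
    print(repr(value), "->", suggest_term(value))
```

Values below the similarity cutoff come back as `None` instead of being forced onto the nearest term, which keeps the corrector from introducing new errors of its own.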
We will evaluate the performance of our Knowledge Provider by studying its response to queries
from the Translator community and by peer review of a subset of the underlying, cleaned-up
metadata records that it processes from actual online repositories, such as BioSample and
ClinicalTrials.gov. Our evaluation necessarily will be limited by the pragmatics of selecting a
manageable test set of metadata and by the inherent shortcomings of manual peer review.
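Peer review of a sampled subset, as described above, can be made reproducible and its uncertainty made explicit. The following is a minimal sketch under stated assumptions: the record identifiers and reviewer verdicts are invented, and the Wilson score interval is one common choice for a proportion estimated from a small sample, not necessarily the one the project used.

```python
import math
import random
from typing import Dict, List, Tuple


def sample_for_review(record_ids: List[str], k: int, seed: int = 0) -> List[str]:
    """Draw a reproducible random subset of record IDs for manual review."""
    rng = random.Random(seed)  # fixed seed so every reviewer sees the same subset
    return sorted(rng.sample(record_ids, k))


def acceptance_rate(verdicts: Dict[str, bool]) -> float:
    """Fraction of reviewed corrections that reviewers judged correct."""
    if not verdicts:
        raise ValueError("no reviewed records")
    return sum(verdicts.values()) / len(verdicts)


def wilson_interval(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion p observed in n trials."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical cleaned-up record IDs and reviewer verdicts (illustration only).
records = [f"REC{i:04d}" for i in range(1, 501)]
subset = sample_for_review(records, k=5, seed=42)
verdicts = {rid: (n % 4 != 0) for n, rid in enumerate(subset)}  # invented judgments
rate = acceptance_rate(verdicts)
low, high = wilson_interval(rate, len(verdicts))
print(subset, f"accepted={rate:.2f}", f"95% CI=({low:.2f}, {high:.2f})")
```

With only five reviewed records the interval spans roughly 0.23 to 0.88, which is one concrete way the limits of a manageable test set show up.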
Our laboratory has a sustained tradition of collaborating to develop major national resources
that bring semantic technology to biomedicine. Our BioPortal ontology repository [5] was
developed by the National Center for Biomedical Ontology (NCBO), one of the NIH National
Centers for Biomedical Computing. The CEDAR Workbench for the prospective authoring of
standardized metadata [11,12] was developed under the NIH Big Data to Knowledge (BD2K)
program. Our Protégé system for building and maintaining biomedical ontologies is the most
widely used software for creating semantic technology in the world [15]. Our group has ongoing
relationships with corporations such as Pinterest, BASF, and Elsevier to assist them in their
work to develop enterprise-wide knowledge graphs. We are thus well equipped to develop our
Knowledge Provider and to assist the consortium broadly in the area of semantic technology.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Mark A Musen
Enhanced ontology engineering through a Web-based, Cloud-based software architecture
- Award number: 10405968
- Fiscal year: 2021
- Funding amount: $56,000
The Metadata Powerwash - Integrated tools to make biomedical data FAIR
- Award number: 10397981
- Fiscal year: 2021
- Funding amount: $56,000
Enhancing the RADx Data Hub for Data FAIRness
- Award number: 10433797
- Fiscal year: 2021
- Funding amount: $56,000
Enhancing the RADx Data Hub for Data FAIRness
- Award number: 10794704
- Fiscal year: 2021
- Funding amount: $56,000
Improved metadata authoring to enhance AI/ML readiness of associated datasets
- Award number: 10592638
- Fiscal year: 2021
- Funding amount: $56,000
The Metadata Powerwash - Integrated tools to make biomedical data FAIR
- Award number: 10551273
- Fiscal year: 2021
- Funding amount: $56,000
BioPortal: An Expansive Knowledgebase of Biomedical Entities and Relations
- Award number: 10494104
- Fiscal year: 2021
- Funding amount: $56,000
BioPortal: An Expansive Knowledgebase of Biomedical Entities and Relations
- Award number: 10271048
- Fiscal year: 2021
- Funding amount: $56,000
Enhancing the RADx Data Hub for Data FAIRness
- Award number: 10699372
- Fiscal year: 2021
- Funding amount: $56,000
Enhancing the RADx Data Hub for Data FAIRness
- Award number: 10850055
- Fiscal year: 2021
- Funding amount: $56,000
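Similar National Natural Science Foundation of China (NSFC) Grants
Mechanism by which the nitrogen metabolism regulator AreA mediates fumonisin FB1 biosynthesis in Fusarium proliferatum
- Award number: 2021JJ40433
- Approval year: 2021
- Funding amount: CNY 0
- Project category: Provincial/municipal project
Mechanism of enhanced sugarcane disease resistance via host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen
- Award number: 32001603
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Porting, improvement, and application of the AREA international economic model
- Award number: 18870435
- Approval year: 1988
- Funding amount: CNY 20,000
- Project category: General Program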
Similar Overseas Grants
Onboarding Rural Area Mathematics and Physical Science Scholars
- Award number: 2322614
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant
TRACK-UK: Synthesized Census and Small Area Statistics for Transport and Energy
- Award number: ES/Z50290X/1
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Research Grant
Wide-area low-cost sustainable ocean temperature and velocity structure extraction using distributed fibre optic sensing within legacy seafloor cables
- Award number: NE/Y003365/1
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Research Grant
Point-scanning confocal with area detector
- Award number: 534092360
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Major Research Instrumentation
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Award number: 2326714
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Award number: 2326713
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant
Unlicensed Low-Power Wide Area Networks for Location-based Services
- Award number: 24K20765
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Grant-in-Aid for Early-Career Scientists
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Award number: 2427233
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant
Postdoctoral Fellowship: OPP-PRF: Tracking Long-Term Changes in Lake Area across the Arctic
- Award number: 2317873
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Award number: 2427232
- Fiscal year: 2024
- Funding amount: $56,000
- Project category: Standard Grant