Semi-Automating Data Extraction for Systematic Reviews
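Basic information
- Grant number: 9326367
- Principal investigator:
- Amount: $293,500
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2015
- Funding country: United States
- Project period: 2015-09-20 to 2019-08-31
- Project status: Completed
- Source: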
- Keywords: Age; Area; Caring; Categories; Characteristics; Clinical; Clinical Trials; Collaborations; Community Medicine; Complement; Computer software; Computing Methodologies; Data; Data Element; Data Set; Databases; Decision Making; Effectiveness of Interventions; Elements; Evidence Based Medicine; Evidence based practice; Exercise; Feedback; Goals; Growth; Healthcare; Hour; Human Resources; Interdisciplinary Study; Intervention; Letters; Link; Literature; Machine Learning; Manuals; Medical; Medicine; Methodology; Methods; Modeling; Modernization; National Health Policy; Natural Language Processing; Online Systems; Outcome; Patient Care; Performance; Persons; Population Characteristics; Positioning Attribute; Process; Public Health; Publishing; Research; Research Personnel; Resources; Sample Size; Services; Software Tools; Standardization; Structure; System; Text; Training; Work; Workload; base; clinical practice; computerized tools; cost; cost efficient; data mining; design; evidence base; experience; improved; innovation; interest; learning strategy; member; novel; open source; process optimization; study characteristics; systematic review; tool; trial design; usability; web services; web-based tool
Project abstract
DESCRIPTION (provided by applicant): Evidence-based medicine (EBM) looks to inform patient care with the totality of available relevant evidence. Systematic reviews are the cornerstone of EBM and are critical to modern healthcare, informing everything from national health policy to bedside decision-making. But conducting systematic reviews is extremely laborious (and hence expensive): producing a single review requires thousands of person-hours. Moreover, the exponential expansion of the biomedical literature base has imposed an unprecedented burden on reviewers, thus multiplying costs. Researchers can no longer keep up with the primary literature, and this hinders the practice of evidence-based care.
The long-term aim of this work is to develop computational tools and methods that optimize the practice of EBM. The proposed work builds upon our previous successful efforts to develop computational approaches that reduce the workload in EBM. More specifically, we aim to develop tools that semi-automate the laborious task of data extraction, that is, identifying and extracting the information of interest (e.g., trial sample size, interventions and outcomes) from the free text of biomedical articles, via novel machine learning methods. Semi-automating this task will drastically reduce reviewer workload, thus enabling the practice of EBM in an age of information overload.
Previous efforts to automate data extraction from articles describing clinical trials have shown promise, but lack the accuracy and scope necessary for real-world use. These approaches have been impeded by the absence of a large corpus of annotated clinical trials, and by the difficulty of constructing models to automatically extract all of the variables necessary for synthesis. We describe methodological innovations to overcome these hurdles. First, to train our machine learning models we propose leveraging large existing databases that contain structured information about clinical trials, in lieu of the usual approach of collecting expensive manual annotations. Practically, this means we will be able to exploit a very large "pseudo-annotated" dataset that is an order of magnitude larger than what has been used in previous efforts, thus substantially improving model performance. Our extensive preliminary work demonstrates the promise and feasibility of this approach. Second, we propose novel machine learning models appropriate for the tasks of article categorization and data extraction for EBM. These models will specifically be designed to perform extraction of multiple, correlated data elements of interest while simultaneously classifying articles into clinically salient categories useful for EBM.
We will rigorously evaluate the developed methods to assess their practical utility, specifically by comparing automated extraction accuracy to that achieved by trained systematic reviewers. To make these methods useful to end-users (systematic reviewers), we will develop and evaluate open-source software and tools, including a web-based extraction tool that integrates our machine learning models to automatically extract information from uploaded articles (PDFs). We will conduct a user study to evaluate the utility and usability of this tool in practice.
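The "pseudo-annotated" dataset idea in the abstract can be illustrated with a minimal sketch. It assumes that pseudo-annotations are produced by matching the values of a structured trial record against sentences of the article text; the abstract does not specify the matching procedure, so this string-matching rule, the `TrialRecord` fields, and the `pseudo_annotate` helper are all hypothetical and are not the applicants' implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class TrialRecord:
    """Hypothetical structured database entry describing one trial."""
    sample_size: int
    intervention: str


def pseudo_annotate(sentences, record):
    """Label each sentence with the record fields whose values it mentions.

    Assumption: a field counts as mentioned when its value appears as a
    whole word in the sentence. Real matching would need to be more robust.
    """
    size_pattern = re.compile(rf"\b{record.sample_size}\b")
    drug_pattern = re.compile(
        rf"\b{re.escape(record.intervention)}\b", re.IGNORECASE
    )
    labeled = []
    for sentence in sentences:
        labels = set()
        if size_pattern.search(sentence):
            labels.add("sample_size")
        if drug_pattern.search(sentence):
            labels.add("intervention")
        labeled.append((sentence, labels))
    return labeled


# Illustrative, made-up article sentences and database record.
sentences = [
    "A total of 120 participants were randomized.",
    "Patients received aspirin or placebo daily.",
    "Follow-up lasted 12 months.",
]
record = TrialRecord(sample_size=120, intervention="aspirin")
labeled = pseudo_annotate(sentences, record)
for sentence, labels in labeled:
    print(sorted(labels), sentence)
```

Sentences labeled this way could serve as noisy training examples for an extraction model without manual annotation, at the cost of some label errors where a value appears by coincidence.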
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Randolph Bias
Semi-Automating Data Extraction for Systematic Reviews
- Grant number: 9028559
- Fiscal year: 2015
- Funding amount: $293,500
- Project type:
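Similar NSFC (National Natural Science Foundation of China) grants
Mechanism by which the nitrogen metabolism regulator AreA of Fusarium proliferatum mediates fumonisin FB1 biosynthesis
- Grant number: 2021JJ40433
- Year approved: 2021
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Analysis of the mechanism by which host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen enhances disease resistance in sugarcane
- Grant number: 32001603
- Year approved: 2020
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund project
Porting, improvement, and application of the AREA international economic model
- Grant number: 18870435
- Year approved: 1988
- Funding amount: CNY 20,000
- Project type: General Program project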
Similar overseas grants
Onboarding Rural Area Mathematics and Physical Science Scholars
- Grant number: 2322614
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant
Point-scanning confocal with area detector
- Grant number: 534092360
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Major Research Instrumentation
TRACK-UK: Synthesized Census and Small Area Statistics for Transport and Energy
- Grant number: ES/Z50290X/1
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Research Grant
Wide-area low-cost sustainable ocean temperature and velocity structure extraction using distributed fibre optic sensing within legacy seafloor cables
- Grant number: NE/Y003365/1
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Research Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326714
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326713
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant
Unlicensed Low-Power Wide Area Networks for Location-based Services
- Grant number: 24K20765
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Grant-in-Aid for Early-Career Scientists
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427233
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427232
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427231
- Fiscal year: 2024
- Funding amount: $293,500
- Project type: Standard Grant